Local_Moran.__crand profiling and optimizing

GoogleCodeExporter commented 9 years ago

I started profiling the __crand function in Local_Moran and found some 
interesting results.  Documenting those results here.

The profiling script is attached as lm_profile.py and modifies the doc test 
extending it to 9999 permutations.
which is run as,
python -m cProfile -s time lm_profile.py 

The base line results are,
282665 function calls (279057 primitive calls) in 4.409 seconds
Ordered by: internal time
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    3.055    3.055    3.807    3.807 moran.py:555(__crand)
   100002    0.526    0.000    0.526    0.000 {sum}
     9999    0.263    0.000    0.263    0.000 {method 'permutation' of 'mtrand.RandomState' objects}
...........

What's noticeable here is that 3.055 seconds are spent internal to __crand not 
on other function calls.

To tease out where that time in being spent I added timers between the lines of 
the for loop,
581         for i in range(self.w.n):
582             idsi = ids[ids != i]
583             np.random.shuffle(idsi)
584             lisas[i] = [z[i] * sum(w[i] * z[idsi[rid[0:wc[i]]]]) for rid in 
rids]

and found that line 584 was consuming ~ 3.3 seconds (timing slows things a bit).

lisas[i] = [z[i] * sum(w[i] * z[idsi[rid[0:wc[i]]]]) for rid in rids]

If we break this line up into equivalent parts we can further determine where 
the time is going,

tmp = [z[idsi[rid[0:wc[i]]]] for rid in rids]
lisas[i] = [z[i] * sum(w[i] * t) for t in tmp]

Somewhat surprisingly the 1st line consumes only ~0.37 seconds
while the 2nd line consumes ~2.8 seconds

The first line produces a list of 9999 small arrays,
The next lines multiples each of these small arrays be w[i], takes the sum and 
multiples by z[i]
that's essentially 3 function calls * 9999 permutations * 10 variables in this 
example.
(Note: localizing z[i] and w[i] provides a very limited speed up ~0.04 seconds.)

However, these two lines can by rewritten as follows,
tmp = np.array([z[idsi[rid[0:wc[i]]]] for rid in rids])
lisas[i] = z[i] * (w[i] * tmp).sum(1)

This reduces the number of function calls from ~300,000 to 30 (over ten 
variables) and from ~2.8 seconds to ~0.007 seconds

The improved profiling results are,
182695 function calls (179087 primitive calls) in 1.429 seconds
Ordered by: internal time
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.393    0.393    0.719    0.719 moran.py:555(__crand)
     9999    0.265    0.000    0.265    0.000 {method 'permutation' of 'mtrand.RandomState' objects}
        4    0.077    0.019    0.077    0.019 {__import__}
....

A ~68% speed up.

Results committed in [1074]

Original issue reported on code.google.com by schmi...@gmail.com on 16 Jan 2012 at 5:33

Attachments:

lm_profile.py

GoogleCodeExporter commented 9 years ago

Taking a second look at the list comprehension....
tmp = np.array([z[idsi[rid[0:wc[i]]]] for rid in rids])

which was taking ~0.37seconds.

This can be rewritten as,
tmp = z[idsi[rids[:,0:wc[i]]]]

Avoiding the for loop and the back and forth from numpy arrays to python lists 
to numpy arrays, and takes ~0.008 seconds

The improved profiling results are,
182685 function calls (179077 primitive calls) in 0.904 seconds
Ordered by: internal time
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     9999    0.263    0.000    0.263    0.000 {method 'permutation' of 'mtrand.RandomState' objects}
       12    0.047    0.004    0.047    0.004 {sum}
        1    0.031    0.031    0.307    0.307 moran.py:555(__crand)

A ~79% time decrease over the original.

Committed in r1076

Original comment by schmi...@gmail.com on 16 Jan 2012 at 6:45

GoogleCodeExporter commented 9 years ago

Looking at the calls to numpy.random.permutation...

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     9999    0.263    0.000    0.263    0.000 {method 'permutation' of 'mtrand.RandomState' objects}

Lines...
573         k = self.w.max_neighbors + 1
574         nn = range(self.n - 1)
575         rids = np.array([np.random.permutation(nn)[0:k] for i in prange])

The documentation for np.random.permuation states,
"""
        If `x` is an integer, randomly permute ``np.arange(x)``.
        If `x` is an array, make a copy and shuffle the elements
"""
Since we are passing in nn which is a python list, numpy must first 
convert/copy it into a numpy array.
If we pass in a numpy array to begin with we get an immediate speed up since 
the datastructure doesn't need to be converted from python's to numpy's 

574         nn = np.arange(self.n - 1)

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     9999    0.105    0.000    0.105    0.000 {method 'permutation' of 'mtrand.RandomState' objects}

We can get a very slight speed up from there by passing only the int inside of 
the array...

574         nn = self.n - 1

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     9999    0.098    0.000    0.098    0.000 {method 'permutation' of 'mtrand.RandomState' objects}

The improved profiling results are,
         182684 function calls (179076 primitive calls) in 0.741 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     9999    0.098    0.000    0.098    0.000 {method 'permutation' of 'mtrand.RandomState' objects}
       12    0.047    0.004    0.047    0.004 {sum}
        1    0.031    0.031    0.141    0.141 moran.py:555(__crand)
...

~83% time decrease over the original.
Committed in r1078

Original comment by schmi...@gmail.com on 16 Jan 2012 at 9:00

pombreda / pysal

Local_Moran.__crand profiling and optimizing #188