mne-tools / mne-python

MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python
https://mne.tools
BSD 3-Clause "New" or "Revised" License

avoid sum(X ** 2) or mean(X ** 2) #773

Closed agramfort closed 11 years ago

agramfort commented 11 years ago

sum(X ** 2) is equivalent to np.dot(X.ravel(), X.ravel()), which avoids a copy.

The same trick works for mean(X ** 2): divide the dot product by X.size.

It would be nice to go over the full code base to spot these low-hanging fruits.
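For illustration, a minimal sketch of the trick (editor's note, not from the thread; the array shape and seed are arbitrary):

```python
import numpy as np

# Illustration only: shape and seed are arbitrary.
X = np.random.RandomState(0).randn(200, 300)

# np.sum(X ** 2) materializes the temporary array X ** 2 first;
# np.dot on a raveled view goes straight to BLAS with no temporary
# (for C-contiguous X, ravel() returns a view, not a copy).
slow = np.sum(X ** 2)
fast = np.dot(X.ravel(), X.ravel())
assert np.allclose(slow, fast)

# The same trick works for the mean: divide by the element count.
assert np.allclose(np.mean(X ** 2), fast / X.size)
```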

dengemann commented 11 years ago

... see #2406 in the sklearn FactorAnalysis code.

dengemann commented 11 years ago

@agramfort I can take a stab at this, if you don't beat me to it today ;-)

agramfort commented 11 years ago

no chance I'll do it before next week. Maybe you can leave it to someone more junior :)

dengemann commented 11 years ago

Fine with me, I wasn't exactly craving this task anyway ;-)

agramfort commented 11 years ago

any volunteer?

@mainakjas @rgoj ?

rgoj commented 11 years ago

> any volunteer?
>
> @mainakjas @rgoj ?

haha, I was just waiting to be pinged from this thread, I knew that was coming ;)

I'd like to focus on the time-frequency beamforming for now, but I could actually use this as a bit of a break next week (to keep contributing regularly while getting a break from the hardcore stuff). So I can do it around Tuesday or Wednesday next week, unless somebody gets round to it earlier; if nobody has by then, I'll note here that I'm on it when I start. So, yes, I volunteer :)

agramfort commented 11 years ago

no rush

thanks for volunteering @rgoj

mluessi commented 11 years ago

Interesting. For n=1e6 it is about 5x faster on my laptop :).

larsoner commented 11 years ago

Is it the same speed for shorter arrays? At one point I considered using it but I thought I found it to be slower. Maybe I'm crazy though.

mluessi commented 11 years ago

For me it's also faster for short arrays. E.g., for n=10 it is 1.7 us vs 9 us.

dengemann commented 11 years ago

> Is it the same speed for shorter arrays? At one point I considered using it but I thought I found it to be slower. Maybe I'm crazy though.

I tested it when refactoring the sklearn FactorAnalysis code,

https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/decomposition/factor_analysis.py#L199

it's ridiculously fast.

mluessi commented 11 years ago

Hmm... the downside is that if X is F-contiguous, it actually isn't faster, because ravel() will make a copy to reorder the array. Also, using X.flat is about 5x slower than X.ravel() for me. To make it fast all the time, we would have to use

 np.dot(X.ravel(order='F' if np.isfortran(X) else 'C'), X.ravel(order='F' if np.isfortran(X) else 'C'))

mluessi commented 11 years ago

of course this could be written in two lines ;)

dengemann commented 11 years ago

or put it in a compute_norm function or so, in utils

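Such a helper might look like the following (an editor's sketch only: the name squared_norm and the placement in utils are hypothetical, not the actual mne.utils API):

```python
import numpy as np

def squared_norm(X):
    """Hypothetical helper: squared Euclidean (Frobenius) norm of X.

    Raveling in the array's own memory order keeps ravel() a view
    (no copy) for both C- and Fortran-ordered input, so the BLAS dot
    stays fast either way. For real-valued arrays only; complex input
    would additionally need a conjugate.
    """
    flat = X.ravel(order='F' if np.isfortran(X) else 'C')
    return np.dot(flat, flat)

# Sanity check against the naive expression, in both memory orders.
a = np.random.RandomState(42).randn(100, 100)
assert np.allclose(squared_norm(a), np.sum(a ** 2))      # C order
assert np.allclose(squared_norm(a.T), np.sum(a.T ** 2))  # Fortran order
```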

mluessi commented 11 years ago

To clarify:

In [50]: a = np.random.randn(100, 100)
In [51]: X = a
In [52]: %timeit np.sum(X ** 2)
10000 loops, best of 3: 27.7 us per loop
In [53]: %timeit np.dot(X.flat, X.flat)
100000 loops, best of 3: 13.3 us per loop
In [54]: %timeit np.dot(X.ravel(), X.ravel())
100000 loops, best of 3: 5.6 us per loop
In [55]: %timeit np.dot(X.ravel(order='F' if np.isfortran(X) else 'C'), X.ravel(order='F' if np.isfortran(X) else 'C'))
100000 loops, best of 3: 7.65 us per loop
In [56]: X = a.T   # fortran order
In [57]: %timeit np.sum(X ** 2)
10000 loops, best of 3: 27.8 us per loop
In [58]: %timeit np.dot(X.flat, X.flat)
10000 loops, best of 3: 92.7 us per loop
In [59]: %timeit np.dot(X.ravel(), X.ravel())
10000 loops, best of 3: 26.4 us per loop
In [60]: %timeit np.dot(X.ravel(order='F' if np.isfortran(X) else 'C'), X.ravel(order='F' if np.isfortran(X) else 'C'))
100000 loops, best of 3: 7.74 us per loop

dengemann commented 11 years ago

Sounds like the last variant should go into a squared Euclidean norm helper function then... that we could use everywhere in one line.


larsoner commented 11 years ago

+1 for @dengemann's wrapping idea; that way, if we discover other "quirks" we can compensate for them as well.

agramfort commented 11 years ago

Can you also bench against linalg.norm(a, 'fro') ** 2?


mluessi commented 11 years ago

@agramfort it is terribly slow:

In [68]: X = a
In [69]: %timeit linalg.norm(X, 'fro') ** 2
10000 loops, best of 3: 81.5 us per loop
In [70]: X = a.T
In [71]: %timeit linalg.norm(X, 'fro') ** 2
10000 loops, best of 3: 93.1 us per loop

agramfort commented 11 years ago

Thanks for checking
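For the record, a quick check (an editor's sketch, not from the thread) that the linalg.norm variant returns the same value as the dot-product trick; only the speed differs:

```python
import numpy as np
from scipy import linalg

# Arbitrary test array; seed chosen only for reproducibility.
a = np.random.RandomState(1).randn(100, 100)

# Same squared Frobenius norm, just computed more slowly by scipy.
assert np.allclose(linalg.norm(a, 'fro') ** 2,
                   np.dot(a.ravel(), a.ravel()))
```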

mainakjas commented 11 years ago

This sounds great! Sure, I could have a stab. I'm not sure of my schedule either, but let me know @rgoj if you start, so that we're not working on the same stuff at the same time :)

dengemann commented 11 years ago

@agramfort suggested this as a non-senior task ;-)


mainakjas commented 11 years ago

ok, I'm on it. h-o-o-o-o-o-o-l-d on ...