Open nbecker opened 9 years ago
Can you specify your input sizes?
Here is the test program:
import numpy as np
from limit import Limit
def clip (z, _max):
    mask = np.abs(z) > _max
    ## print ('#clipped:', np.sum(mask))
    z[mask] = (Limit (z) * _max)[mask]
    return z
from timeit import timeit
u = np.ones (1000000, dtype=complex)
print 'numpy'
print timeit ('clip(u, 10)', 'from __main__ import u, clip', number=100)
from clip import clip as clip2
print 'pythran non-vector'
print timeit ('clip2(u, 10)', 'from __main__ import u, clip2', number=100)
print 'pythran vector'
from clip2 import clip as clip3
print timeit ('clip3(u, 10)', 'from __main__ import u, clip3', number=100)
print 'c++'
from limit import clip as clip4
print timeit ('clip4(u, 10)', 'from __main__ import u, clip4', number=100)
## print timeit ('clip(u[::2], 10)', 'from __main__ import u, clip', number=100)
## print timeit ('clip2(u[::2], 10)', 'from __main__ import u, clip2', number=100)
Also, can you replace out[mask2] = x[mask2] / np.abs(x[mask2]) with out[mask2] = np.sign(x[mask2])?
That's not the same thing when x is complex, is it?
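A small check of that point, as a sketch rather than a statement about Pythran: x / np.abs(x) always yields a unit-modulus complex number, while what np.sign returns for complex input depends on the NumPy version. To my knowledge, releases before 2.0 derive it from the sign of the real part, and 2.0 and later return x / abs(x). The sample values below are made up for illustration.

```python
import numpy as np

# Hypothetical sample values, chosen only for illustration.
x = np.array([3 + 4j, -2j, -1 + 1j])

# Unit-modulus "direction" of each value, as in the original expression.
unit = x / np.abs(x)

# np.sign on complex input; the result depends on the NumPy version in use.
signs = np.sign(x)

print(unit)
print(signs)
print("identical:", np.allclose(unit, signs))
```

On the NumPy available when this thread was written, the two expressions differ for complex input, so the substitution is only safe for real arrays.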
You are right. Sorry, I didn't look at the typing information. For now, I don't know why we don't perform as well as the C++ version, but thank you for the feedback.
Did you use special compilation flags for the C++ version? Optimisation? OpenMP? autovectorization?
This is the compile command (the acml part probably does nothing):
g++ -o limit.os -c -g -DBOOST_DISABLE_THREADS -O3 -march=native -ftree-vectorize -fstrict-aliasing -ffast-math -DNDEBUG -DBOOST_DISABLE_ASSERTS -std=c++1y -Wall -Wno-unused-local-typedefs -std=c++1y -fPIC -DHAVE_UNURAN=1 -DHAVE_CONSTRAINED=1 -DHAVE_TWISTER_SERIALIZATION=0 -I. -I/usr/local/src/ndarray/include -I/usr/include -I/usr/include/python2.7 -I/home/nbecker/.local/lib/python2.7/site-packages/numpy/core/include -I/usr/include/eigen3 -I/opt/intel/composerxe/ipp/include -I/opt/intel/composerxe/mkl/include -I/opt/acml5.3.0/gfortran64/include limit.cc
g++ --version
g++ (GCC) 4.9.2 20141101 (Red Hat 4.9.2-1)
Results persist when I compile the pythran code using the same options:
pythran -v -O3 -march=native -ftree-vectorize -fstrict-aliasing -ffast-math clip.py
pythran -v -O3 -march=native -ftree-vectorize -fstrict-aliasing -ffast-math clip2.py
python test_clip.py
numpy 2.94298911095
pythran non-vector 0.988628864288
pythran vector 1.74669694901
c++ 0.410514116287
Can you try removing fast-math and vectorize for GCC?
python test_clip.py
numpy 2.87930011749
pythran non-vector 0.987450122833
pythran vector 1.73527002335
c++ 0.831676959991
But why did pythran not improve when I added these flags to it?
Did you check which of fast-math or vectorize brought the boost? (It may be only the two together.)
Pythran does not improve because the generated C++ code can't be vectorized by gcc (I haven't looked into the details of why). But at least it gives some information on where to look.
It seems -ffast-math is what makes it 2x faster (0.41s); -ftree-vectorize has little effect (0.83s).
Note that fast-math is wrong for any respectable numerical code anyway :)
On Wed, Jan 07, 2015 at 07:02:37AM -0800, ndbecker wrote:
This is a benchmark of a simple function (clip) written in 4 ways:
- numpy / c++ (my limit function is written in c++)
- pythran without vector
- pythran with vector (but using slicing)
- c++ [...]
Thanks a lot, Neal, for your benchmark! I had a quick look, and it appears that masked expressions are not very efficient in pythran (but still better than numpy, as shown by your bench).
I rewrote your numpy expression as follows:
import numpy as np

def limit (x, epsilon=1e-6):
    return np.where( np.abs(x) < epsilon, 0, x/np.abs(x))

def clip0 (z, _max):
    mask = np.abs(z) > _max
    z[mask] = (limit (z[mask]) * _max)
    return z
Under numpy, it's not as clever as it could be, as x/np.abs(x) is computed for all x values. Under Pythran, however, we have the opportunity to lazily evaluate the expression so that only the relevant parts are computed. This is not done yet because of a not-so-good implementation of numpy.where, but we have an opportunity here.
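To make the eager-versus-lazy distinction concrete, here is a minimal NumPy-only sketch. The names limit_eager and limit_lazy are hypothetical, and the second function only imitates by hand what a lazy np.where could do; it is not Pythran's implementation.

```python
import numpy as np

def limit_eager(x, epsilon=1e-6):
    """np.where form: the division runs on every element, then a mask selects."""
    mag = np.abs(x)
    # Silence the warnings from 0/0 on elements that np.where discards anyway.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(mag < epsilon, 0, x / mag)

def limit_lazy(x, epsilon=1e-6):
    """Hand-written equivalent: divide only where the result is actually kept."""
    mag = np.abs(x)
    out = np.zeros_like(x)
    keep = mag >= epsilon
    out[keep] = x[keep] / mag[keep]
    return out

# Hypothetical sample input, including a zero and a near-zero value.
x = np.array([3 + 4j, 0j, 1e-9 + 0j, -2j])
print(limit_eager(x))
print(limit_lazy(x))
```

Both functions return the same values; the second simply avoids the division on elements whose result would be thrown away.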
@pbrunet: do you see what I mean? As you wrote np.where, do you want to have a look, or should I add it to my TODO list?
I see exactly what you mean :-) I didn't make the modification because np.where is a trinary_expr, so the fix could be much more generic than just improving np.where. For example, numpy.clip could be seen as a trinary_expr too, with 2 of its arguments being scalars. We can add to this list at least: around (binary), angle (binary) and isclose (Quatrary_expr?)
I may try to have a look at this (it has been on my todo list since 2014-10-05), but for now I am working on fast subscript detection, so I will not have time for it right now.
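As an illustration of the remark about numpy.clip being a three-argument elementwise expression, the sketch below writes real-valued clipping as nested np.where calls and compares it with np.clip. The helper name clip_where and the sample values are hypothetical; this says nothing about how Pythran represents trinary_expr internally.

```python
import numpy as np

def clip_where(x, lo, hi):
    """Elementwise clip of an array against two scalar bounds."""
    return np.where(x < lo, lo, np.where(x > hi, hi, x))

# Hypothetical sample input.
x = np.array([-2.0, -0.5, 0.0, 0.7, 3.0])
print(clip_where(x, -1.0, 1.0))
print(np.clip(x, -1.0, 1.0))
```

Each output element depends only on the matching input element and the two scalars, which is what makes a generic lazy evaluation scheme plausible for this family of functions.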
For the record, once #686 is merged, numpy.where will run significantly faster, and the following implementation:
#pythran export clip0(complex128[], float64)
def clip0 (z, _max):
    return np.where( np.abs(z) > _max, np.where( np.abs(z) < 1e-6, 0, z/np.abs(z)) * _max, z)
runs as fast as the loop version.
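For readers without the original files, the following is a guess at what a "loop version" looks like, set next to the nested np.where form so the two can be compared. The name clip_loop is hypothetical, the actual clip.py from the benchmark is not shown in this thread, and the code runs under plain NumPy (the errstate guard is only there to silence warnings outside Pythran).

```python
import numpy as np

def clip_where(z, _max):
    """Nested np.where form, as in the comment above."""
    mag = np.abs(z)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(mag > _max, np.where(mag < 1e-6, 0, z / mag) * _max, z)

def clip_loop(z, _max):
    """Assumed explicit-loop equivalent (not the original clip.py)."""
    out = z.copy()
    for i in range(z.shape[0]):
        mag = abs(z[i])
        if mag > _max:
            # Rescale to modulus _max, guarding against near-zero magnitudes.
            out[i] = 0 if mag < 1e-6 else (z[i] / mag) * _max
    return out

# Hypothetical sample input.
z = np.array([3 + 4j, 0.3 + 0.4j, 0j, -20j])
print(clip_where(z, 1.0))
print(clip_loop(z, 1.0))
```

Unlike the masked-assignment versions earlier in the thread, neither function modifies its input in place.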
This is a benchmark of a simple function (clip) written in 4 ways:
1. numpy / c++ (my limit function is written in c++)
2. pythran without vector
3. pythran with vector (but using slicing)
4. c++ using ndarray library from ndarray.googlecode.com
Timings are:
python test_clip.py
numpy 2.94420599937
pythran non-vector 0.991260051727
pythran vector 1.71353602409
c++ 0.41252207756