serge-sans-paille opened 9 years ago
@serge-sans-paille I've copy-pasted your code and got:
10000 loops, best of 3: 103 usec per loop
When I compile the C++ code that we provide in the benchmarks and measure its timing (using pdf = __import__("pdf", globals(), locals(), [], -1).run), I get:
10000 loops, best of 3: 55.1 usec per loop
This factor of about 2 between the jitted version and the hand-written C++ is expected.
What is your OS and compiler?
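The "N loops, best of 3: ... usec per loop" lines match the output of Python's timeit module (python -m timeit under Python 2). For anyone reproducing these numbers from a script, a minimal sketch of the same best-of-3 measurement, assuming Python 3; best_of is a hypothetical helper, and math.sqrt stands in for the pdf function and its arguments:

```python
import timeit


def best_of(func, *args, number=10000, repeat=3):
    """Return the best per-call time in microseconds.

    Mirrors the 'best of 3' report of `python -m timeit`: the call is
    executed `number` times per run, `repeat` runs are made, and only the
    fastest run is kept (slower runs mostly measure system noise).
    """
    timer = timeit.Timer(lambda: func(*args))
    totals = timer.repeat(repeat=repeat, number=number)
    return min(totals) / number * 1e6


if __name__ == "__main__":
    import math

    loops = 1000
    usec = best_of(math.sqrt, 2.0, number=loops)
    print("%d loops, best of 3: %.3g usec per loop" % (loops, usec))
```

Taking the minimum rather than the mean is the convention timeit itself follows, which keeps results comparable with the figures quoted in this thread.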
OS: Linux (Debian testing); compiler, as reported by c++ --version: g++-4.9.real (Debian 4.9.1-19) 4.9.1
Admittedly, I have little experience with this combination (HOPE on Debian with g++ 4.9).
What are the timings you get for the C++ and the jitted PDF code?
Which compile flags did you use for the C++ code, and which flags is HOPE using? (To see HOPE's flags, add import hope; hope.config.verbose = True before the call.)
pdf(float32^2 density, int64^1 dims, float64^1 center, float32^2 w2D, int64 r50, float64 b, float64 a)
for x.l in (0.J:dims.l[0.J]) {
for y.l in (0.J:dims.l[1.J]) {
new dr.d
dr.d = numpy.sqrt((((x.l - center.d[0.J]) ** 2.J) + ((y.l - center.d[1.J]) ** 2.J)))
new __sum0.d
__sum0.d = numpy.sum(((((w2D.f[:w2D@0,:w2D@1] * 2.J) * (b.D - 1.J)) / ((2.J * 3.141592653589793.D) * ((r50.J * a.d) ** 2.J))) * ((1.J + ((dr.d / (r50.J * a.d)) ** 2.J)) ** -b.D)))
density.f[x.l, y.l] = __sum0.d
}
}
return density.f[:density@0,:density@1]
Compiling following functions:
pdf(float32^2 density, int64^1 dims, float64^1 center, float32^2 w2D, int64 r50, float64 b, float64 a)
running build_ext
building 'pdf_e5a322486ef195fd39b6f44cfb7c3a45dd00790dd0bc04544bbea1ae_0' extension
C compiler: x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -fno-strict-aliasing -g -O2 -fPIC
compile options: '-I/usr/lib/python2.7/dist-packages/numpy/core/include -I/usr/include/python2.7 -c'
extra options: '-Wall -Wno-unused-variable -std=c++11'
x86_64-linux-gnu-gcc: /tmp/hope2H9vzI/pdf_e5a322486ef195fd39b6f44cfb7c3a45dd00790dd0bc04544bbea1ae_0.cpp
c++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-z,relro -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wl,-z,relro -g -O2 /tmp/hope2H9vzI/pdf_e5a322486ef195fd39b6f44cfb7c3a45dd00790dd0bc04544bbea1ae_0.o -o /tmp/hope2H9vzI/pdf_e5a322486ef195fd39b6f44cfb7c3a45dd00790dd0bc04544bbea1ae_0.so
10 loops, best of 3: 1.41 msec per loop
and 1.32 msec per loop when compiling with clang.
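The -O2, -g and -fwrapv flags in the build log above appear to be the defaults recorded when the Python interpreter itself was built; on Unix-like systems, distutils' build_ext (which the log shows is in use) takes its compiler command and base flags from these settings. A small sketch for printing them so they can be compared with the flags used for the hand-written C++ benchmark, using only the standard platform and sysconfig modules; build_config_summary is a hypothetical name:

```python
import platform
import sysconfig


def build_config_summary():
    """Collect the platform string and the compiler settings recorded when
    this Python interpreter was built."""
    info = {
        "platform": platform.platform(),
        "python_compiler": platform.python_compiler(),
    }
    for name in ("CC", "CXX", "OPT", "CFLAGS", "LDSHARED"):
        # get_config_var returns None when a variable is not defined
        # (for example on Windows builds).
        info[name] = sysconfig.get_config_var(name)
    return info


if __name__ == "__main__":
    for key, value in build_config_summary().items():
        print("%-16s %s" % (key, value))
```

Running this on both the OS X and the Linux machines would show whether the base optimization flags differ between the two setups.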
@serge-sans-paille I was able to reproduce the behavior you see on an Ubuntu box. It seems that the other benchmarks perform as expected and only the star-psf benchmark is affected.
As expected, the code that HOPE generates is identical on OS X and Ubuntu. This leads me to suspect that the compilers on Linux cannot optimize this code as well as clang does on OS X. This isn’t very satisfying, but I don’t have a better explanation at the moment.
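The verbose output earlier in this thread shows the kernel HOPE compiles for the star-psf benchmark. A pure-NumPy sketch of the same computation, transcribed from that output, can serve as a correctness reference when comparing the jitted and C++ versions; the original benchmark source is not reproduced here, and the names pdf_loop and pdf_vectorized are hypothetical. Note that the inner numpy.sum runs over the whole w2D array once per output pixel, so the cost per pixel grows with the size of w2D:

```python
import math

import numpy as np


def pdf_loop(density, dims, center, w2D, r50, b, a):
    """Direct transcription of the loop nest in HOPE's verbose output."""
    for x in range(dims[0]):
        for y in range(dims[1]):
            dr = np.sqrt((x - center[0]) ** 2 + (y - center[1]) ** 2)
            # The full w2D product is re-evaluated for every (x, y).
            density[x, y] = np.sum(
                w2D * 2 * (b - 1) / (2 * math.pi * (r50 * a) ** 2)
                * (1 + (dr / (r50 * a)) ** 2) ** -b
            )
    return density


def pdf_vectorized(density, dims, center, w2D, r50, b, a):
    """Same formula with broadcasting; the w2D-independent factor is
    pulled out of the sum, so w2D is summed only once."""
    x = np.arange(dims[0])[:, None]
    y = np.arange(dims[1])[None, :]
    dr = np.sqrt((x - center[0]) ** 2 + (y - center[1]) ** 2)
    norm = 2 * (b - 1) / (2 * math.pi * (r50 * a) ** 2)
    radial = (1 + (dr / (r50 * a)) ** 2) ** -b
    density[: dims[0], : dims[1]] = w2D.sum(dtype=np.float64) * norm * radial
    return density
```

Factoring the radial term out of the sum reorders floating-point operations, so the two versions agree only up to rounding, not bit for bit.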
I installed HOPE from git and ran the following:
with:
and the output is rather slow compared to the expected result. The C++ module runs at the expected speed, so what did I do wrong?