jakeret / hope

HOPE: A Python Just-In-Time compiler for astrophysical computations
GNU General Public License v3.0

Fail to reproduce timings #39

Open serge-sans-paille opened 9 years ago

serge-sans-paille commented 9 years ago

I installed hope from the git repository and ran the following:

import numpy as np
import hope
@hope.jit
def pdf(density, dims, center, w2D, r50, b, a):
    for x in range(dims[0]):
        for y in range(dims[1]):
            dr = np.sqrt((x - center[0]) ** 2 + (y - center[1]) ** 2)
            density[x, y] = np.sum(w2D * 2 * (b - 1) / (2 * np.pi * (r50 * a)**2) * (1 + (dr / (r50 * a))**2)**(-b))
    return density

with:

 python -m timeit -s 'import numpy as np; b = 3.5; a = 1. / np.sqrt(2. ** (1. / (b - 1.)) - 1.) ; r50=20;center = np.array([10.141, 10.414]);dims = np.array([20, 20]) ; x1D = np.array([ 0.5 - 0.9491079123427585245262 / 2 , 0.5 - 0.7415311855993944398639 / 2 , 0.5 - 0.4058451513773971669066 / 2 , 0.5 , 0.5 + 0.4058451513773971669066 / 2 , 0.5 + 0.7415311855993944398639 / 2 , 0.5 + 0.9491079123427585245262 / 2 ], dtype=np.float32) ; w1D = np.array([ 0.1294849661688696932706 / 2 , 0.2797053914892766679015 / 2 , 0.38183005050511894495 / 2 , 0.4179591836734693877551 / 2 , 0.38183005050511894495 / 2 , 0.2797053914892766679015 / 2 , 0.1294849661688696932706 / 2 ], dtype=np.float32) ; w2D = np.outer(w1D, w1D) ; from pdf import pdf; density = np.zeros(dims, dtype=np.float32)' 'pdf(density, dims, center, w2D, r50, b, a)'

and the result is much slower than the expected timing. The C++ module runs at the expected speed, so what did I do wrong?
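For readability, the one-line timeit setup above can be unrolled into a script. This is only a sketch: the `@hope.jit` decorator is left out so that it runs with NumPy alone, which means it times the plain Python version and not the jitted one. The helper name `make_inputs` is made up for illustration, and `x1D` is omitted because the benchmarked function never reads it.

```python
import timeit

import numpy as np


def pdf(density, dims, center, w2D, r50, b, a):
    # Same body as the function above, without the @hope.jit decorator.
    for x in range(dims[0]):
        for y in range(dims[1]):
            dr = np.sqrt((x - center[0]) ** 2 + (y - center[1]) ** 2)
            density[x, y] = np.sum(
                w2D * 2 * (b - 1) / (2 * np.pi * (r50 * a) ** 2)
                * (1 + (dr / (r50 * a)) ** 2) ** (-b)
            )
    return density


def make_inputs():
    # Hypothetical helper: rebuilds the arguments from the timeit setup string.
    b = 3.5
    a = 1.0 / np.sqrt(2.0 ** (1.0 / (b - 1.0)) - 1.0)
    r50 = 20
    center = np.array([10.141, 10.414])
    dims = np.array([20, 20])
    w1D = np.array(
        [
            0.1294849661688696932706 / 2,
            0.2797053914892766679015 / 2,
            0.38183005050511894495 / 2,
            0.4179591836734693877551 / 2,
            0.38183005050511894495 / 2,
            0.2797053914892766679015 / 2,
            0.1294849661688696932706 / 2,
        ],
        dtype=np.float32,
    )
    w2D = np.outer(w1D, w1D)
    density = np.zeros(tuple(dims), dtype=np.float32)
    return density, dims, center, w2D, r50, b, a


if __name__ == "__main__":
    args = make_inputs()
    number = 10
    # Best of 3 repeats, reported per call, mirroring the timeit CLI output.
    best = min(timeit.repeat(lambda: pdf(*args), number=number, repeat=3)) / number
    print("best of 3: %.1f usec per loop" % (best * 1e6))
```

Adding the decorator back and importing hope should give the jitted timing, assuming HOPE is installed and can find a compiler.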

cosmo-ethz commented 9 years ago

@serge-sans-paille I copy-pasted your code and got: 10000 loops, best of 3: 103 usec per loop

When I compile the C++ code that we provide in the benchmarks and then measure the timing (using pdf = __import__("pdf", globals(), locals(), [], -1).run), I get: 10000 loops, best of 3: 55.1 usec per loop
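Note that the `__import__(..., -1)` form only works on Python 2; Python 3 rejects a negative `level`. A minimal sketch of an equivalent using `importlib` follows. The helper name `load_callable` is hypothetical, and the commented `pdf`/`run` line assumes the compiled benchmark module is on the import path.

```python
import importlib


def load_callable(module_name, attr_name):
    # Import the module by name and return one attribute from it.
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# With the compiled benchmark module available (assumption), this would be:
# pdf = load_callable("pdf", "run")

# Demonstration with a standard-library module instead:
sqrt = load_callable("math", "sqrt")
```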

This factor of 2 is expected.

What is your OS and compiler?

serge-sans-paille commented 9 years ago

OS: Linux (Debian testing)
Compiler (c++ --version): g++-4.9.real (Debian 4.9.1-19) 4.9.1

cosmo-ethz commented 9 years ago

Admittedly, I have little experience with this combination (HOPE on debian & g++4.9).

What are the timings you get for the C++ and the jitted PDF code?

What compile flags did you use to compile the C++ code, and what is HOPE using? (Add import hope; hope.config.verbose = True; to the call.)

serge-sans-paille commented 9 years ago

pdf(float32^2 density, int64^1 dims, float64^1 center, float32^2 w2D, int64 r50, float64 b, float64 a)
    for x.l in (0.J:dims.l[0.J]) {
        for y.l in (0.J:dims.l[1.J]) {
            new dr.d
            dr.d = numpy.sqrt((((x.l - center.d[0.J]) ** 2.J) + ((y.l - center.d[1.J]) ** 2.J)))
            new __sum0.d
            __sum0.d = numpy.sum(((((w2D.f[:w2D@0,:w2D@1] * 2.J) * (b.D - 1.J)) / ((2.J * 3.141592653589793.D) * ((r50.J * a.d) ** 2.J))) * ((1.J + ((dr.d / (r50.J * a.d)) ** 2.J)) ** -b.D)))
            density.f[x.l, y.l] = __sum0.d
        }
    }
    return density.f[:density@0,:density@1]

Compiling following functions:
pdf(float32^2 density, int64^1 dims, float64^1 center, float32^2 w2D, int64 r50, float64 b, float64 a)
running build_ext
building 'pdf_e5a322486ef195fd39b6f44cfb7c3a45dd00790dd0bc04544bbea1ae_0' extension
C compiler: x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -fno-strict-aliasing -g -O2 -fPIC

compile options: '-I/usr/lib/python2.7/dist-packages/numpy/core/include -I/usr/include/python2.7 -c'
extra options: '-Wall -Wno-unused-variable -std=c++11'
x86_64-linux-gnu-gcc: /tmp/hope2H9vzI/pdf_e5a322486ef195fd39b6f44cfb7c3a45dd00790dd0bc04544bbea1ae_0.cpp
c++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-z,relro -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wl,-z,relro -g -O2 /tmp/hope2H9vzI/pdf_e5a322486ef195fd39b6f44cfb7c3a45dd00790dd0bc04544bbea1ae_0.o -o /tmp/hope2H9vzI/pdf_e5a322486ef195fd39b6f44cfb7c3a45dd00790dd0bc04544bbea1ae_0.so
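The base flags in a build log like this one (`-DNDEBUG -g -fwrapv -O2 -Wall ...`) normally come from the Python interpreter's own build configuration, which extension builds reuse by default. A small sketch for inspecting them on a given machine; the helper name `default_build_flags` is made up, and some values may be `None` on platforms that do not define them.

```python
import sysconfig


def default_build_flags():
    # Configuration variables that extension builds typically inherit.
    names = ("CC", "CXX", "OPT", "CFLAGS", "LDSHARED")
    return {name: sysconfig.get_config_var(name) for name in names}


if __name__ == "__main__":
    for name, value in default_build_flags().items():
        print("%s = %s" % (name, value))
```

Comparing this output between the two machines is one way to check whether the optimization level differs.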

10 loops, best of 3: 1.41 msec per loop

serge-sans-paille commented 9 years ago

And 1.32 msec when compiling with clang.
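To put the numbers reported so far side by side, here is a small arithmetic sketch. The timings were taken on different machines, so the ratios are only indicative; the dictionary keys are labels invented for this example.

```python
# Timings quoted in this thread, all converted to microseconds per loop.
timings_usec = {
    "hope_maintainer": 103.0,     # jitted pdf, maintainer's machine
    "cpp_maintainer": 55.1,       # native C++ benchmark, maintainer's machine
    "hope_debian_gcc": 1410.0,    # jitted pdf, Debian testing, g++ 4.9
    "hope_debian_clang": 1320.0,  # jitted pdf, Debian testing, clang
}


def slowdown(slow, fast):
    # Ratio of two timings; values above 1 mean `slow` takes longer.
    return timings_usec[slow] / timings_usec[fast]


if __name__ == "__main__":
    print("HOPE vs C++ (maintainer): %.2fx" % slowdown("hope_maintainer", "cpp_maintainer"))
    print("Debian gcc vs maintainer HOPE: %.1fx" % slowdown("hope_debian_gcc", "hope_maintainer"))
    print("Debian clang vs maintainer HOPE: %.1fx" % slowdown("hope_debian_clang", "hope_maintainer"))
```

The first ratio is the "factor of 2" mentioned earlier; the other two show that the Debian runs are more than ten times slower than the maintainer's jitted timing.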

cosmo-ethz commented 9 years ago

@serge-sans-paille I was able to reproduce the behavior you see on an Ubuntu box. The other benchmarks seem to be doing fine; only the star-psf benchmark is causing issues.

As expected, the code that HOPE generates is identical on OSX and Ubuntu. This makes me assume that the compilers on Linux struggle to optimize the code as much as clang on OSX does. This isn’t very satisfying, but I don’t have a better explanation at the moment.