Closed tobson closed 7 years ago
@tberlok Is this right? Have you recently compiled the code with Clang?
Just checked and reproduced this error with LLVM clang v. 3.8.1. There seems to be some information online about issues with inline functions and clang, see
I suspect Cython is probably at fault here and GCC just happens to be a little more lenient than Clang. Where Clang throws an error,
skeletor/cython/push_and_deposit.c:1928:8: error: 'inline' can only appear on
functions
static CYTHON_INLINE int (*__pyx_f_8skeletor_6cython_17particle_boundary...
GCC gives merely a warning,
skeletor/cython/push_and_deposit.c:1947:28: warning: variable '__pyx_f_8skeletor_6cython_17particle_boundary_calculate_ihole_cdef' declared 'inline'
static CYTHON_INLINE int (*__pyx_f_8skeletor_6cython_17particle_boundary_calculate_ihole_cdef)(struct __pyx_t_8skeletor_6cython_5types_particle_t, __Pyx_memviewslice, struct __pyx_obj_8skeletor_6cython_5types_grid_t *, int, int); /*proto*/
It compiles and the tests work if one removes the word 'inline' in the .pxd files (deposit.pxd, particle_boundary.pxd & particle_push.pxd). The question is then if push_and_deposit becomes much slower by doing so.
As in C++, we probably need to put the entire definition of the inline functions into pxd (i.e. header) files, not just their declaration, see http://cython.readthedocs.io/en/latest/src/tutorial/pxd_files.html
GCC was apparently right to give a warning: with the inline functions' body in pxd files, the code is almost twice as fast if my benchmarks on my iMac 5K are to be believed. The following are timings of tests/test_fastwave.py
with nperiods=10.0
and OMP_NUM_THREADS=1
.
Commit / Compiler
------------------------------------------------------------
0407070 (inline-pxd) / Clang 3.9.1
real 0m19.396s | user 0m19.324s | sys 0m0.046s
real 0m19.335s | user 0m19.266s | sys 0m0.044s
real 0m19.345s | user 0m19.278s | sys 0m0.043s
------------------------------------------------------------
0407070 (inline-pxd) / GCC 6.3.0
real 0m19.007s | user 0m18.929s | sys 0m0.050s
real 0m19.027s | user 0m18.960s | sys 0m0.044s
real 0m19.141s | user 0m19.069s | sys 0m0.047s
------------------------------------------------------------
5632aaa (master) / GCC 6.3.0
real 0m33.926s | user 0m33.844s | sys 0m0.054s
real 0m34.274s | user 0m34.166s | sys 0m0.067s
real 0m33.893s | user 0m33.806s | sys 0m0.056s
The error mentioned in the first point in #111 actually has nothing to with OpenMP. I just tried compiling the code with LLVM 3.9's Clang (which supports OpenMP) and I got the same error. I suspect the code right now simply doesn't compile with Clang.