lab-cosmo / librascal

A scalable and versatile library to generate representations for atomic-scale learning
https://lab-cosmo.github.io/librascal/
GNU Lesser General Public License v2.1

RelWithDebInfo returns faster benchmark results than Release, I have no idea why #125

Open agoscinski opened 5 years ago

agoscinski commented 5 years ago

I am not sure how we should deal with this. I don't think it is for the feat/interpolator branch to "solve", hence this issue.

RelWithDebInfo returns faster benchmark results than Release, I have no idea why

Might be an issue of code cache explosion: when using -O3 (which I believe is the default for Release), the inliner can go crazy, resulting in very large assembly/machine code. This negatively impacts performance, since the CPU has to fetch and decode all this code, which no longer fits in the L1 cache.

What is the time difference between -O2 and -O3 here? Also, does this happen with GCC or clang?

Originally posted by @Luthaf in https://github.com/cosmo-epfl/librascal/pull/113#issuecomment-533188183
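
For reference, with GCC and clang CMake's default flags are `-O3 -DNDEBUG` for Release and `-O2 -g -DNDEBUG` for RelWithDebInfo, so comparing the two build types is effectively an -O2 vs -O3 comparison. A minimal sketch of how one could time a kernel under both settings is below. It is not the librascal benchmark code: `sum_of_squares` is a hypothetical stand-in for a benchmarked function, and `median_runtime_seconds` is an illustrative helper that takes the median over several repeats to reduce noise.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a benchmarked kernel (not a librascal function).
double sum_of_squares(const std::vector<double>& values) {
  double total = 0.0;
  for (double v : values) {
    total += v * v;
  }
  return total;
}

// Calls `func` `repeats` times and returns the median wall-clock time of a
// single call in seconds (the upper median when `repeats` is even).
template <typename Func>
double median_runtime_seconds(Func&& func, std::size_t repeats) {
  assert(repeats > 0);  // precondition: at least one timed run
  std::vector<double> samples;
  samples.reserve(repeats);
  for (std::size_t i = 0; i < repeats; ++i) {
    auto start = std::chrono::steady_clock::now();
    func();
    auto stop = std::chrono::steady_clock::now();
    samples.push_back(std::chrono::duration<double>(stop - start).count());
  }
  std::sort(samples.begin(), samples.end());
  return samples[samples.size() / 2];
}
```

To use it, call the helper from a `main` with a lambda that stores the kernel's result in a `volatile double`, so the compiler cannot drop the computation. Then build the same file once with `g++ -O2` and once with `g++ -O3` and compare the reported medians.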

With GNU (g++ 7.4.0) the speedup of RelWithDebInfo over Release is sometimes marginal (1.01 for profile_matrix_cubic_spline), but can also be 1.2 (for profile_scalar_cubic_spline). Indeed, the LLC miss count is 3030k (Release) vs 2220k (RelWithDebInfo) for profile_matrix_cubic_spline.

With clang I actually have problems running the benchmarks; added to my TODOs (yey)

Originally posted by @agoscinski in https://github.com/cosmo-epfl/librascal/pull/113#issuecomment-533224198

max-veit commented 5 years ago

https://stackoverflow.com/questions/19470873/why-does-gcc-generate-15-20-faster-code-if-i-optimize-for-size-instead-of-speed?rq=1

agoscinski commented 5 years ago

put this issue into the developer doc

Luthaf commented 5 years ago

Is there anything to do here, or can we close this issue?