I am not sure how we should deal with this. I don't think this is for the feat/interpolator branch to "solve", hence this issue.
RelWithDebInfo gives faster benchmark results than Release, and I have no idea why.
Might be an issue of code cache explosion: when using -O3 (which I believe is the default for Release), the inliner can go crazy, resulting in very large assembly/machine code. This negatively impacts performance, since the CPU has to fetch and decode all this code, which does not fit in the L1 cache anymore.
What is the time difference between -O2 and -O3 here? Also, does this happen with GCC or clang?
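One way to answer the -O2 vs -O3 question in isolation could be a tiny timing harness like the sketch below. It is not taken from the librascal benchmarks: `cubic_kernel` and `median_runtime_us` are hypothetical names, and the kernel is only a stand-in for the spline evaluation being profiled.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a benchmarked kernel (not librascal code):
// evaluates a fixed cubic polynomial at n points with Horner's scheme
// and returns the sum so the work has an observable result.
double cubic_kernel(std::size_t n) {
  double sum = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    const double x = static_cast<double>(i) / static_cast<double>(n);
    sum += ((0.5 * x - 1.0) * x + 2.0) * x + 3.0;
  }
  return sum;
}

// Runs fn `repeats` times and returns the median wall-clock time in
// microseconds (upper median for even counts). The median is less
// sensitive to outliers than the mean.
template <class Fn>
double median_runtime_us(Fn&& fn, std::size_t repeats) {
  assert(repeats > 0);
  std::vector<double> samples;
  samples.reserve(repeats);
  for (std::size_t r = 0; r < repeats; ++r) {
    const auto start = std::chrono::steady_clock::now();
    volatile double sink = fn();  // volatile store so the result is observably used
    (void)sink;
    const auto stop = std::chrono::steady_clock::now();
    samples.push_back(
        std::chrono::duration<double, std::micro>(stop - start).count());
  }
  std::sort(samples.begin(), samples.end());
  return samples[samples.size() / 2];
}
```

Building the same file once with `g++ -O2` and once with `g++ -O3`, then calling something like `median_runtime_us([] { return cubic_kernel(100000); }, 101)` from `main`, should give a rough comparison. Running `size` on the two binaries shows how much the text segment grows, which would be a hint as to whether inlining is bloating the code.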
GNU (g++ 7.4.0)
The speedup is sometimes marginal (1.01 for profile_matrix_cubic_spline), but can also be 1.2 for profile_scalar_cubic_spline. Indeed, the LLC miss count is 3030k (Release) vs 2220k (RelWithDebInfo) for profile_matrix_cubic_spline.
For clang I actually have problems running the benchmarks; added to my TODOs (yey)
Originally posted by @Luthaf in https://github.com/cosmo-epfl/librascal/pull/113#issuecomment-533188183
Originally posted by @agoscinski in https://github.com/cosmo-epfl/librascal/pull/113#issuecomment-533224198