lballabio / QuantLib

The QuantLib C++ library
http://quantlib.org

Performance regression going from 1.31 to 1.34 #1960

Closed (tomwhoiscontrary closed 3 months ago)

tomwhoiscontrary commented 5 months ago

We have been running on 1.31 for a while. I am attempting to upgrade to 1.34. Once the necessary changes were made, i found that my application produced identical results, but was considerably slower - measured at a very coarse granularity (roughly "build a particular curve and evaluate a lot of metrics"), things took 2x - 3x longer.

I have picked one fairly simple subsystem and extracted it into a standalone program which just needs QuantLib to build. The program runs in a loop building an ESTR curve from OIS quotes, and then pricing swaps, timing how long it takes to price the swaps. It does twenty warmup iterations, then twenty measurement iterations, and prints the time taken in milliseconds for each of the latter (along with the minimum and maximum calculated swap rates, as a sanity check). This shows a roughly 4x - 5x slowdown in pricing swaps.

With 1.31 it prints:

iteration,minSwapRate,maxSwapRate,elapsed
1,0.015481,0.039528,10.503099
2,0.015481,0.039528,10.589348
3,0.015481,0.039528,10.340485
4,0.015481,0.039528,10.325391
5,0.015481,0.039528,10.381338
6,0.015481,0.039528,10.418790
7,0.015481,0.039528,10.305750
8,0.015481,0.039528,10.256093
9,0.015481,0.039528,10.858235
10,0.015481,0.039528,11.441151
11,0.015481,0.039528,11.270055
12,0.015481,0.039528,10.663258
13,0.015481,0.039528,10.538956
14,0.015481,0.039528,10.339440
15,0.015481,0.039528,10.377952
16,0.015481,0.039528,10.255195
17,0.015481,0.039528,10.317267
18,0.015481,0.039528,10.683488
19,0.015481,0.039528,10.702674
20,0.015481,0.039528,10.575539

With 1.34 it prints:

iteration,minSwapRate,maxSwapRate,elapsed
1,0.015481,0.039528,45.819755
2,0.015481,0.039528,47.309368
3,0.015481,0.039528,48.141875
4,0.015481,0.039528,47.666822
5,0.015481,0.039528,47.050929
6,0.015481,0.039528,47.007651
7,0.015481,0.039528,47.852757
8,0.015481,0.039528,47.241947
9,0.015481,0.039528,47.784260
10,0.015481,0.039528,47.959932
11,0.015481,0.039528,48.477983
12,0.015481,0.039528,48.494646
13,0.015481,0.039528,48.306312
14,0.015481,0.039528,47.953128
15,0.015481,0.039528,48.861151
16,0.015481,0.039528,48.116760
17,0.015481,0.039528,48.483408
18,0.015481,0.039528,47.896765
19,0.015481,0.039528,48.180332
20,0.015481,0.039528,47.863267

See DiscountingCurveDemo.cpp.txt for the code.
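The attachment is not reproduced here. As a rough sketch of the harness shape described above (untimed warmup iterations, then timed iterations printed as CSV with the minimum and maximum rate as a sanity check), something like the following would do. The names `Sample`, `runBenchmark`, `placeholderWork` and `printCsv` are hypothetical, and `placeholderWork` is a stand-in that only fabricates numbers; in the real demo that step builds the ESTR curve and prices the swaps with QuantLib.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <functional>
#include <vector>

// One measured iteration: smallest and largest rate seen, and wall time in ms.
struct Sample {
    double minRate;
    double maxRate;
    double elapsedMs;
};

// Runs `work` `warmup` times untimed, then `measured` times timed.
// `work` must return a non-empty vector of the rates it computed.
std::vector<Sample> runBenchmark(const std::function<std::vector<double>()>& work,
                                 int warmup, int measured) {
    for (int i = 0; i < warmup; ++i)
        work();  // result discarded; these runs only warm caches and allocators

    std::vector<Sample> samples;
    samples.reserve(measured > 0 ? static_cast<std::size_t>(measured) : 0);
    for (int i = 0; i < measured; ++i) {
        // steady_clock is monotonic, so it is safe for interval timing.
        const auto start = std::chrono::steady_clock::now();
        const std::vector<double> rates = work();
        const auto stop = std::chrono::steady_clock::now();

        assert(!rates.empty());
        const double ms =
            std::chrono::duration<double, std::milli>(stop - start).count();
        const auto mm = std::minmax_element(rates.begin(), rates.end());
        samples.push_back({*mm.first, *mm.second, ms});
    }
    return samples;
}

// Hypothetical stand-in workload: NOT a swap pricer, just made-up numbers.
std::vector<double> placeholderWork() {
    std::vector<double> rates;
    rates.reserve(1000);
    for (int i = 0; i < 1000; ++i)
        rates.push_back(0.015 + 0.000025 * i);
    return rates;
}

// Prints the samples in the same CSV layout as the output above.
void printCsv(const std::vector<Sample>& samples) {
    std::printf("iteration,minSwapRate,maxSwapRate,elapsed\n");
    for (std::size_t i = 0; i < samples.size(); ++i)
        std::printf("%zu,%f,%f,%f\n", i + 1, samples[i].minRate,
                    samples[i].maxRate, samples[i].elapsedMs);
}
```

Calling `printCsv(runBenchmark(placeholderWork, 20, 20))` from `main` gives the twenty-plus-twenty layout. The elapsed column only means something if the harness and the library were both built with the intended optimisation flags.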

My versions of QuantLib carry some small patches affecting the calculation of swap BPS, but that should not be relevant here.

I compiled QuantLib with GCC 7.2.0, and Boost 1.66.0. My build script contains:

export CXXFLAGS="-O2 -ggdb -Wall -Wno-unknown-pragmas -Werror -std=c++14 -fno-math-errno -fno-trapping-math -DBOOST_MATH_NO_LONG_DOUBLE_MATH_FUNCTIONS"

./configure --with-sysroot=${sysroot_dir} --enable-std-classes --enable-indexed-coupons --enable-error-lines

I compiled the app with GCC 13.1.0. My CMakeLists.txt includes:

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_BUILD_TYPE RelWithDebInfo)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -fno-math-errno -fno-trapping-math -ggdb")

The reason i'm compiling QuantLib with such an old GCC is that i've seen significant performance regressions using newer ones.

I'm on Ubuntu 22.04.4.

Do you have any thoughts on this? Can you reproduce this difference? If not, do you see any obvious differences between your setup and mine? I am happy to spend time changing things around at my end - but so far, the difference has been unavoidable, so i would like to have some high-Sharpe-ratio ideas on what to change!

pcaspers commented 5 months ago

1.32 introduced lazy cashflows, so it might be related to that change. I'll try to reproduce using your test code.

When you say newer compiler versions cause performance issues, do you mean longer build times or degraded performance during runtime?

tomwhoiscontrary commented 5 months ago

@pcaspers I mean degraded performance during runtime. It's been a while since i tried though. I am trying to set up some infrastructure to explore this in a reproducible and shareable way - hopefully i can update you on that at some point.

pcaspers commented 5 months ago

Ok interesting. Keep us updated on that topic.

sweemer commented 5 months ago

@tomwhoiscontrary The usual high Sharpe ratio way to find the source of slow code is to use a profiler like VTune Profiler. Maybe you can try that and let us know what you find?

lballabio commented 5 months ago

I couldn't reproduce it on my Mac—if anything, 1.34 was slightly faster. Compiled with the configure and cxx flags you reported, but of course it's clang, not gcc, and I have the latest boost installed.

lballabio commented 5 months ago

The same goes for an Ubuntu 22.04.4 machine, default gcc 11.4. No difference.

tomwhoiscontrary commented 4 months ago

Interesting. That's encouraging, because it means there might be a problem with my build environment, but frustrating, because it means there might be a problem with my build environment.

I'm trying to set up a simple self-contained build in a docker container, where i can vary the compiler, Boost version, and QuantLib version. This is proving a surprisingly rocky road so far, though. Will keep you posted.

lballabio commented 4 months ago

Thanks!

tomwhoiscontrary commented 4 months ago

I have written a script to run this demo in a docker container with a defined version of GCC and QuantLib: https://github.com/tomwhoiscontrary/QuantLibDemo

It gets Boost from the distro package manager, so that depends on the version of Debian used by the GCC image. It doesn't seem to make much difference, though. I would like to get that under vcpkg at some point.

The results from this are interesting - all combinations of GCC and QuantLib give a result of about 21 - 26 ms per iteration. It mostly gets faster with later GCC versions, and mostly stays the same across QuantLib versions. There are deviations from that pattern which might be meaningful and might be noise, but are fairly minor.

So the good (?) news is that this does not reproduce my core worry, that QL has got significantly slower. On the strength of that, i'm happy to close this bug. I'll keep working on this, and let you know if i find anything.

One thing that's different between this and my real codebase is that here, the same compiler is used for QuantLib and the demo code, whereas in the real code, we build QuantLib with GCC 7 and the demo with GCC 13. I'd be surprised if that combination was faster in this setting.

Something that's quite odd here is that each iteration takes more than 20 ms, whereas in my local build outside a container, using QuantLib 1.31, it takes about 10 ms.

lballabio commented 4 months ago

Well, I would say "good to hear" if it wasn't for your problem...

Thanks for the analysis, and do keep us informed!

tomwhoiscontrary commented 3 months ago

I resolved this. There was no problem with QuantLib at all. There was an incidental change to our build scripts around the same time as QuantLib 1.33 came out which disabled optimisations!
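For anyone who lands here with a similar symptom, one cheap guard is to have the benchmark report whether it was compiled with optimisation. This is a minimal sketch, assuming GCC, which documents the predefined macro `__OPTIMIZE__` as defined in all optimising compilations. The function names are hypothetical. It only describes the translation unit it is compiled in, so it says nothing about how a prebuilt QuantLib was compiled; for that, the build log or the flags passed to configure still need checking.

```cpp
#include <cassert>
#include <cstdio>

// True when this translation unit was compiled with optimisation enabled.
// Relies on GCC's predefined __OPTIMIZE__ macro, which is absent at -O0.
bool builtWithOptimisation() {
#ifdef __OPTIMIZE__
    return true;
#else
    return false;
#endif
}

// Short human-readable label for the build.
const char* buildDescription() {
    return builtWithOptimisation() ? "optimised" : "unoptimised";
}

// Writes one line describing the build to `out`, e.g. stderr at start-up,
// so that a silently unoptimised build shows up next to the timings.
void reportBuild(std::FILE* out) {
    assert(out != nullptr);
    std::fprintf(out, "benchmark build: %s\n", buildDescription());
}
```

Calling `reportBuild(stderr)` before the timing loop puts the build mode next to the numbers, which makes a dropped `-O2` much harder to miss.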

lballabio commented 3 months ago

Ok, this time I can say "good to hear" :)