xcompact3d / x3d2

https://xcompact3d.github.io/x3d2
BSD 3-Clause "New" or "Revised" License

Check impact of -ffast-math on performance & correctness #82

Open · JamieJQuinn opened 5 months ago

JamieJQuinn commented 5 months ago

Just noticed we're setting -ffast-math or -fast in the release build. For a summary of why this is dangerous yet helpful, see this blog post. I haven't personally found it to affect the correctness of fluid simulations, but I also haven't found it to improve performance, so I tend to stick with -O3 just in case.

I think it's worth testing:

  1. whether -ffast-math actually improves performance, and
  2. whether we're potentially affecting the correctness of results.
semi-h commented 5 months ago

Thanks for the issue. We actually had a discussion on this around 6 months ago, and people were particularly worried about -ffast-math breaking the order of floating point arithmetic (associative math). In the original codebase (Incompact3D) the derivative functions are written very carefully to specify a clear ordering when operating on data points. They are written so that the finite difference operations are always carried out as specified in the discretisation: for example $u_{i-3}$ is paired with $u_{i+3}$ when taking a derivative on a velocity point, or sometimes $u_{i-3}$ is paired with $u_{i+2}$ for staggered operations.
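To illustrate the kind of ordering at stake (a hypothetical sketch with made-up names and standard sixth-order central coefficients, not the actual Incompact3D code):

```fortran
! Hypothetical sketch of an explicitly ordered central difference.
! Coefficients are the standard sixth-order ones; names are illustrative.
subroutine ddx_ordered(du, u, n, dx)
   implicit none
   integer, intent(in) :: n
   real(kind=8), intent(in) :: u(n), dx
   real(kind=8), intent(out) :: du(n)
   real(kind=8), parameter :: a = 3.0d0/4.0d0, b = -3.0d0/20.0d0, &
                              c = 1.0d0/60.0d0
   integer :: i

   do i = 4, n - 3
      ! Each symmetric pair u(i+k), u(i-k) is differenced first, fixing
      ! the evaluation order the discretisation specifies.
      du(i) = (a*(u(i+1) - u(i-1)) &
             + b*(u(i+2) - u(i-2)) &
             + c*(u(i+3) - u(i-3))) / dx
   end do
end subroutine ddx_ordered
```

With -ffast-math the compiler is free to regroup these sums (e.g. accumulating all the u(i+k) terms before subtracting the u(i-k) terms), which is algebraically equivalent but rounds differently.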

Then @mathrack did some tests on the original codebase using the -ffast-math flag, which is definitely destructive to these delicate associative operations, alongside the many other things -ffast-math triggers, and as far as I remember the results were identical for a test case where an analytical solution exists. I guess this was convincing for all of us (especially for going ahead with the single and unified tds solver functions we have in the new framework), but it's a good idea to monitor this and report any differences it may cause in the quantities of interest and in performance as we move forward and support new cases in the codebase.

If I remember correctly the performance difference was around 2-3% for a part of the code, but we're still not running the entire solver on CPUs so I don't know for certain how it'll be in the end. But considering -ffast-math can only speed up the FP side of things and we're bandwidth bound, its impact will likely be small. And when it comes to its effect on accuracy, I think it's very unlikely to have an impact, as you mentioned. In CFD simulations we operate on pairs of data with a similar order of magnitude, and if things go bad and we get to a point where -ffast-math can affect accuracy, the flag probably won't make any difference and the simulation will blow up regardless. The only case I can think of where this might have an impact is an iLES simulation, but probably even there it wouldn't be measurable.
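As a toy illustration of that magnitude argument (a standalone sketch, nothing to do with the solver), regrouping a sum of values of similar magnitude perturbs the result by at most a few units in the last place of the operands:

```fortran
! Toy demonstration: for values of similar magnitude, as in a smooth
! velocity field, regrouping a sum changes the result by at most a few
! ULPs of the operands (~1e-16 here for O(1) values).
program reassoc_demo
   implicit none
   real(kind=8) :: u(4), paired, regrouped

   u = [1.000001d0, 0.999999d0, 1.000002d0, 0.999998d0]

   ! Ordered as a discretisation would specify: difference the pairs first.
   paired = (u(1) - u(2)) + (u(3) - u(4))
   ! One regrouping -ffast-math would be permitted to choose.
   regrouped = ((u(1) + u(3)) - u(2)) - u(4)

   print *, 'paired     =', paired
   print *, 'regrouped  =', regrouped
   print *, 'difference =', paired - regrouped
end program reassoc_demo
```

Trouble only really starts when operands differ wildly in magnitude and cancellation amplifies the lost bits, which is the situation the comment above argues is unlikely in these simulations.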

I also want to try some performance optimisations on the CUDA kernels that I put on hold a few months ago to do more work on the codebase. I'll try similar flags on the GPU and measure their impact on performance; this issue is a good place to report those results as well.

pbartholomew08 commented 5 months ago

A very interesting article. If the observed performance difference is low, it might be better to favour the safer -O3 by default and document how to change this. Based on the article, though, we might want to check what impact it has on vectorisation across TDMA solves etc. (although given the explicit use of OMP SIMD, in the CPU code at least, this should hopefully not be affected).
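To make the vectorisation concern concrete, here is a minimal sketch (illustrative names and layout, not the actual x3d2 implementation) of why a TDMA sweep vectorises across independent systems rather than along the solve direction, and why the explicit directive should keep that working with or without -ffast-math:

```fortran
! Sketch: forward elimination for nsys independent tridiagonal systems.
! The sweep direction (i) carries a dependence, so we vectorise across
! systems (j) instead; !$omp simd requests this explicitly, independent
! of -ffast-math. Illustrative only, not the x3d2 code.
subroutine thomas_forward(a, b, c, d, nsys, n)
   implicit none
   integer, intent(in) :: nsys, n
   real(kind=8), intent(in) :: a(nsys, n), c(nsys, n)
   real(kind=8), intent(inout) :: b(nsys, n), d(nsys, n)
   real(kind=8) :: w
   integer :: i, j

   do i = 2, n
      !$omp simd private(w)
      do j = 1, nsys
         w = a(j, i) / b(j, i - 1)
         b(j, i) = b(j, i) - w * c(j, i - 1)
         d(j, i) = d(j, i) - w * d(j, i - 1)
      end do
   end do
end subroutine thomas_forward
```

Since each j-iteration touches only its own system, the directive requires no reordering within any single sum, so it should not interact with the associativity worries above.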

mathrack commented 5 months ago

I remember some tests on the 2D TGV case showing agreement at machine precision between low-level and high-level optimisation using gfortran. However, I am afraid the outcome of this kind of test can change depending on the machine, the compiler, the compiler version, ...

JamieJQuinn commented 4 months ago

So how do we feel about setting the default to just -O3, so that a new user compiles with no potential numerical issues (from fast-math at least...), while we document in the user guide and in CMakeLists.txt how to change it, with info on the potential issues and performance gains?

OR we could keep fast-math on and print a warning when compiling and running, telling the user to check their results or turn optimisations down to -O3.

:+1: for option 1, :-1: for option 2, :eyes: for a different suggestion.

semi-h commented 4 months ago

Having the following in the CMake file

set(CMAKE_Fortran_FLAGS_FAST "-O3 -ffast-math")

and then using -DCMAKE_BUILD_TYPE=fast when building worked for me. Does anyone know if this is something CMake supports, or am I being lucky somehow, or missing something? If this is okay it will allow people to go ahead with their preference regarding -ffast-math.

My main concern is vectorisation, as @pbartholomew08 mentioned, because it makes a meaningful difference, not just a few percent, and it can be tricky to get the compiler to generate the instructions, especially for AVX512, but it'll be easy to spot. Hopefully the compiler will vectorise even without -ffast-math, and I'm pretty sure it did in my earlier tests. However, [1] mentions potential vectorisation issues for SSE and SSE2 in particular, and I suspect this goes all the way up to AVX512 too. If it doesn't vectorise in some cases, we may consider adding a specific flag to enable vectorisation in the release build, so that the performance impact of not having -ffast-math is minimal without scaring anyone. I'll probably go ahead with the fast build most of the time and do occasional checks out of curiosity, reporting anything interesting.

[1] https://gcc.gnu.org/wiki/FloatingPointMath