Open JamieJQuinn opened 8 months ago
Thanks for the issue, we actually had a discussion on this around 6 months ago, and people were particularly worried about `-ffast-math` breaking the ordering of floating point arithmetic (associative math). In the original codebase (Incompact3D) the derivative functions are written very carefully to specify a very clear ordering when operating on data points. They are written so that the finite difference operations are always carried out as specified in the discretisations, e.g. $u_{i-3}$ is paired with $u_{i+3}$ when taking a derivative on a velocity point, or sometimes $u_{i-3}$ is paired with $u_{i+2}$ for staggered operations.
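To make the point concrete, here is a rough sketch of that kind of explicit pairing (illustrative only: the names are made up and the coefficients are the standard explicit 6th-order central ones, not the compact scheme we actually use):

```fortran
! Rough sketch only -- not the actual Incompact3D/x3d2 routine.
subroutine deriv_sketch(u, du, n, dx)
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none
  integer, intent(in) :: n
  real(dp), intent(in) :: u(n), dx
  real(dp), intent(out) :: du(n)
  ! Standard explicit 6th-order central coefficients, for illustration
  real(dp), parameter :: a = 45.0_dp/60.0_dp, b = -9.0_dp/60.0_dp, &
                         c = 1.0_dp/60.0_dp
  integer :: i

  du = 0.0_dp  ! boundary points omitted from the sketch
  do i = 4, n - 3
    ! Each symmetric pair is grouped explicitly by the parentheses.
    du(i) = (a*(u(i+1) - u(i-1)) &
           + b*(u(i+2) - u(i-2)) &
           + c*(u(i+3) - u(i-3))) / dx
  end do
end subroutine deriv_sketch
```

The bracketed differences are exactly the groupings that `-fassociative-math` (pulled in by `-ffast-math`) allows the compiler to break up.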
Then @mathrack did some tests on the original codebase using the `-ffast-math` flag, which is definitely destructive to these delicate associative orderings (alongside the many other things `-ffast-math` triggers), and as far as I remember the results were identical for a test case where an analytical solution exists. I guess this was convincing for all of us (especially for going ahead with the single, unified tds solver functions we have in the new framework), but it's a good idea to monitor this and report any differences it may cause in the quantities of interest and in performance as we move forward and support new cases in the codebase.
If I remember correctly the performance difference was around 2-3% for a part of the code, but we're still not running the entire solver on CPUs so we don't know for certain how it'll be in the end. But considering `-ffast-math` can only make the floating point arithmetic faster, and we're bandwidth bound, its impact will likely be small. As for its effect on accuracy, I think it's very unlikely to have an impact, as you mentioned. In CFD simulations we operate on pairs of data with a similar order of magnitude, and if things go bad and we reach a point where `-ffast-math` could affect accuracy, it probably won't make any difference because the simulation would blow up regardless. The only case I can think of where this might have an impact is an iLES simulation, but probably even there it won't have any measurable effect.
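As a toy illustration of that point (nothing to do with the actual kernels), reassociation only becomes visible when the operands have wildly different magnitudes:

```fortran
program reassoc_demo
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none
  real(dp) :: a, b, c

  ! Similar magnitudes: both groupings agree
  a = 1.0_dp; b = 2.0_dp; c = 3.0_dp
  print *, (a + b) + c, a + (b + c)   ! both print 6.0

  ! Wildly different magnitudes: regrouping loses the small term
  ! (expected results under default IEEE evaluation, no -ffast-math)
  a = 1.0e16_dp; b = -1.0e16_dp; c = 1.0_dp
  print *, (a + b) + c                ! 1.0
  print *, a + (b + c)                ! 0.0
end program reassoc_demo
```

Unless the fields already contain values that far apart (i.e. the simulation has gone wrong anyway), the grouping shouldn't change the answer in any meaningful way.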
And I also want to try some performance optimisations on CUDA kernels that I put on hold a few months ago to do more work on the codebase. I'll try and use similar flags on the GPU and measure their impact on performance, and I think this issue is a good place to report those as well.
A very interesting article. If the observed performance difference is low, it might be better to favour the safer `-O3` by default, and document how to change this. Based on the article though, we might want to check what impact it has on vectorisation across TDMA solves etc. (although given the explicit use of OMP SIMD in the CPU code, this at least should hopefully not be impacted).
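For context, a hedged sketch of the kind of loop I mean (array names and layout are made up for illustration, not the actual x3d2 kernels): the Thomas algorithm carries a dependency along the line, so the SIMD directive requests vectorisation across the independent systems rather than relying on the compiler to find it.

```fortran
! Illustrative only -- not the actual x3d2 kernel or data layout.
subroutine thomas_forward_sketch(d, w, n_sys, n)
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none
  integer, intent(in) :: n_sys, n
  real(dp), intent(inout) :: d(n_sys, n)  ! one RHS per independent system
  real(dp), intent(in) :: w(n)            ! precomputed elimination factors
  integer :: i, j

  ! The recurrence in j cannot be vectorised, so vectorise across the
  ! independent systems in i instead.
  do j = 2, n
    !$omp simd
    do i = 1, n_sys
      d(i, j) = d(i, j) - w(j) * d(i, j - 1)
    end do
  end do
end subroutine thomas_forward_sketch
```

Since the inner loop has no reductions, in principle it should vectorise the same with or without `-ffast-math`.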
I remember some tests on 2D TGV with similar results at machine precision for low-level and high-level optimization using gfortran. However, I am afraid this kind of test can change depending on the machine, the compiler, the compiler version, ...
So how do we feel about setting the default value to just `-O3`, so a new user compiles with no potential numerical issues (from fast-math at least...), but we document in the user guide & in the `CMakeLists.txt` how to change it, with info on potential issues & performance gains?

OR we could keep fast-math on and print a warning when compiling & running, telling the user to check their results or turn opts down to `-O3`.
:+1: for option 1, :-1: for option 2, :eyes: for a different suggestion.
Having the below in the CMake file

`set(CMAKE_Fortran_FLAGS_FAST "-O3 -ffast-math")`

and then using `-DCMAKE_BUILD_TYPE=fast` when building worked for me. Does anyone know if this is something CMake supports, or whether I'm just getting lucky somehow, or missing something? If this is okay it will allow people to go ahead with their preference regarding `-ffast-math`.
My main concern is the vectorisation, as @pbartholomew08 mentioned, because it makes a meaningful difference, not just a few percent, and it can be tricky to get the compiler to generate the instructions, especially for AVX512, but it'll be easy to spot. Hopefully the compiler will vectorise even without `-ffast-math`, and I'm pretty sure it did in my earlier tests. However, [1] mentions potential vectorisation issues for SSE and SSE2 in particular, and I'm sure this goes all the way up to AVX512 too. If it doesn't vectorise in some cases, we may consider adding a specific flag to enable vectorisation in the `release` build, so that the performance impact of not having `-ffast-math` will be minimal while not scaring anyone off. I'll probably go ahead with the `fast` build most of the time and do occasional checks just out of curiosity, reporting anything interesting.
Just noticed we're setting `-ffast-math` or `-fast` in the release build. For a summary of why this is dangerous yet helpful, see this blog post. I haven't personally found it affects the correctness of fluid simulations, but I also haven't found it to improve performance, so I tend to stick to just `-O3` just in case. I think it's worth testing whether `-ffast-math` actually improves performance, and