jewettaij closed this issue 5 years ago
The problem actually had nothing to do with the "-g" flag or OpenMP or multithreading (whew). Instead, the problem occurred when the code was compiled with optimizations (using the -O1, -O2, or -O3 compiler flags) with GCC.
I removed these flags from the compiler options (currently located in the "setup_gcc.sh" files). This enabled me to go back to using OpenMP, but it still resulted in performance that was roughly 4x slower than before.
For now, using the CLANG compiler instead of GCC seems to totally fix this problem. (The resulting binaries seem to be about 20% faster too.)
Incidentally, I ran valgrind on "filter_mrc" using "valgrind --tool=memcheck --leak-check=yes --show-reachable=yes --num-callers=20 --track-fds=yes filter_mrc ..." and it did not find any errors. So memory errors of the kind memcheck detects do not appear to be the source of the problem. I'll keep playing with valgrind's other tools to see if I can track this down.
It's possible there is a bug in the code, but it could also be a compiler glitch. (It would not be the first time I encountered one in GCC.) For now, I'm going to paper over this problem by moving to CLANG.
More thorough checking with other valgrind tools failed to discover any problems. Perhaps this is copping out, but I'm leaning towards calling this a bug in GCC optimization. (Again, it would not be the first glitch I've run into with GCC. The old pre-3.0 compilers were a nightmare.) Either way, it would be nice if this code worked on all compilers. If I have time, I'll try tinkering with the code in "ClusterConnected()" to try and coax this code into behaving nicely with GCC. For now, use CLANG.
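One way to test the "GCC optimization" theory without giving up optimizations everywhere would be to switch them off for the suspect function only. The following is just a minimal sketch of that idea, assuming GCC's function-level optimization pragmas are acceptable as a debugging aid. "LabelRunsSketch()" is a hypothetical stand-in written for this illustration. It is not the real "ClusterConnected()" code from filter3d.hpp.

```cpp
#include <cstddef>
#include <vector>

// GCC only: everything between push_options and pop_options is compiled
// at -O0, regardless of the -O level given on the command line.
// The guard keeps other compilers (including CLANG) from seeing the pragmas.
#if defined(__GNUC__) && !defined(__clang__)
#  pragma GCC push_options
#  pragma GCC optimize ("O0")
#endif

// HYPOTHETICAL stand-in for the suspect function (an assumption made for
// this example, not code from the repository).  It labels each run of
// consecutive entries above "threshold" in a 1-D array with 1, 2, 3, ...
// (0 means "not in any run") and returns the number of runs found.
std::size_t LabelRunsSketch(const std::vector<float>& image,
                            float threshold,
                            std::vector<int>& labels)
{
  labels.assign(image.size(), 0);
  std::size_t num_runs = 0;
  bool in_run = false;
  for (std::size_t i = 0; i < image.size(); ++i) {
    if (image[i] > threshold) {
      if (!in_run) {        // first entry of a new run
        ++num_runs;
        in_run = true;
      }
      labels[i] = static_cast<int>(num_runs);
    } else {
      in_run = false;       // the current run (if any) has ended
    }
  }
  return num_runs;
}

// Restore the command-line optimization settings for the rest of the file.
#if defined(__GNUC__) && !defined(__clang__)
#  pragma GCC pop_options
#endif
```

If the strange behavior disappears when only this one function is built without optimizations, that would narrow the problem down to that function. It would not by itself tell us whether the cause is a compiler glitch or undefined behavior in the code that only shows up when optimized.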
Since upgrading my compiler from gcc version 7.5 to gcc 9.3, this problem seems to have mysteriously corrected itself.
(Incidentally, the function where the problem occurred has been renamed from "ClusterConnected()" to "LabelConnected()". I don't know if the change in the compiler, or small changes in the code since 2019 could have fixed the problem, or whether I really have fixed the problem.)
"Heisenbug"? The "-connect" argument of "filter_mrc" (which invokes the "ClusterConnected()" function in filter3d.hpp) is behaving strangely, but only when compiled in gcc with optimizations and OpenMP enabled.
If you compile it using the settings located in "for_debugging_and_profiling/setup_gcc_linux_dbg.sh" (which uses the -g3 flag), then these problems go away. This is a serious bug, partly because running the code without OpenMP makes it almost intolerably slow. I will look into this soon.