Open Schwo0ps opened 5 years ago
Bit of a necro, but this looks like a gcc bug, the same as https://github.com/ros/ros_comm/issues/2197 & https://github.com/ros/roscpp_core/issues/130
If you need to use -O3 then I think your only option is to update your version of gcc.
PR https://github.com/ros/roscpp_core/pull/136 should fix this. I'm looking for someone who could verify. Please note that Focal now ships GCC 9.4 by default, and I could not reproduce the issue with that version, so the test would need to be done with GCC 9.3 installed explicitly.
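For anyone verifying, it may help to confirm which compiler actually built the test binary, since a default Focal toolchain would silently pick up GCC 9.4. The following is only a minimal sketch using GCC's predefined version macros; the helper names `is_gcc_9_3` and `compiler_version` are made up for illustration and are not part of roscpp_core.

```cpp
#include <string>

// Hypothetical helper: true for the GCC 9.3.x series, which is the version
// the comment above says must be installed explicitly for the test.
constexpr bool is_gcc_9_3(int major, int minor) {
    return major == 9 && minor == 3;
}

// Human-readable version of the compiler that built this translation unit.
// __GNUC__, __GNUC_MINOR__ and __GNUC_PATCHLEVEL__ are predefined by GCC;
// clang also defines them, so it is excluded explicitly.
inline std::string compiler_version() {
#if defined(__GNUC__) && !defined(__clang__)
    return "gcc " + std::to_string(__GNUC__) + "." +
           std::to_string(__GNUC_MINOR__) + "." +
           std::to_string(__GNUC_PATCHLEVEL__);
#else
    return "not gcc";
#endif
}

// Whether this translation unit was built with the GCC series in question.
inline bool built_with_gcc_9_3() {
#if defined(__GNUC__) && !defined(__clang__)
    return is_gcc_9_3(__GNUC__, __GNUC_MINOR__);
#else
    return false;
#endif
}
```

Logging `compiler_version()` once at node startup would be one way to make sure a reproduction attempt really ran against 9.3 rather than the distribution default.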
The issue is a complicated one, but here goes.
I first noticed this issue when I saw that diagnostics messages from arm64 machines arrive only infrequently (between 0% and 10% of the time). Eventually the following message comes from the Diagnostic Aggregator running on amd64, and it appears all messages are dropped.
At first, I thought it was endianness, but all machines are little-endian. There are also C++ nodes which are able to communicate with each other properly on all machines. It also appears as though this only happens with the Diagnostic Updater, not any other topic.
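For reference, byte order can be ruled out with a check along these lines. This is a generic sketch, not the check actually used in this report, and `is_little_endian` is a hypothetical helper name.

```cpp
#include <cstdint>
#include <cstring>

// Returns true when the least significant byte of a multi-byte integer
// is stored at the lowest address (little-endian layout).
inline bool is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char first_byte = 0;
    // Copy the first byte in memory; memcpy avoids aliasing problems.
    std::memcpy(&first_byte, &probe, sizeof(first_byte));
    return first_byte == 1;
}
```

Running this on both the amd64 and arm64 machines and getting `true` on each would support the conclusion that endianness is not the cause.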
After this, I started running a test: just running roscore and a single test node with the following code.
Everything works fine on amd64, but on arm64 the above issues happen. Additionally, the node eventually crashes with std::bad_alloc. Here is the backtrace and the relevant message that was published (it looks like the error was in serialization). It crashes after ~30 seconds but seems to do so more quickly if several of the nodes are running.
As specified in the title, this only happens with GCC optimization level 3 (compiling with -O3 or CMAKE_BUILD_TYPE=Release). I tried with -O2 and it appears to work fine.
I doubt this can be easily fixed and I'm not 100% sure if the issue is within this repo or ros_comm, but any ideas would be greatly appreciated. We really would like to use -O3 throughout our code for improved performance.