Closed by ddj116 2 months ago
Did you notice which Trick update introduced this issue? Was it running fine before that?
valgrind works on CentOS 7.9 with gcc 4.8.5 and gcc 8.3.0; we are using 90997929477bac02022271cd0e8a55e13fe4251c from March 2023 for that platform.
As part of the Rocky 8 upgrade we were forced to upgrade Trick to the latest release because of clang 16. The above details were replicated at commit 0db42a101292aa9081d7f997ed47a08ce433a9a1 from March 8 2024. On those Rocky 8 systems we're using the system gcc 8.5.0 with clang 16.0.6 and python 3.6.8.
If y'all have been testing Rocky 8 since 90997929, meaning you know it's stable at that old state, hypothetically you could git bisect between these two commits to see where in history this error was introduced, using the output of valgrind on a test sim as the success criterion.
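The bisect idea above could be automated with git bisect run. A rough sketch follows. It assumes the older commit actually builds on the machine in question, that trick-CP is on PATH, and that the sim executable has the same name at every commit; the helper names bisect_status and bisect_step are hypothetical. One detail worth noting: git bisect run aborts on exit statuses above 127, so the 234 from --error-exitcode=234 (or a 128+N status after a fatal signal) has to be remapped before it is returned.

```shell
#!/usr/bin/env bash
# Hypothetical helper for `git bisect run`; names and paths are illustrative.

# Map a valgrind exit status plus a count of vgcore.* files to a verdict
# that `git bisect run` accepts: 0 = good commit, 1 = bad commit.
bisect_status() {
    local valgrind_status=$1 core_count=$2
    if [ "$valgrind_status" -eq 0 ] && [ "$core_count" -eq 0 ]; then
        echo 0
    else
        echo 1
    fi
}

# Build Trick and the test sim, run the sim under valgrind, return the verdict.
# Returning 125 tells `git bisect run` to skip a commit that cannot be tested.
bisect_step() {
    local sim_dir=$1 status cores
    { ./configure && make; } >/dev/null 2>&1 || return 125
    cd "$sim_dir" || return 125
    trick-CP >/dev/null 2>&1 || return 125   # assumes trick-CP is on PATH
    rm -f vgcore.*
    valgrind --leak-check=full --error-exitcode=234 \
        ./S_main_Linux_8.5_x86_64.exe RUN_test/input.py >/dev/null 2>&1
    status=$?
    cores=$(find . -maxdepth 1 -name 'vgcore.*' | wc -l | tr -d ' ')
    return "$(bisect_status "$status" "$cores")"
}

# Only act when a sim directory is passed, as `git bisect run` would do.
if [ "$#" -ge 1 ]; then
    bisect_step "$1"
    exit $?
fi
```

Saved in the repository root as, say, bisect_valgrind.sh, it would be driven by git bisect start 0db42a101292aa9081d7f997ed47a08ce433a9a1 90997929477bac02022271cd0e8a55e13fe4251c followed by git bisect run ./bisect_valgrind.sh trick_sims/Ball/SIM_ball_L1.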
@ddj116 I was able to reproduce this problem in Trick and also with the following sample code, using Python 3.6.8 and gcc 8.5.0 on RHEL8:
#include <Python.h>
#include <iostream>

int main()
{
    Py_Initialize();
    std::cout << "Dan, how would you suggest we fix this in Trick, a problem which exists entirely outside of Trick? Regale us with your wisdom." << std::endl;
    Py_Finalize();
    return 0;
}
>> g++ -o DanHasAProblemButWeDont.o $(python3-config --cflags) $(python3-config --ldflags) DanHasAProblem.cc
>> valgrind -v --leak-check=full --error-limit=no --gen-suppressions=all --error-exitcode=234 DanHasAProblemButWeDont.o
>> gdb DanHasAProblemButWeDont.o vgcore.*
Please try it out and let us know your thoughts.
After a screen share session today we've learned that, although we have identical gcc and valgrind versions on (mostly) the same OS, RHEL/Rocky 8.9, I do not get a vgcore.* file for this exact same test setup. I can only assume that gremlins and/or cosmic rays are involved. This isn't a huge deal for us; I was really just documenting what I've found. We will likely abandon valgrind in favor of asan anyhow.
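For anyone weighing the same switch, here is a minimal sketch of building the small Python-embedding reproducer with AddressSanitizer instead of running it under valgrind. It assumes the gcc in use ships the ASan runtime (libasan) and that python3-config is on PATH; the function names asan_flags and build_with_asan are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compile a C++ file that embeds Python with ASan enabled.

# Compiler flags that turn on AddressSanitizer and keep stack traces readable.
asan_flags() {
    echo "-fsanitize=address -fno-omit-frame-pointer -g"
}

# Usage: build_with_asan SOURCE.cc OUTPUT
# Word splitting of the $(...) expansions is intentional: each one expands
# to several separate compiler or linker flags.
build_with_asan() {
    local src=$1 out=$2
    g++ $(asan_flags) $(python3-config --cflags) \
        -o "$out" "$src" $(python3-config --ldflags)
}

# Only build when both arguments are supplied on the command line.
if [ "$#" -eq 2 ]; then
    build_with_asan "$1" "$2"
fi
```

The resulting binary is run directly, with no valgrind wrapper. Leak reports at exit can be switched off with ASAN_OPTIONS=detect_leaks=0 if only memory errors are of interest.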
In summary:
Just wanted to note here, for those who might come across this later, that the above information applies to valgrind --tool=memcheck, which is the default tool for valgrind. I just ran valgrind --tool=callgrind and had no issues in the example sim.
Platform details
Running in the FSL on Rocky 8.9 with python 3.6.8 and gcc 8.5.0. Trick was built with ./configure && make. System valgrind is version 3.21.0.
How to replicate
Build trick/trick_sims/Ball/SIM_ball_L1 with the normal trick-CP process. Then run:
valgrind -v --leak-check=full --error-limit=no --gen-suppressions=all --error-exitcode=234 ./S_main_Linux_8.5_x86_64.exe RUN_test/input.py
The bottom of the output will show:
And a new vgcore.* file will be dropped in the SIM directory.
vgcore file details
Looking at the stack via:
gdb S_main_Linux_8.5_x86_64.exe vgcore.*
we see:
More information
This appears to be new in our migration of Trick from CentOS 7.9 (gcc 4.8.5) to Rocky 8.9 (gcc 8.5.0). I was able to replicate this inside of trick_sims/Cannon/SIM_cannon_aero as well, so I assume it's present for all Trick sims, but I have not tested any more example sims. I have heard of one other group that is also encountering this.
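When checking several sims like this, the gdb step from earlier in this report can be run non-interactively so that the backtrace from every vgcore.* file lands in one log. A small sketch, assuming gdb is installed; the function name dump_backtraces is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a backtrace for each vgcore.* file in a directory.

# Usage: dump_backtraces EXECUTABLE [DIRECTORY]   (DIRECTORY defaults to ".")
dump_backtraces() {
    local exe=$1 dir=${2:-.} core
    for core in "$dir"/vgcore.*; do
        # An unmatched glob stays literal, so skip anything that is not a file.
        [ -e "$core" ] || continue
        echo "== $core =="
        # -batch exits after running the commands given with -ex.
        gdb -batch -ex "bt" "$exe" "$core"
    done
}

# Only act when an executable is named on the command line.
if [ "$#" -ge 1 ]; then
    dump_backtraces "$@"
fi
```

From the SIM directory this could be called as dump_backtraces ./S_main_Linux_8.5_x86_64.exe, with the output redirected to a file for comparison between sims.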