Closed agalbachicar closed 3 years ago
This test seems to be failing not because of the tests implemented in agent_simulation_builder_test.cc, but because of some bad parsing in the message that prints the warnings (it is printed by a third-party library).
I couldn't reproduce this error, and at the moment we have run out of CI minutes, so I couldn't get to the bottom of the issue. However, I believe that if we resolve those warnings this error won't show up anymore, and to do that we only have to change one line in the simple_prius.sdf file. (Check change)
WDYT @agalbachicar if I add that minor modification and we check later on if it is needed to continue debugging or not?
The change seems good to me. I doubt that is the only reason why the test failed. Let's start with that.
Yeah. Going by the issue description, the return code of the run_test.py command is -11, which on Unix means the test process was terminated by signal 11 (SIGSEGV, a segmentation fault). So far I couldn't reproduce it locally. I am still trying though.
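For reference, a negative return code is how Python's subprocess module reports a child that was killed by a signal, so -11 corresponds to signal 11 (SIGSEGV). Below is a minimal C++ sketch of that convention, assuming a POSIX system. RunInChild, CrashWithSegfault and ExitCleanly are hypothetical helpers written for illustration; they are not part of run_test.py or delphyne.

```cpp
#include <cassert>
#include <csignal>
#include <functional>
#include <stdexcept>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs `body` in a forked child and reports the outcome the way Python's
// subprocess module does: the exit status on a normal exit, or the negated
// signal number when the child was killed by a signal.
// Hypothetical helper for illustration only.
int RunInChild(const std::function<void()>& body) {
  assert(body && "body must be callable");
  const pid_t pid = fork();
  if (pid < 0) throw std::runtime_error("fork failed");
  if (pid == 0) {
    // Child: make sure SIGSEGV keeps its default (fatal) disposition.
    std::signal(SIGSEGV, SIG_DFL);
    body();
    _exit(0);  // Only reached if body() returned normally.
  }
  int status = 0;
  if (waitpid(pid, &status, 0) < 0) throw std::runtime_error("waitpid failed");
  if (WIFSIGNALED(status)) return -WTERMSIG(status);
  return WEXITSTATUS(status);
}

// Stand-in for a test binary that dies with a segmentation fault.
void CrashWithSegfault() { std::raise(SIGSEGV); }

// Stand-in for a test binary that finishes without problems.
void ExitCleanly() {}
```

On Linux, RunInChild(CrashWithSegfault) yields -SIGSEGV, which is the same -11 that run_test.py reports, while RunInChild(ExitCleanly) yields 0.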
I was barely able to reproduce it, just a few times (by reducing the CPU usage of the container and doing a bit of magic).
I can tell two things so far:
run_test.py (from /opt/ros/dashing/share/ament_cmake_test/cmake): I added some prints there to check whether the issue is located in this script when it executes the test. But no, the return code -11 is returned by the test itself (gtest).
So I should investigate a bit and understand why the gtest binary returns -11 when the tests finish with no failures. An error when destroying the objects after the tests end? Odd.
Extra: I've checked the test result file (build/delphyne/test_results/delphyne/UNIT_agent_simulation_builder_test.gtest.xml) after hitting the error and it shows:
<?xml version="1.0" encoding="UTF-8"?>
<testsuite name="delphyne" tests="1" failures="1" time="0" errors="0" skipped="0">
<testcase classname="delphyne" name="UNIT_agent_simulation_builder_test.gtest.missing_result" time="0">
<failure message="The test did not generate a result file: Running main() from /opt/ros/dashing/src/gtest_vendor/src/gtest_main.cc [==========] Running 5 tests from 1 test case. [----------] Global test environment set-up. [----------] 5 tests from AgentSimulationTest [ RUN ] AgentSimulationTest.TestGetVisualScene [ OK ] AgentSimulationTest.TestGetVisualScene (799 ms) [ RUN ] AgentSimulationTest.BasicTest [ OK ] AgentSimulationTest.BasicTest (1301 ms) [ RUN ] AgentSimulationTest.TestPriusSimpleCar -1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 "/>
</testcase>
</testsuite>
Which in essence is the same failure message shown in the issue's description.
Consider commenting out tests to do a one vs all evaluation and find the test that has the segfault. Another alternative is to split tests across multiple buildable units.
I was able to reproduce the error by doing the following:
Running the container with the --cpus 0.1 option in order to use 10% of only one CPU.
To avoid wasting time on other tests, I commented them out and focused on TestPriusSimpleCar and TestPriusUnicycleCar, both of which had shown the aforementioned error.
By doing this I was able to trigger the error between 10 and 20 times out of 300 runs.
Once I could reproduce it in a "controlled" way, I commented out the lines related to ignition. In particular I removed
test::IgnMonitor<ignition::msgs::AgentState_V> ign_monitor(kStateTopicName);
and all the lines related to the ign messages. ---> No error was thrown.
The next test I did was to leave only the creation of the IgnMonitor object:
test::IgnMonitor<ignition::msgs::AgentState_V> ign_monitor(kStateTopicName);
in the TestPriusSimpleCar test.
---> run_test.py: return code -11
I guess the error comes up when the ign_monitor object's resources are deallocated.
So now the problem is much more narrowed down than before.
Excellent investigation @francocipollone !
Update:
Context: the IgnMonitor class subscribes to a topic and attaches a callback method that counts and saves every new message.
The problem comes up when the resources are being deallocated: the IgnMonitor object is destroyed and, after this, the callback method is called, which leads to a SEGFAULT.
I tried destroying the IgnMonitor object at the end of the execution and adding a node_.Unsubscribe() call in the destructor, but the results are the same: the callback is still being called after the object is destroyed (and the node is unsubscribed).
I'll continue working on a solution.
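One common way to make a late callback harmless is to keep the monitor's state in a shared_ptr and let the callback hold only a weak_ptr to it. The sketch below illustrates that general pattern; it is not the fix that was applied here and it does not use the ignition transport API. FakeBus and GuardedMonitor are hypothetical stand-ins, and FakeBus delivers messages synchronously for simplicity.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a transport node: it stores callbacks forever
// and invokes all of them on every Publish(), even after the subscriber
// object has been destroyed.
class FakeBus {
 public:
  void Subscribe(std::function<void(int)> callback) {
    callbacks_.push_back(std::move(callback));
  }
  void Publish(int message) {
    for (auto& callback : callbacks_) callback(message);
  }

 private:
  std::vector<std::function<void(int)>> callbacks_;
};

// Hypothetical monitor that counts and saves messages. The callback never
// touches `this`; it only locks a weak_ptr to the shared state.
class GuardedMonitor {
 private:
  struct State {
    std::mutex mutex;
    int count{0};
    int last_message{0};
  };

 public:
  explicit GuardedMonitor(FakeBus* bus) : state_(std::make_shared<State>()) {
    assert(bus != nullptr);
    std::weak_ptr<State> weak_state = state_;
    bus->Subscribe([weak_state](int message) {
      // If the monitor is gone, lock() returns nullptr and we do nothing.
      if (auto state = weak_state.lock()) {
        std::lock_guard<std::mutex> lock(state->mutex);
        state->last_message = message;
        ++state->count;
      }
    });
  }

  int count() const {
    std::lock_guard<std::mutex> lock(state_->mutex);
    return state_->count;
  }

 private:
  std::shared_ptr<State> state_;
};
```

With this layout, a message published after the GuardedMonitor goes out of scope reaches the callback but finds an expired weak_ptr, so nothing is dereferenced. When callbacks run on another thread, the shared_ptr returned by lock() keeps the state alive until the callback finishes.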
Solved. Further discussion if any should continue in #760
Found the error in CI --> https://github.com/ToyotaResearchInstitute/dsim-repos-index/pull/169
The extract of the log follows:
It is strange but the test passes with clang.