Open: YuanYuYuan opened this issue 4 weeks ago
After checking the code, I found an issue.
Thread 1: before calling rcl_shutdown(), g_contexts_mutex is locked.
Thread 2: before calling rcl_shutdown(), g_contexts_mutex isn't used.
Because thread 2 doesn't take the lock, if it executes rcl_shutdown first, thread 1 can still call rcl_shutdown a second time on the same context.
So the rcl_shutdown call in Context::shutdown() should also be guarded by g_contexts_mutex.
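A minimal Python sketch of the locking idea, under stated assumptions: `g_contexts_lock`, `SketchContext`, and `_rcl_shutdown()` are hypothetical stand-ins for `g_contexts_mutex`, the context, and `rcl_shutdown()`. The real `Context::shutdown()` is C++ and is not reproduced here. Holding one shared lock around the call serializes concurrent shutdowns, so the underlying shutdown runs only once. The losing thread still gets an error, because the lock alone does not make a repeated call a no-op.

```python
import threading

# Hypothetical stand-in for g_contexts_mutex; all names here are illustrative only.
g_contexts_lock = threading.Lock()


class SketchContext:
    """Toy model of a context, not rclpy's actual Context class."""

    def __init__(self):
        self.valid = True
        self.shutdown_calls = 0  # times the underlying shutdown actually ran

    def _rcl_shutdown(self):
        # Stands in for rcl_shutdown(): fails if the context is already shut down.
        if not self.valid:
            raise RuntimeError("rcl_shutdown already called on this context")
        self.shutdown_calls += 1
        self.valid = False

    def shutdown(self):
        # Every shutdown path takes the same lock, so two threads can never
        # be inside _rcl_shutdown() at the same time.
        with g_contexts_lock:
            self._rcl_shutdown()


def run_concurrent_shutdowns(num_threads=2):
    """Call shutdown() from several threads; return (shutdown runs, errors)."""
    ctx = SketchContext()
    errors = []

    def worker():
        try:
            ctx.shutdown()
        except RuntimeError as exc:
            errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ctx.shutdown_calls, len(errors)


print(run_concurrent_shutdowns())  # prints (1, 1)
```

The remaining error from the second thread is exactly the "repeated calls to shutdown() cause an exception" behavior discussed next in the thread.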
I created a fix: https://github.com/ros2/rclpy/pull/1353. Could you test it in your environment to check whether the issue still occurs with this PR?
Hi @Barry-Xu-2018! Thanks for the attempt, but it doesn't seem to work...
My fix only prevented simultaneous calls to rcl_shutdown(). The remaining problem is that calling shutdown() a second time raises an exception.
I have updated the fix. Please try again.
Currently, rclpy is designed to throw an exception if rcl_shutdown() is called more than once on the same context (there's a specific test case for this), so the error is expected behavior. However, in your situation the second shutdown call should be ignored rather than raise an error.
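One way to get both behaviors, sketched in Python on a toy context class: the `strict` flag is hypothetical (not an rclpy parameter) and selects between the current policy, raising on a repeated shutdown, and silently ignoring the repeat. The validity check and the shutdown happen under one lock, so the check cannot go stale between the two steps.

```python
import threading


class IdempotentContext:
    """Toy context (hypothetical); not rclpy's Context class."""

    def __init__(self):
        self._lock = threading.Lock()  # plays the role of g_contexts_mutex
        self._valid = True
        self.shutdown_calls = 0  # how often the underlying shutdown really ran

    def _rcl_shutdown(self):
        # Stand-in for rcl_shutdown(); must run at most once per context.
        self.shutdown_calls += 1
        self._valid = False

    def shutdown(self, strict=True):
        # Check and shut down under one lock so the validity check cannot go stale.
        with self._lock:
            if not self._valid:
                if strict:
                    # Current policy: a repeated shutdown is an error.
                    raise RuntimeError("context is already shut down")
                return False  # lenient policy: ignore the repeated call
            self._rcl_shutdown()
            return True


ctx = IdempotentContext()
print(ctx.shutdown())              # prints True
print(ctx.shutdown(strict=False))  # prints False; shutdown_calls stays 1
```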
I will discuss with the other team members how to fix this problem.
Hi @Barry-Xu-2018, I have confirmed that your latest fix resolves the issue (no more duplicated rcl_shutdown calls). Thanks!
Bug report
We found this sporadic failure in rclpy (rolling), caused by a race condition when calling rcl_shutdown. The conflict happens if rmw_shutdown is slow enough that the rcutils_atomic_store on Thread 1 is performed after the rcl_context_is_valid check on Thread 2. As a result, rcl_shutdown is called twice and causes an error.
Analysis
Thread 1
Thread 2
Required Info:
rmw_shutdown could lead to this issue.
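The interleaving from the bug report can be reproduced with a toy Python model. This is an assumption-laden sketch, not rclpy code: a dict flag stands in for the context's validity (what rcl_context_is_valid checks and rcutils_atomic_store clears), and two `threading.Event` objects force Thread 2's check to run while Thread 1 is still inside a slow "rmw_shutdown", so the shutdown count reaches 2 on every run.

```python
import threading


def simulate_race():
    """Force the interleaving from the report and count rcl_shutdown() entries."""
    state = {"valid": True, "shutdown_calls": 0}  # toy stand-in for the context
    t1_inside_shutdown = threading.Event()
    t2_finished = threading.Event()

    def thread1():
        if state["valid"]:                 # rcl_context_is_valid() returns true
            state["shutdown_calls"] += 1   # enters rcl_shutdown()
            t1_inside_shutdown.set()
            t2_finished.wait()             # slow rmw_shutdown() still running
            state["valid"] = False         # rcutils_atomic_store() happens too late

    def thread2():
        t1_inside_shutdown.wait()
        if state["valid"]:                 # still true: Thread 1 has not stored yet
            state["shutdown_calls"] += 1   # second rcl_shutdown() on the same context
            state["valid"] = False
        t2_finished.set()

    threads = [threading.Thread(target=thread1), threading.Thread(target=thread2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["shutdown_calls"]


print(simulate_race())  # prints 2
```

Without the events this ordering only occurs occasionally, which is consistent with the failure being sporadic. Taking one shared lock around both the check and the shutdown closes the window between them.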