Open YuanYuYuan opened 2 months ago
@YuanYuYuan could you provide steps to reproduce the error?
With the patch above, are you suggesting that it is ok to not close the zenoh session? Wouldn't that lead to memory leaks?
@clalancette should we try moving this block back into `rmw_context_fini`?
We need to take a close look before making changes here again. I've moved it twice already for 2 different bugs, so we need to go back, look at the contexts these are being called in (for both rclcpp and rclpy), and see what the previous bugs were.
Please follow the scripts written by @evshary.
We noticed that the ApexAI performance test with rmw_zenoh keeps failing with the error.
This happens because we call `z_close` within `__run_exit_handlers`. We had concluded that terminating the tokio runtime within the `atexit` handler is unsound and leads to undefined behavior. See this and this. Basically, termination during the exit stage is beyond what the tokio runtime covers. Rust is designed not to drop any static variable until the end of the program. Running destructors in exit handlers is also error-prone, even in C/C++.
On the other hand, rmw_cyclonedds does not seem to use any explicit termination during the exit stage: https://github.com/ros2/rmw_cyclonedds/blob/c6dbe24b2f2be87cf8e4750d89501657ab83566f/rmw_cyclonedds_cpp/src/rmw_node.cpp#L1629. I ran the same test with the following patch. The error disappears and rmw_zenoh seems fine.
In conclusion, we have a few possible solutions:
`exit` phase.

TBH, I'm not sure if 2. and 3. are feasible in ROS RMW. Any comment is welcome! :slightly_smiling_face:
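
For illustration only (this is not the patch mentioned above): a minimal C++ sketch of the idea discussed in this thread, closing a session explicitly at context finalization instead of from an exit handler. `FakeSession`, `FakeContext`, and `fake_context_fini` are hypothetical stand-ins and are not part of rmw_zenoh or zenoh-c. The sketch assumes the caller invokes the finalizer before `main` returns.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a zenoh session (NOT the real zenoh-c API).
// It records lifecycle events in a caller-provided log.
class FakeSession {
public:
  explicit FakeSession(std::vector<std::string> & log) : log_(log) {
    log_.push_back("open");
  }

  // Idempotent close: a second call does nothing.
  void close() {
    if (!closed_) {
      closed_ = true;
      log_.push_back("close");
    }
  }

  bool closed() const { return closed_; }

private:
  std::vector<std::string> & log_;
  bool closed_ = false;
};

// Hypothetical context that owns the session.
struct FakeContext {
  std::unique_ptr<FakeSession> session;
};

// Hypothetical analogue of a context finalizer: the session is closed
// here, while the program is still running normally, so nothing is left
// for an atexit handler or a static destructor to tear down.
// Returns false if the context was already finalized.
bool fake_context_fini(FakeContext & ctx) {
  if (!ctx.session) {
    return false;
  }
  ctx.session->close();
  assert(ctx.session->closed());
  ctx.session.reset();
  return true;
}
```

Calling `fake_context_fini` a second time returns `false` and does not close again, so a later shutdown path cannot trigger a double close. Whether a real RMW implementation can guarantee the finalizer runs before the exit stage is exactly the open question in this issue.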