Open matt-attack opened 6 years ago
The short answer is that I would not expect it to be thread-safe. Unfortunately rclcpp hasn't gone through an API review yet and still lacks clarity on a lot of these kinds of details and therefore also lacks documentation on these points.
However, if you look into the functions that rclcpp is ultimately calling in rcl, there are several which are explicitly marked as not thread-safe, e.g. rcl_guard_condition_init/rcl_guard_condition_fini (especially if rcl_trigger_guard_condition is called during rcl_guard_condition_fini) or even just rcl_node_init.
Now, in each case, those rcl functions may be thread-safe in certain circumstances, so the solution may not be as drastic as putting a global mutex lock around all the rclcpp versions of those functions. But the time has not yet been taken to figure out where it is thread-safe and where it is not, and what to do about it in rclcpp to ensure that it is used in a safe way.
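For anyone who needs a stopgap in their own application, the "global mutex" idea mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions: FakeNode, lifecycle_mutex, make_node_locked and churn_nodes are hypothetical names, FakeNode is a stand-in for something whose construction and destruction touch unsynchronized shared state, and nothing here is rclcpp API or a fix that rclcpp itself ships.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in (NOT an rclcpp type): construction and destruction
// modify a plain, unsynchronized counter, mimicking non-thread-safe init/fini.
class FakeNode {
public:
  explicit FakeNode(std::string name) : name_(std::move(name)) { ++live_count(); }
  ~FakeNode() { --live_count(); }
  static int & live_count() { static int count = 0; return count; }
private:
  std::string name_;
};

// One process-wide mutex guarding every create and destroy.
std::mutex & lifecycle_mutex() { static std::mutex m; return m; }

// Create a node under the lock; the custom deleter takes the same lock,
// so destruction is serialized with every other create/destroy.
std::shared_ptr<FakeNode> make_node_locked(const std::string & name) {
  FakeNode * raw = nullptr;
  {
    std::lock_guard<std::mutex> lock(lifecycle_mutex());
    raw = new FakeNode(name);
  }
  // The lock is released before the shared_ptr is built, so the deleter
  // can never run while this function still holds the mutex.
  return std::shared_ptr<FakeNode>(raw, [](FakeNode * n) {
    std::lock_guard<std::mutex> lock(lifecycle_mutex());
    delete n;
  });
}

// Create and destroy nodes from many threads; returns the number of nodes
// still alive after all threads have joined.
int churn_nodes(int thread_count, int iterations) {
  std::vector<std::thread> workers;
  for (int t = 0; t < thread_count; ++t) {
    workers.emplace_back([t, iterations]() {
      for (int i = 0; i < iterations; ++i) {
        auto node = make_node_locked("node_" + std::to_string(t));
      }  // node goes out of scope here; the deleter locks, then deletes
    });
  }
  for (auto & w : workers) { w.join(); }
  std::lock_guard<std::mutex> lock(lifecycle_mutex());
  return FakeNode::live_count();
}
```

In a real application the body of make_node_locked would wrap whatever call creates the node. Note that this only serializes creation against destruction; it would not help if some other thread touches the same underlying handle (for example, triggering a guard condition while it is being finalized) without taking the same lock.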
There's some work coming up where we will audit the memory and thread behavior of these functions and the underlying rmw functions too, so we'll hopefully be able to provide better behavior and documentation soon. In the meantime we'd certainly like help fixing issues narrowly wherever you find it convenient.
@wjwwood Is there any chance this is caused by a deadlock in the Fast-RTPS threads? There seems to be an issue on Fast-RTPS (https://github.com/eProsima/Fast-RTPS/issues/190) which may be related.
@sagniknitr I have seen that issue come up as well, so it's very possible it's at least related.
Trying to create and then destroy many nodes at the same time causes a wide variety of different crashes. It seems that either creating or destroying a node is not thread-safe.
Here's a minimal example that crashes almost every time for me:
Often I will get
Sometimes I get:
Other times:
I have been seeing lots of bugs/crashes that seem related to a race condition when there are multiple nodes in a process, but this is the most basic and reproducible example.