siemens / ros-sharp

ROS# is a set of open source software libraries and tools in C# for communicating with ROS from .NET applications, in particular Unity3D
Apache License 2.0
945 stars 364 forks source link

potential concurrency (multithreading) issues #446

Closed bmegli closed 10 months ago

bmegli commented 10 months ago

Current master 02388ba9321bb73af375fdcb01352786a44a3e85

If I understand correctly (and that is what I see when checking thread ids)

Taking TwistSubscriber subscribing through UnitySubscriber as example

Code Fragment

https://github.com/siemens/ros-sharp/blob/02388ba9321bb73af375fdcb01352786a44a3e85/com.siemens.ros-sharp/Assets/RosSharp/Scripts/RosBridgeClient/RosCommuncation/TwistSubscriber.cs#L37-L66

Problem

If ReceiveMessage is called from different thread this code has multiple concurrency issues.

  1. There is race between ProcessMessage running in Unity main thread and (e.g. next) ReceiveMessage modifying data at same time
  2. Dirty reads/writes - the data in vectors might be in half-changed state at the time ProcessMessage uses it (e.g. types are not atomic)
  3. There is no guarantee when Unity main thread will see the changes
    • there is completely no thread synchronization
    • there is no guarantee in memory model that Unity main thread sees current values (not cached)
    • there is no guarantee when and that it will see them

So it looks like implementation here may be UB (undefined behavior) working most of the time by pure luck (also related to Update called once per frame, making concurrency issues less likely to hit)

Questions

  1. Do I understand correctly that message (and service) handlers may be called from different threads?
  2. If yes, are those potential concurrency issues known?
MartinBischoff commented 10 months ago

Hi @bmegli

short answers: 1: yes, 2: no

I strongly believe that when the code is used as intented and as highlighted in the examples no concurrency issues can happen. In fact we never experienced any concurrency issues in any ROS# application or in any other co-simulation implemented in a similar pattern:

MartinBischoff commented 10 months ago

We could test for concurrency issues by reading a huge amount of data in ReceiveMessage which takes longer to read than one Update cycle. In case race conditions exist, we can think about working with locks in the respective Subscriber implementations - or provide a lockable container in the Subscriber class +for storing all data that is shared between the two threads.

bmegli commented 10 months ago

Hi @MartinBischoff

In fact we never experienced any concurrency issues in any ROS# application or in any other co-simulation implemented in a similar pattern

I would say to you didn't notice it or didn't attribute it to concurrency issue.

Example experiment showing concurrency issue

Based on PoseStampedSubscriber

Idea:

ROS side

# publish pose (1, 2, 3) and quaternion (0, 0, 1, 0) 10000 times per second
rostopic pub /pose_stamped geometry_msgs/PoseStamped "header:
 seq: 0
 stamp:
   secs: 0
   nsecs: 0
 frame_id: ''
pose:
 position:
   x: 1.0
   y: 2.0
   z: 3.0
 orientation:
   x: 0.0
   y: 0.0
   z: 1.0
   w: 0.0" -r 10000

Unity Side

Modification in PoseStampedPublisher

Unity settings:

        private void Update()
        {
            if (isMessageReceived)
            {
                //zero out so that we can see if
                //- values may be changed by differend thread
                //- while we process them
                position = new Vector3();
                rotation = Quaternion.Euler(30, 0, 0);

                ProcessMessage();
            } 
        }     
        private void ProcessMessage()
        { 
            PublishedTransform.position = position;
            PublishedTransform.rotation = rotation;

            //if below happens this means that position/rotation was overwritten by different thread
            //while we were in ProcessMessage
            //the worst case would be inconsistent values
            if (PublishedTransform.position != new Vector3() || PublishedTransform.rotation != Quaternion.Euler(30, 0, 0))
                Debug.Log("multithreading issue,"
                      + " pose is " + PublishedTransform.position + " "
                      + " rotation is " + PublishedTransform.rotation);

            isMessageReceived = false;
        }

Sample Output

Various combinations of output may be seen:

264611199-ddcb8e88-8ab6-48db-9d8b-8e59829f7738

SIde Note

PoseStampedSubscriber was missing isMessageReceived = false;

MartinBischoff commented 10 months ago

Thanks for elaborating this example @bmegli . Pose stamped is a good example why we never really cared about this in our typical applications: at low cycle times the changes in position and rotation are normally sufficently low such that we just take the latest data received. In case we want to ensure that positon and orientation match to the same message, we can add locks to the PoseStampedSubscriber as mentioned above.

bmegli commented 10 months ago

I never expericenced dirty read/writes and I suppose this is handled C# internally.

Without long discussion:

There are some guarantees from C# specifications and some more from CLI.

The bottom line is that without any means of synchronization the guarantees are very lean.

bmegli commented 10 months ago

We could test for concurrency issues by reading a huge amount of data in ReceiveMessage which takes longer to read than one Update cycle. In case race conditions exist, we can think about working with locks in the respective Subscriber implementations - or provide a lockable container in the Subscriber class +for storing all data that is shared between the two threads.

No worries. It's mostly on the user side I guess.

But users should be aware that message/service handlers are called in different threads and require synchronization.

Existing code like PoseStampedSubscriber is kind of example that users would follow and fall into very rare/difficult to debug multithreading issues.

bmegli commented 10 months ago

Funny example I can think of would be message like

# pseudocode
bool allow
bool movement
bool autodestruction

And sending allow movement, !allow autodestruction but getting allow autodestruction as a result of lack of synchronization (similar to experiment with PoseStampedSubscriber mixed output)