ros2 / rmw_cyclonedds

ROS 2 RMW layer for Eclipse Cyclone DDS
Apache License 2.0

messages arrive at subscriber in 1ms intervals #352

Closed berndpfrommer closed 2 years ago

berndpfrommer commented 2 years ago

Bug report

Expected behavior

Messages arrive about every 1/6000 s (~167 µs), matching the ~6 kHz publish rate.

Actual behavior

Multiple messages (about 6) arrive in a burst every 1 ms. No messages are lost.

Additional information

This may not be a bug per se; I just couldn't find any documentation explaining the behavior. Is there a configuration parameter that needs to be tuned to increase the delivery frequency? Is the 1 ms aggregation a kernel/networking issue or a Cyclone DDS feature?
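
(A minimal sketch of how one might instrument the subscriber side to tell burst delivery apart from uniform ~167 µs spacing. The topic name "chatter", the std_msgs::msg::String type, and the queue depth are illustrative assumptions, not taken from the original setup:)

// Sketch: log inter-arrival gaps on the subscriber side. Burst delivery
// shows up as runs of near-zero gaps followed by one ~1000 us gap;
// uniform delivery shows gaps clustered around ~167 us.
#include <chrono>
#include <memory>
#include "rclcpp/rclcpp.hpp"
#include "std_msgs/msg/string.hpp"

class GapLogger : public rclcpp::Node
{
public:
  GapLogger() : Node("gap_logger")
  {
    sub_ = create_subscription<std_msgs::msg::String>(
      "chatter", 10,  // topic and depth are assumptions
      [this](const std_msgs::msg::String::SharedPtr) {
        const auto now = std::chrono::steady_clock::now();
        if (have_prev_) {
          const auto gap = std::chrono::duration_cast<std::chrono::microseconds>(
            now - prev_).count();
          RCLCPP_INFO(get_logger(), "inter-arrival gap: %ld us",
                      static_cast<long>(gap));
        }
        prev_ = now;
        have_prev_ = true;
      });
  }

private:
  rclcpp::Subscription<std_msgs::msg::String>::SharedPtr sub_;
  std::chrono::steady_clock::time_point prev_;
  bool have_prev_{false};
};

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  rclcpp::spin(std::make_shared<GapLogger>());
  rclcpp::shutdown();
  return 0;
}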

eboasson commented 2 years ago

Apologies for letting this slip by @berndpfrommer. From how Cyclone is constructed, I can tell you that in the absence of packet loss, the samples will be processed individually at 6kHz if the kernel promptly schedules all threads that become runnable (and with 3 active threads — publishing, network receive, application receive — on a modern machine, it should be able to).

I would suspect the scheduling of the timer on the sender and the scheduling of the application receiver more than anything else.
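
(For context, a ~6 kHz sender is typically driven by a wall timer along the lines of the sketch below; the topic name and message content are illustrative. If the kernel deschedules the timer thread for ~1 ms, several expirations can fire back to back when it resumes, which by itself would look like 1 ms batches at the receiver even though nothing in the middleware batches samples:)

// Illustrative sketch of a ~6 kHz publisher driven by a wall timer.
// A ~1 ms scheduling delay lets several ~167 us periods queue up and
// the callback then fires repeatedly in quick succession.
#include <chrono>
#include <memory>
#include "rclcpp/rclcpp.hpp"
#include "std_msgs/msg/string.hpp"

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("talker_6khz");
  auto pub = node->create_publisher<std_msgs::msg::String>("chatter", 10);
  auto timer = node->create_wall_timer(
    std::chrono::microseconds(167),  // ~1/6000 s
    [pub]() {
      std_msgs::msg::String msg;
      msg.data = "tick";
      pub->publish(msg);
    });
  rclcpp::spin(node);
  rclcpp::shutdown();
  return 0;
}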

In any case, I think your test is not unlike the hacked ROS 2 demo code I sometimes use: talker64 and listener64. This gives me:

# ./listener64&./talker64
[1] 41791
[INFO] [1637305541.619449000] [listener]: rcvd 0 lost 0 minlat 18446744073709552.000us avglat nanus maxlat 0.000us
[INFO] [1637305542.619514000] [listener]: rcvd 5989 lost 0 minlat 16.000us avglat 18.905us maxlat 913.000us
[INFO] [1637305543.619680000] [listener]: rcvd 5989 lost 0 minlat 16.000us avglat 17.724us maxlat 212.000us
[INFO] [1637305544.619681000] [listener]: rcvd 5988 lost 0 minlat 14.000us avglat 17.924us maxlat 100.000us
[INFO] [1637305545.619686000] [listener]: rcvd 5988 lost 0 minlat 14.000us avglat 17.526us maxlat 55.000us
[INFO] [1637305546.619869000] [listener]: rcvd 5989 lost 0 minlat 15.000us avglat 17.926us maxlat 111.000us
[INFO] [1637305547.619859000] [listener]: rcvd 5988 lost 0 minlat 15.000us avglat 17.783us maxlat 103.000us
[INFO] [1637305548.620027000] [listener]: rcvd 5989 lost 0 minlat 15.000us avglat 17.765us maxlat 94.000us
[INFO] [1637305549.620022000] [listener]: rcvd 5972 lost 0 minlat 14.000us avglat 18.016us maxlat 101.000us
[INFO] [1637305550.620190000] [listener]: rcvd 5989 lost 0 minlat 9.000us avglat 17.750us maxlat 136.000us
[INFO] [1637305551.620358000] [listener]: rcvd 5989 lost 0 minlat 16.000us avglat 17.399us maxlat 49.000us
[INFO] [1637305552.620526000] [listener]: rcvd 5989 lost 0 minlat 14.000us avglat 17.308us maxlat 68.000us
[INFO] [1637305553.620692000] [listener]: rcvd 5989 lost 0 minlat 11.000us avglat 17.456us maxlat 67.000us
[INFO] [1637305554.620879000] [listener]: rcvd 5989 lost 0 minlat 14.000us avglat 17.320us maxlat 51.000us
[INFO] [1637305555.620879000] [listener]: rcvd 5988 lost 0 minlat 15.000us avglat 17.365us maxlat 90.000us
[INFO] [1637305556.620894000] [listener]: rcvd 5988 lost 0 minlat 15.000us avglat 17.530us maxlat 58.000us
[INFO] [1637305557.621035000] [listener]: rcvd 5989 lost 0 minlat 15.000us avglat 17.520us maxlat 62.000us
[INFO] [1637305558.621037000] [listener]: rcvd 5988 lost 0 minlat 15.000us avglat 17.651us maxlat 189.000us
^C[INFO] [1637304870.924921000] [rclcpp]: signal_handler(signal_value=2)
# kill %
[1]  + 41791 terminated  ./listener64                                                                            

In general I would have preferred to put a bit more effort into the actual latency measurements (these are aggregated over 1 s intervals), but I think it is fair to say that there is no evidence of "batching" (scare quotes because the term also gets used in more specific ways in some DDS implementations) at all.

Now, this is on a MacBook Pro running macOS 12.0.1, so it will clearly exhibit different timing behaviour than your setup, but it does support my claim that it isn't as if 1 ms worth of messages gets delivered at a time by design. This is with the default configuration, by the way.
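
(The min/avg/max figures above come from timestamping each sample at the sender and differencing against the receiver's clock; the sketch below approximates that pattern, using std_msgs::msg::Int64 to carry a microsecond timestamp. It is an assumption about the shape of talker64/listener64, not their actual source:)

// Rough sketch of the latency measurement pattern: the sender embeds a
// steady_clock timestamp in the payload, the receiver differences it
// against its own clock (valid on a single host) and reports min/avg/max
// over 1 s windows.
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <memory>
#include "rclcpp/rclcpp.hpp"
#include "std_msgs/msg/int64.hpp"

static int64_t now_us()
{
  return std::chrono::duration_cast<std::chrono::microseconds>(
    std::chrono::steady_clock::now().time_since_epoch()).count();
}

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<rclcpp::Node>("listener_latency");
  int64_t n = 0, sum = 0, mn = INT64_MAX, mx = 0;
  auto sub = node->create_subscription<std_msgs::msg::Int64>(
    "chatter", 10,
    [&](const std_msgs::msg::Int64::SharedPtr msg) {
      const int64_t lat = now_us() - msg->data;  // one-way latency, same host
      ++n;
      sum += lat;
      mn = std::min(mn, lat);
      mx = std::max(mx, lat);
    });
  auto report = node->create_wall_timer(std::chrono::seconds(1), [&]() {
    RCLCPP_INFO(node->get_logger(),
                "rcvd %ld minlat %ld avglat %ld maxlat %ld (us)",
                static_cast<long>(n), static_cast<long>(mn),
                n ? static_cast<long>(sum / n) : 0L, static_cast<long>(mx));
    n = 0; sum = 0; mn = INT64_MAX; mx = 0;
  });
  rclcpp::spin(node);
  rclcpp::shutdown();
  return 0;
}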

berndpfrommer commented 2 years ago

OK, apologies, I got sidetracked and no longer have the details in memory :( Since I didn't create a clean test case, I can no longer reproduce the behavior easily. Next time I hit this I'll document how to reproduce it more carefully.