ros2-java / ros2_java

Java and Android bindings for ROS2
Apache License 2.0

Huge message lost during Pub/Sub #168

Closed xingjl6280 closed 3 years ago

xingjl6280 commented 3 years ago

Hi team,

Here's my case: ProjectA starts 50 threads, and each thread keeps publishing a string message to the same topic at a 50 ms interval. All threads share a singleton node instance, which holds a single publisher instance. ProjectB subscribes to the topic.

Here are my test results, all run with both projects launched on one PC:

Test 1: Message size: 18 chars. Messages lost: 4k of 50k, nearly 10%.

Test 2: Message size: 1000 chars. Messages lost: 50% of 50k.

Test 3: Created a node (with a different node name) and a publisher for each of the 50 threads, all publishing to the same topic. Message size: 18 chars. Messages lost: 500 of 50k, around 1%.

Here's my setup:
ROS 2 version: Dashing
rcljava version: dashing branch, checked out a few days ago, built with colcon
Java version: OpenJDK 11
OS: Ubuntu 18.04, 64-bit

Here's my source code: SourceCode.zip

I've tried modifying the QoS parameters and increasing the publish interval to 100 ms, with no observable effect.

Are the node instance and publisher instance thread-safe? Or do I need to tune some configuration? Please kindly advise.

jacobperron commented 3 years ago

I'm pretty sure that the Node and Publisher objects provided by rcljava are not thread-safe, so you should take care of thread-safety in your project code.
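One way to handle thread safety in project code is to funnel every publish call through a single worker thread, so the underlying publisher is never called concurrently. The sketch below is only an illustration under that assumption: `SerializedPublisher` and `SerializedPublisherDemo` are hypothetical names, and the `Consumer<String>` is a stand-in for whatever call actually publishes the message (it is not the rcljava API). The demo uses a plain `ArrayList` as a deliberately non-thread-safe sink.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/**
 * Wraps a publish call that is NOT thread-safe and runs every call on one
 * worker thread, so the wrapped call never executes concurrently.
 */
final class SerializedPublisher implements AutoCloseable {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    // Hypothetical stand-in for the real publish call (an assumption, not rcljava API).
    private final Consumer<String> unsafePublish;

    SerializedPublisher(Consumer<String> unsafePublish) {
        this.unsafePublish = unsafePublish;
    }

    /** Safe to call from any thread; the actual publish runs on the worker thread. */
    void publish(String data) {
        worker.execute(() -> unsafePublish.accept(data));
    }

    /** Stops accepting messages and waits for the queued ones to be published. */
    @Override
    public void close() {
        worker.shutdown();
        try {
            if (!worker.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("publish queue did not drain in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}

final class SerializedPublisherDemo {
    /** Publishes from many threads into a non-thread-safe sink; returns how many messages arrived. */
    static int run(int threads, int messagesPerThread) {
        List<String> sink = new ArrayList<>(); // deliberately not thread-safe
        SerializedPublisher publisher = new SerializedPublisher(sink::add);

        Thread[] producers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            producers[t] = new Thread(() -> {
                for (int i = 0; i < messagesPerThread; i++) {
                    publisher.publish("thread-" + id + "-msg-" + i);
                }
            });
            producers[t].start();
        }
        try {
            for (Thread producer : producers) {
                producer.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        publisher.close(); // drains the queue before the sink is read
        return sink.size();
    }

    public static void main(String[] args) {
        System.out.println("delivered: " + run(50, 100));
    }
}
```

A `synchronized` block around the publish call would serialize access as well; the single-worker queue is just one option, and it has the side effect that calling threads do not block while a publish is in progress. Neither approach addresses losses that happen inside the middleware.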

For reliable message transmission, you should make sure that the QoS of your publishers and subscriptions has a reliability setting of reliable and a sufficiently large history depth. Still, performance may vary depending on the RMW. I recommend trying again with the latest changes on main (ROS Galactic) and with different RMWs. Here's some documentation about changing the RMW: https://docs.ros.org/en/galactic/How-To-Guides/Working-with-multiple-RMW-implementations.html

You can also find some info on tuning particular RMW implementations here: https://docs.ros.org/en/galactic/How-To-Guides/DDS-tuning.html
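To illustrate why history depth matters in the QoS advice above, here is a toy model of a keep-last history: a bounded queue that evicts its oldest sample when a new one arrives while it is full. This is a simplified sketch and not how any particular RMW is implemented; `KeepLastHistory` and `HistoryDepthDemo` are hypothetical names, and the burst size of 50 simply mirrors the 50 publishing threads from the report.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of a keep-last history: a bounded queue that evicts the oldest sample when full. */
final class KeepLastHistory {
    private final Deque<Integer> samples = new ArrayDeque<>();
    private final int depth;
    private int evicted = 0;

    KeepLastHistory(int depth) {
        if (depth < 1) {
            throw new IllegalArgumentException("depth must be >= 1");
        }
        this.depth = depth;
    }

    /** Stores a sample, dropping the oldest one if the history is already full. */
    void write(int sample) {
        if (samples.size() == depth) {
            samples.removeFirst();
            evicted++;
        }
        samples.addLast(sample);
    }

    /** Simulates the reader catching up: removes all stored samples and returns how many there were. */
    int drain() {
        int taken = samples.size();
        samples.clear();
        return taken;
    }

    /** Number of samples that were overwritten before being read. */
    int evicted() {
        return evicted;
    }
}

final class HistoryDepthDemo {
    /**
     * Writes `bursts` bursts of `burstSize` samples, with the reader draining
     * the history only between bursts, and returns how many samples were lost.
     */
    static int lostSamples(int depth, int burstSize, int bursts) {
        KeepLastHistory history = new KeepLastHistory(depth);
        int next = 0;
        for (int b = 0; b < bursts; b++) {
            for (int i = 0; i < burstSize; i++) {
                history.write(next++);
            }
            history.drain();
        }
        return history.evicted();
    }

    public static void main(String[] args) {
        System.out.println("depth 10: lost " + lostSamples(10, 50, 100));
        System.out.println("depth 50: lost " + lostSamples(50, 50, 100));
    }
}
```

In this model, a depth smaller than the burst size loses the difference on every burst, while a depth at least as large as the burst loses nothing. Real middleware adds acknowledgements, flow control, and socket buffers on top, so treat the numbers as intuition only.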

I hope it helps!

Since I don't suspect there is a bug in rcljava specifically causing this issue, I'm going to close this. If you believe I'm mistaken, feel free to comment and we can re-open the issue.