Multiple micro-ROS agents cause data loss

Benblob688 commented 2 years ago

Issue template

Hardware description: NVIDIA Jetson running Ubuntu 20.04
RTOS: Teensy 4.0
Installation type: docker agent
Version or commit hash: foxy

Steps to reproduce the issue

I use the following command to start a micro-ROS agent (I have never gone deeper than this): `sudo docker run -it --rm -v /dev:/dev --privileged --net=host microros/micro-ros-agent:foxy serial --dev /dev/serial/by-id/<device-id> -v6 && export ROS_DOMAIN_ID=20`

- One device plugged in, one agent running: the device's data is received normally.
- Two devices plugged in, one agent running: one device's data is received normally.
- Two devices plugged in, two agents running: both devices' data is received intermittently, and the issue described below arises.

Expected behavior

Should be able to run multiple agents for multiple devices plugged in.

Actual behavior

There are occasional freezes in the incoming data, and these freezes are synchronised between the devices. We also get an initial value that is far from the expected value (temperature, pressure, etc.); see the attached screenshots.

Additional information

I get the impression that multiple devices should be able to run off one micro-ROS agent, but I can't find any guidance or examples on this. Can two running agents interfere with each other? Can the Jetson's CPU be overwhelmed by the overhead of running two agents? Is it a problem to set the same ROS_DOMAIN_ID on two simultaneous agents? Is there a simple way to modify the agent docker command above to listen to multiple devices?

Acuadros95 commented 2 years ago

Hi @Benblob688, you can use the multiserial agent directly, which avoids running multiple agent instances with unnecessary overhead: `sudo docker run -it --rm -v /dev:/dev --privileged --net=host microros/micro-ros-agent:foxy multiserial --devs "/dev/1 /dev/2 ..." -v6`
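The idea of one process serving several serial ports can be illustrated with a small POSIX C sketch: a single call waits on several file descriptors at once and reports which one has data. This is only an illustration of I/O multiplexing with poll(), assuming the ports are already open as file descriptors; the function names and the `MAX_PORTS` limit are hypothetical, and this is not the Agent's actual multiserial code.

```c
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_PORTS 16 /* hypothetical upper bound on ports served */

/* Wait up to timeout_ms milliseconds for any of the n descriptors to
 * become readable. Returns the index of the first readable descriptor,
 * -1 on timeout, or -2 on error or invalid arguments. */
int wait_any_readable(const int *fds, size_t n, int timeout_ms)
{
    struct pollfd pfds[MAX_PORTS];

    if (n == 0 || n > MAX_PORTS) {
        return -2;
    }
    for (size_t i = 0; i < n; ++i) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }

    int rc = poll(pfds, (nfds_t)n, timeout_ms);
    if (rc < 0) {
        return -2;
    }
    if (rc == 0) {
        return -1;
    }
    for (size_t i = 0; i < n; ++i) {
        if (pfds[i].revents & POLLIN) {
            return (int)i;
        }
    }
    return -2; /* only error/hang-up events (e.g. POLLHUP) were reported */
}

/* Read one byte from whichever descriptor becomes readable first.
 * Returns that descriptor's index, or a negative value as above. */
int read_byte_from_any(const int *fds, size_t n, int timeout_ms, char *out)
{
    int idx = wait_any_readable(fds, n, timeout_ms);
    if (idx < 0) {
        return idx;
    }
    if (read(fds[idx], out, 1) != 1) {
        return -2;
    }
    return idx;
}
```

For a quick check without hardware, the read ends of two pipe() calls can stand in for two serial ports: writing a byte into the second pipe makes `read_byte_from_any()` return 1.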

Also, you don't need to set ROS_DOMAIN_ID on the Agent; this is configured on the micro-ROS client side. In fact, a single Agent can handle communications on multiple domain IDs: check this example (link) or this tutorial (link).

Benblob688 commented 2 years ago

Hi @Acuadros95, I used the command as you suggested; the screenshot is below. The agent does find the devices, but I don't get the rapid printout I see with a single connection, and the topics don't show up in the ROS topic list. The agent stops at the initialisation stage. Interestingly, it did not react when I unplugged the devices: nothing changed and no errors appeared. I tried with and without `export ROS_DOMAIN_ID`, with no change, as you explained.

[Screenshot from 2022-05-17 15-38-57]

Acuadros95 commented 2 years ago

Interestingly, it did not care when I unplugged the devices

What do you mean by this? You can start the Agent with the serial devices disconnected (the device file is not present under /dev/...), and the Agent should wait for the connection.

Can you try that? Start the multiserial agent with both ports disconnected, then connect them one by one while checking the log.
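Waiting for a device that is not plugged in yet can be as simple as checking periodically whether its device node exists. The POSIX C sketch below illustrates that idea only; `wait_for_device()` and its parameters are hypothetical and are not the Agent's API or implementation.

```c
#include <stdbool.h>
#include <time.h>
#include <unistd.h>

/* Check up to max_attempts times whether the device node at path exists
 * (for example a /dev/serial/by-id/... path), sleeping interval_ms
 * milliseconds (assumed non-negative) between checks.
 * Returns true as soon as the path exists, false if it never appears. */
bool wait_for_device(const char *path, int max_attempts, long interval_ms)
{
    struct timespec pause = {
        .tv_sec = interval_ms / 1000,
        .tv_nsec = (interval_ms % 1000) * 1000000L,
    };

    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        if (access(path, F_OK) == 0) {
            return true;
        }
        nanosleep(&pause, NULL);
    }
    return false;
}
```

Once the path exists, the port can be opened and configured; a real agent also has to cope with the device disappearing again, which this sketch does not handle.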

What does your micro-ROS app look like? If you don't handle the Agent's presence manually, the Agent should already be running when the board is connected and powered on. Check this example to see what I mean: micro-ros_reconnection_example
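Handling the Agent's presence manually can be sketched as a small state machine on the client. The C code below is a hypothetical illustration that assumes a boolean `agent_reachable` which firmware would obtain by pinging the Agent (micro-ROS provides `rmw_uros_ping_agent()` for this); the state names and `next_state()` are illustrative and may differ from the linked example.

```c
#include <stdbool.h>

/* Hypothetical states for tracking whether the Agent is present. */
typedef enum {
    WAITING_AGENT,      /* no Agent seen yet: keep pinging */
    AGENT_AVAILABLE,    /* Agent answered: create node, publishers, ... */
    AGENT_CONNECTED,    /* normal operation: spin executor, publish */
    AGENT_DISCONNECTED  /* Agent lost: destroy entities, start over */
} agent_state_t;

/* Compute the next state from the current one and the latest ping result. */
agent_state_t next_state(agent_state_t state, bool agent_reachable)
{
    switch (state) {
    case WAITING_AGENT:
        return agent_reachable ? AGENT_AVAILABLE : WAITING_AGENT;
    case AGENT_AVAILABLE:
        /* Entity creation would happen here; this sketch assumes it succeeds. */
        return AGENT_CONNECTED;
    case AGENT_CONNECTED:
        return agent_reachable ? AGENT_CONNECTED : AGENT_DISCONNECTED;
    case AGENT_DISCONNECTED:
        /* Entity cleanup would happen here before waiting again. */
        return WAITING_AGENT;
    }
    return WAITING_AGENT;
}
```

With this structure the board can be powered on before the Agent starts, and it recovers if the Agent is restarted, because it simply falls back to `WAITING_AGENT`.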

Benblob688 commented 2 years ago

I mean that with the `serial` version of the docker command, unplugging the devices results in a "server stopped" message followed by repeated "serial port not found" messages every second. With the `multiserial` version, I can unplug the devices and no errors occur and nothing is printed. For example, in the attached screenshot, I followed your suggestion to start with no devices plugged in, then one, then both. I then unplugged both. The screenshot was taken with no devices attached, but the agent has not reverted to the "serial port not found" behaviour that I expect to see.

[Screenshot from 2022-05-18 10-15-38]

In terms of the micro-ROS app, we haven't implemented it that way; we followed this style: https://github.com/micro-ROS/micro_ros_arduino/blob/galactic/examples/micro-ros_publisher/micro-ros_publisher.ino. Do you think this could be why the multiserial command gets stuck at the "Serial port running..." step?

Acuadros95 commented 2 years ago

I could not replicate this on our side using the micro-ros_publisher.ino code and two Teensy 4.1 boards.

Can you try with this exact example code?
Are you using v2.0.4-foxy release?
Also make sure you have the latest agent image: docker pull microros/micro-ros-agent:foxy
Can you run the image without sudo?

Acuadros95 commented 2 years ago

Any updates on this?

Acuadros95 commented 2 years ago

Closing, feel free to reopen if the problem is not solved.

Benblob688 commented 2 years ago

Hi again, we've had some time to try out a few things and have more info. It's still an issue, but I can't reopen this; I don't have permission.

Can you try with this exact example code?

Haven't tried this yet. See below, I think we can discount this theory.

Are you using v2.0.4-foxy release?

I'm not sure how to check which sub-version of foxy is running on our machines, but shouldn't the docker container make this irrelevant anyway? The container has ROS inside it, so it shouldn't matter whether ROS is installed on the host machine or not.

Also make sure you have the latest agent image: docker pull microros/micro-ros-agent:foxy

Correct, latest agent image is pulled.

Can you run the image without sudo?

Permission denied.

We managed to get the `multiserial` command working on 3 out of 6 machines, but it seems to be hardware-specific:

| OS | Hardware | multiserial runs? |
|---|---|---|
| Ubuntu 20.04.4 LTS Focal | NVIDIA Jetson (ARM CPU) | No |
| Ubuntu 20.04.4 LTS Focal | RPi 4 (ARM CPU) | No |
| Ubuntu 20.04.4 LTS Focal | RPi 4 (ARM CPU) | No |
| Ubuntu 20.04.4 LTS Focal | Tuxedo Pulse 15 (AMD CPU) | Yes |
| Ubuntu 20.04.4 LTS Focal | Tuxedo Pulse 15 (AMD CPU) | Yes |
| Linux Mint 20.1 | MSI (Intel i7 CPU) | Yes |

So to recap:

serial works on all machines.
multiserial with one device works only on the three machines marked "Yes" in the table above.
multiserial with two or more devices works only on those same three machines.

Therefore it seems that the RPi and Jetson (both ARM hardware) can only run serial, and multiserial freezes after startup.

This makes me think that the code we have running on the Teensy is not the issue, because we now have it working in some cases, depending on which computer is running the docker container.

What hardware are you running multiserial on? Do you have access to an RPi to try the command and reproduce the error we are seeing? Or do you have any other suggestions or ideas as to what could make it fail on an RPi/Jetson (both run ARM CPUs)?

Could you point us to the source code for multiserial, or tell us how to look inside the docker container? As the screenshots above show, on the RPi/Jetson it freezes while running MultiTermiosAgentLinux.cpp or root.cpp. These are the only two source files I can see referenced in the docker container's output.

Thanks!

Acuadros95 commented 2 years ago

After debugging on an RPi: the Agent was not stuck; it just couldn't read data properly using select().

select() behavior differs between architectures:
- RPi: fails with errno set to EINVAL when timeout.tv_usec >= 1 second (1,000,000 µs).
- AMD laptop: works fine with the same value.
The regular serial agent works because it uses poll() instead of select(), which avoids this bug.
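The general rule is to keep tv_usec below 1,000,000 and put whole seconds in tv_sec. A plausible explanation for the architecture difference, not confirmed in this thread, is that 64-bit ARM Linux has no native select system call, so the C library implements select() on top of pselect6, which rejects a nanosecond field of one second or more with EINVAL, whereas the x86-64 kernel's select normalizes the value itself. The POSIX C sketch below shows the normalization; `ms_to_timeval()` and `wait_readable()` are hypothetical helpers, and this is not necessarily what the linked PR changes.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Convert a non-negative timeout in milliseconds into a struct timeval
 * whose tv_usec stays in [0, 999999]. Whole seconds go into tv_sec
 * instead of overflowing the microsecond field (e.g. 2500 ms becomes
 * 2 s + 500000 us, not 0 s + 2500000 us). */
struct timeval ms_to_timeval(long timeout_ms)
{
    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    return tv;
}

/* Wait until fd (which must be below FD_SETSIZE) is readable or the
 * timeout expires. Returns 1 if readable, 0 on timeout, or -1 on error
 * (errno set by select()). */
int wait_readable(int fd, long timeout_ms)
{
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    /* Built fresh for each call, since Linux select() may modify it. */
    struct timeval tv = ms_to_timeval(timeout_ms);
    int rc = select(fd + 1, &readfds, NULL, NULL, &tv);
    return rc > 0 ? 1 : rc;
}
```

By contrast, poll() takes its timeout as a plain int in milliseconds, so there is no microsecond field to overflow, which fits with the regular serial agent not hitting this bug.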

This PR should fix the problem: https://github.com/eProsima/Micro-XRCE-DDS-Agent/pull/311

Will update when the fixed ARM dockers are ready.

Acuadros95 commented 2 years ago

Docker images updated!

Run `docker pull microros/micro-ros-agent:foxy` and try again.

Benblob688 commented 2 years ago

Brilliant, that is now solved, thank you!

Hubery-wl commented 11 months ago

Hello, I'm a novice, and I've been working with multiple micro-ROS devices recently. Could you share how you got multiple micro-ROS devices working, or point me to some information? I'd be very grateful. Thank you.
