ros2 / rmw_zenoh

RMW for ROS 2 using Zenoh as the middleware

Shared-memory with containers #213

Closed: ciandonovan closed this issue 1 week ago

ciandonovan commented 1 week ago

When running zenohd, a ROS 2 publisher, and a ROS 2 echo subscriber all in the same container, shared memory is seemingly used, with little traffic on the loopback interface.

However, with the ROS 2 echo subscriber in a different container, I see hundreds of megabytes of traffic on the loopback interface, implying shared memory is not being used. This is despite the containers being configured with --ipc=host to share the host's IPC namespace, and /dev/shm/ being bind-mounted into both.
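For context, the two containers are launched roughly like this (the image name and the demo talker/echo are placeholders for my actual nodes):

```sh
# Container A: Zenoh router plus a publisher (RMW_IMPLEMENTATION=rmw_zenoh_cpp
# is set inside the image; image and command names are placeholders).
podman run --rm -it --net=host --ipc=host -v /dev/shm:/dev/shm my_ros_image \
    bash -c "ros2 run rmw_zenoh_cpp rmw_zenohd & ros2 run demo_nodes_cpp talker"

# Container B: the echo subscriber, also sharing the host IPC namespace and /dev/shm.
podman run --rm -it --net=host --ipc=host -v /dev/shm:/dev/shm my_ros_image \
    ros2 topic echo /chatter
```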

Is there some heuristic that determines whether both processes are running on the same physical host in order to trigger shared-memory usage, and is it being confused by the fact that they're in containers? There was a similar issue with FastDDS, which used network interface names to decide whether processes were on the same host. With --net=host that check passed and shared memory was enabled, but shared memory of course couldn't work without --ipc=host, a less commonly enabled option.

I have shared memory configured in both the router and session JSON configs, and am using unprivileged rootless Podman containers. Is there a way to get explicit logs showing what type of IPC is being used and why?
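For reference, the fragment I have enabled in both the router and session configs looks roughly like this (the exact key layout depends on the Zenoh version bundled with rmw_zenoh, so treat it as a sketch; I point rmw_zenoh at the files via the ZENOH_ROUTER_CONFIG_URI / ZENOH_SESSION_CONFIG_URI environment variables, if I have the names right):

```json5
{
  transport: {
    // Enable Zenoh's shared-memory transport; it is disabled by default.
    shared_memory: {
      enabled: true,
    },
  },
}
```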

Yadunund commented 1 week ago

SHM is disabled by default in rmw_zenoh due to some known issues; see https://github.com/ros2/rmw_zenoh/issues/199.

Regarding your observations, I'd try to reproduce the same results with the zenoh-c examples and open a ticket upstream if you see the same behavior.
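Something along these lines, reusing the container flags you already described, should be enough to check (the image name and binary paths are placeholders):

```sh
# Rough reproduction of the two-container case with the stock zenoh-c examples
# instead of ROS 2 nodes (image name and binary paths are placeholders).
podman run --rm -it --net=host --ipc=host -v /dev/shm:/dev/shm zenoh_c_image ./z_pub_shm
podman run --rm -it --net=host --ipc=host -v /dev/shm:/dev/shm zenoh_c_image ./z_sub
```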

ciandonovan commented 1 week ago

Issue #199 says it's disabled by default in the session config, but I manually enabled it there as well as in the router config. It seems to work correctly when everything is in the same container; it's just the inter-container use case that doesn't work.

  1. Is there a way to get more verbose information at runtime about the IPC method chosen by rmw_zenoh? (See the sketch after this list for the closest thing I've found so far.)
  2. Do you know by what heuristic shared memory is used or not? Does it attempt SHM and then fall back to the network stack?
  3. Does it make sense to pursue this now, or to wait until https://github.com/eclipse-zenoh/zenoh-c/pull/405 is merged and picked up by rmw_zenoh?
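On question 1, the closest thing I've found so far is turning up the Zenoh log level itself. This assumes the embedded Zenoh runtime honours the standard Rust RUST_LOG filter, which I haven't confirmed for rmw_zenoh:

```sh
# Sketch: run the router and the nodes with verbose Zenoh logging.
# Assumes the embedded Zenoh runtime reads the standard Rust RUST_LOG filter.
RUST_LOG=zenoh=debug ros2 run rmw_zenoh_cpp rmw_zenohd
RUST_LOG=zenoh=debug ros2 run demo_nodes_cpp talker
```
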
ciandonovan commented 1 week ago

> Regarding your observations, I'd try to reproduce the same results with the zenoh-c examples and open a ticket upstream if you see the same behavior.

The zenoh-c examples (z_pub_shm.c & z_sub.c) seem to work fine both intra- and inter-container, with shared-memory segments opened by them in /dev/shm/. However, I can't rule out that they might be using the loopback interface; any advice on ensuring that's not the case? Is there a way to make them use shared memory exclusively? Running the example in a separate container without --ipc=host set also works, so I assume that case must have fallen back to the loopback interface.
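The closest I've got to checking this is watching the loopback interface directly while the examples run; this is my own rough check, assuming the default Zenoh listen port of 7447:

```sh
# Watch loopback for Zenoh traffic while z_pub_shm/z_sub are running; sustained
# payload-sized traffic here suggests the data is going over TCP rather than SHM.
# 7447 is Zenoh's default listen port; adjust if your router listens elsewhere.
sudo tcpdump -i lo -n 'tcp port 7447'

# And confirm the shared-memory segments actually exist while publishing:
ls -lh /dev/shm/
```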

ciandonovan commented 1 week ago

The issue was that the /tmp/ directory wasn't shared; rmw_zenoh uses it to coordinate the shared-memory negotiation between processes on the same host. Bind-mounting /tmp/ into both containers fixed the first issue and allowed shared-memory communication between nodes in different containers.
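For anyone else hitting this, the working invocation is the same as in my first comment plus the extra bind mount (image and command are still placeholders):

```sh
# Both containers now share the host IPC namespace, /dev/shm and /tmp; the
# /tmp mount is what lets rmw_zenoh's SHM negotiation see both processes.
podman run --rm -it --net=host --ipc=host \
    -v /dev/shm:/dev/shm -v /tmp:/tmp \
    my_ros_image ros2 topic echo /chatter
```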

However, when the subscriber shuts down, even gracefully, the publisher crashes with:

what():  failed to publish message: Failed to allocate a SHM buffer, even after GCing, at /opt/ros/akara_ws/src/rmw_zenoh/rmw_zenoh_cpp/src/rmw_zenoh.cpp:949, at ./src/rcl/publisher.c:284
ruffsl commented 3 days ago

@ciandonovan, thanks for publishing your findings. Haven't migrated to zenoh yet, but this'll be something to watch for.