ros-perception / image_common

Common code for working with images in ROS
http://www.ros.org/wiki/image_common

Low message read rate on other device and excessive message size with tutorial publisher #268

Closed joshuaoreilly closed 1 year ago

joshuaoreilly commented 1 year ago

Setup

Two devices, Intel NUC (NUC) and Jetson AGX Orin (Orin), connected directly via ethernet.

Both are running Ubuntu 20.04 and ROS 1 Noetic.

Problems

  1. A subscriber on the same device as the image publisher receives messages at the publish rate, but a subscriber on another device receives them at a much lower rate.

  2. Each message is significantly larger than the image file being sent.

Reproduction

Run iperf and ping to get a ballpark for bandwidth, latency, and packet loss

On NUC: iperf -s

On Orin: iperf -c 192.168.0.ipofnuc

Output:

------------------------------------------------------------
Client connecting to 192.168.3.1, TCP port 5001
TCP window size: 2.47 MByte (default)
------------------------------------------------------------
[  3] local 192.168.3.2 port 57688 connected with 192.168.3.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  2.74 GBytes  2.36 Gbits/sec

(295MB/s)
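The 295MB/s figure follows from dividing the iperf bit rate by 8. A minimal sketch of that conversion, assuming decimal units (1 Gbit = 1000 Mbit); the function name is made up for illustration:

```python
def gbits_to_mbytes_per_sec(gbits_per_sec: float) -> float:
    """Convert a link speed in Gbit/s to MB/s (decimal units, 8 bits per byte)."""
    return gbits_per_sec * 1000 / 8


# 2.36 Gbit/s is the bandwidth reported by iperf above.
link_mb_s = gbits_to_mbytes_per_sec(2.36)
print(f"{link_mb_s:.0f} MB/s")
```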

Running ping 192.168.0.ipofnuc from the Orin showed no lost packets and a maximum latency of 1.5 ms.

Double-check image size

Running ls -lh small.png returns

-rw-rw-r-- 1 owner owner 81K Aug  8  2022 small.png

Create basic synthetic image transfer test

I followed the instructions here and here to set up my workspace and get the image_transport_tutorial code. In /home/cvg/image_transport_ws/image_common/image_transport/tutorial/src/my_publisher.cpp, I changed the loop_rate() argument from 5 to 30; the file is otherwise unchanged. Compiled with catkin_make.

Start roscore on NUC, synthetic test on Orin

On NUC, run roscore

On Orin, export ROS_MASTER_URI=http://192.168.0.ipofnuc:11311

Run rosrun image_transport_tutorial my_publisher ~/small.png where small.png is an 81KB image

Test hz and bw on Orin and NUC (separately, one after the other)

rostopic hz /camera/image on Orin shows 30 Hz

rostopic bw /camera/image on Orin shows 73MB/s (bmon shows 70MB/s) with mean message size of 2.43MB

rostopic hz /camera/image on NUC shows 21 Hz

rostopic bw /camera/image on NUC shows 50MB/s (bmon shows 50MB/s on both devices) with mean message size of 2.43MB

Despite the full bandwidth of the connection between them being 295MB per second, and the required bandwidth to send 30 2.43MB messages per second being 73MB/s, subscribing to the /camera/image topic only receives 21 messages per second.
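The numbers in this comparison can be checked with a few lines of arithmetic. This is only a sketch using the figures measured above (2.43MB per message, 295MB/s link); the function and variable names are made up for illustration, and it ignores protocol overhead:

```python
MSG_SIZE_MB = 2.43   # mean message size reported by rostopic bw
LINK_MB_S = 295.0    # link bandwidth measured with iperf


def bandwidth_mb_s(rate_hz: float, msg_size_mb: float) -> float:
    """Bandwidth needed to move messages of a given size at a given rate."""
    return rate_hz * msg_size_mb


required = bandwidth_mb_s(30, MSG_SIZE_MB)   # what 30 Hz would need
achieved = bandwidth_mb_s(21, MSG_SIZE_MB)   # what 21 Hz actually uses
ceiling_hz = LINK_MB_S / MSG_SIZE_MB         # rate at which the link would saturate

print(f"required: {required:.1f} MB/s, achieved: {achieved:.1f} MB/s")
print(f"link utilisation at 30 Hz: {required / LINK_MB_S:.0%}")
print(f"theoretical ceiling: {ceiling_hz:.0f} Hz")
```

The achieved figure lines up with the roughly 50MB/s seen in rostopic bw and bmon on the NUC, and the required figure is about a quarter of the measured link capacity, so raw bandwidth alone does not explain the drop.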

I also increased the loop_rate() to 60 and found a similar result; the network is not saturated, but subscribing to the topic from the NUC results in reading less than 60 messages per second.

An adjacent issue is that each message is 2.43MB, while each image is only 81KB; what's using up the other 2.35MB per message? And is it possible to reduce it?

joshuaoreilly commented 1 year ago

This is not a legitimate issue with image_common, but with how the network is set up; publishing the images from the NUC to either the Orin or a separate workstation solves the problem. I suspect this is because the Orin is connected directly to the NUC via ethernet, and this configuration has some network tomfooleries slowing it down. Regardless of the root cause, it's not an image_common problem. Feel free to delete this Issue.

joshuaoreilly commented 1 year ago

The "excessive message size" comes from .png/.jpg files being compressed on disk, while their in-memory numpy representation is not compressed; loading an image with img = cv2.imread('image.jpg') and then checking img.nbytes will show a 10x-ish size increase.
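The same effect can be reproduced without OpenCV. The sketch below builds a synthetic 8-bit, 3-channel image and compresses it with zlib (the DEFLATE compression that PNG also uses). The 900x900 dimensions are an assumption, picked only because they give a raw size of about 2.43MB; the real dimensions of small.png are not stated in this thread, and a synthetic gradient compresses far better than a real photo would.

```python
import zlib

# Hypothetical dimensions: chosen so the raw size is ~2.43 MB, not taken from small.png.
WIDTH, HEIGHT, CHANNELS = 900, 900, 3


def raw_size_bytes(width: int, height: int, channels: int, bytes_per_channel: int = 1) -> int:
    """Size of an uncompressed pixel buffer, as held in a numpy array or cv::Mat."""
    return width * height * channels * bytes_per_channel


# Synthetic image: a horizontal grey gradient, identical on every row.
row = bytes(x * 255 // (WIDTH - 1) for x in range(WIDTH) for _ in range(CHANNELS))
raw = row * HEIGHT

compressed = zlib.compress(raw, 9)

print(f"raw:        {len(raw)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"ratio:      {len(raw) / len(compressed):.0f}x")
```

The raw buffer size depends only on width, height, and channel count, so the message size stays the same no matter how well the source file compresses on disk.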