ros / ros_comm

ROS communications-related packages, including core client libraries (roscpp, rospy, roslisp) and graph introspection tools (rostopic, rosnode, rosservice, rosparam).
http://wiki.ros.org/ros_comm
747 stars 914 forks source link

Huge network delay when sending message below the MTU size #2318

Closed vpoughonEZ closed 1 year ago

vpoughonEZ commented 1 year ago

The script below, when run inside the ros:noetic-ros-core-buster docker image, is running very slowly for some message sizes. The run time as a function of message size in bytes (the script first argument) is:

1 1s
10 1s
100 1s
500 1s
600 12s
1000 12s
10000 12s
50000 12s
65466 12s
65467 1s
70000 1s
100000 1s
200000 1s

After some testing, it seems related to the network interface MTU size. The default MTU size of the localhost interface is 65536, and the upper threshold of the problem is 65466 which leaves a few bytes for headers.

Changing the MTU size with for example ip link set dev lo mtu 1500 changes the runtime of the script as a function of the message size, and indicates that the problem occurs when the message size is below the MTU (but above some mimum value around ~500).

import rospy
import threading
from std_msgs.msg import UInt8MultiArray
import sys

class Subscriber:
    def __init__(self):
        self.subscriber = rospy.Subscriber("/rospytest", UInt8MultiArray, self.callback)
        self.buffer = []
        self.wait_event = threading.Event()
        self.wait_event_buffer_size = None

    def callback(self, msg):
        self.buffer.append(msg)
        if len(self.buffer) == self.wait_event_buffer_size:
            self.wait_event.set()

    def wait_buffer(self, n, timeout):
        self.wait_event.clear()
        self.wait_event_buffer_size = n
        if len(self.buffer) == n or self.wait_event.wait(timeout):
            return self.buffer
        return False

    def clear_buffer(self):
        self.buffer.clear()

class Publisher:
    def __init__(self):
        self.publisher = rospy.Publisher("/rospytest", UInt8MultiArray, queue_size=1)

    def publish(self, number_of_bytes):
        msg = UInt8MultiArray()
        msg.data = bytes([0x42]*number_of_bytes)
        self.publisher.publish(msg)

if __name__ == "__main__":

    number_of_messages = 300
    number_of_bytes = int(sys.argv[1])

    print(f"Starting test with buffer of size {number_of_bytes}")

    sub = Subscriber()
    pub = Publisher()

    rospy.init_node('rospytest')

    for i in range(number_of_messages):
        if i % 10 == 0:
            print(i)

        # Send
        sub.clear_buffer()
        pub.publish(number_of_bytes)

        # Receive
        assert sub.wait_buffer(1, timeout=1)
        sub.clear_buffer()
flixr commented 1 year ago

Did you also try this with tcp_nodelay=True? http://wiki.ros.org/rospy/Overview/Publishers%20and%20Subscribers

vpoughonEZ commented 1 year ago

Yes, thank you. It appears to fix the problem when used on the Subscriber. It has no effect when used on the Publisher. I did not know about Nagle's algorithm, very interesting!

peci1 commented 1 year ago

Can this issue be closed?

vpoughonEZ commented 1 year ago

Yes for me it's ok, thanks.