ros2 / rmw_fastrtps

Implementation of the ROS Middleware (rmw) Interface using eProsima's Fast RTPS.
Apache License 2.0
147 stars 116 forks source link

Messages dropped while multiple publishers are publishing,help with XML profile. #747

Open yashmewada9618 opened 4 months ago

yashmewada9618 commented 4 months ago

Description

How to write Fast RTPS XML profile to publish in unicast mode with modified socket buffer sizes similar to the provided cycloneDDS profile.

Cyclone DDS

<?xml version="1.0" encoding="UTF-8"?>
<CycloneDDS xmlns="https://cdds.io/config" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="https://cdds.io/config https://raw.githubusercontent.com/eclipse-cyclonedds/cyclonedds/master/etc/cyclonedds.xsd">
    <Domain id="any">
        <Internal>
            <SocketReceiveBufferSize min="250MB" />
        </Internal>
        <General>
            <Interfaces>
                <NetworkInterface name="lo" />
            </Interfaces>
            <AllowMulticast>false</AllowMulticast>
        </General>
        <Discovery>
            <ParticipantIndex>auto</ParticipantIndex>
            <Peers>
                <Peer Address="localhost" />
            </Peers>
            <MaxAutoParticipantIndex>120</MaxAutoParticipantIndex>
        </Discovery>
    </Domain>
</CycloneDDS>

My not working Fast RTPS profile

<?xml version="1.0" encoding="UTF-8"?>
<dds xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
    <profiles>
        <!-- Default publisher profile -->
        <publisher profile_name="default publisher profile" is_default_profile="true">
            <topic>
                <resourceLimitsQos>
                    <max_samples>0</max_samples>
                    <max_instances>0</max_instances>
                    <max_samples_per_instance>0</max_samples_per_instance>
                    <allocated_samples>0</allocated_samples>
                </resourceLimitsQos>
            </topic>
            <qos>
                <reliability>
                    <max_blocking_time>
                        <sec>DURATION_INFINITE_SEC</sec>
                    </max_blocking_time>
                </reliability>
                <publishMode>
                    <kind>SYNCHRONOUS</kind>
                </publishMode>
                <data_sharing>
                    <kind>OFF</kind>
                </data_sharing>
            </qos>
            <historyMemoryPolicy>PREALLOCATED_WITH_REALLOC</historyMemoryPolicy>
        </publisher>

        <!-- Default subscriber profile -->
        <subscriber profile_name="default subscriber profile" is_default_profile="true">
            <qos>
                <data_sharing>
                    <kind>OFF</kind>
                </data_sharing>
            </qos>
            <historyMemoryPolicy>PREALLOCATED_WITH_REALLOC</historyMemoryPolicy>
        </subscriber>

        <transport_descriptors>
            <transport_descriptor>
                <transport_id>UdpTransport</transport_id>
                <type>UDPv4</type>
            </transport_descriptor>
        </transport_descriptors>

        <participant profile_name="udp_transport_profile" is_default_profile="true">
            <rtps>
                <sendSocketBufferSize>1048576</sendSocketBufferSize>
                <listenSocketBufferSize>4194304</listenSocketBufferSize>
                <builtinTransports>LARGE_DATA</builtinTransports>
                <userTransports>
                    <transport_id>UdpTransport</transport_id>
                </userTransports>
                <useBuiltinTransports>true</useBuiltinTransports>
            </rtps>
        </participant>
    </profiles>
</dds>

Details

I have an Ouster OS1-32U 3D lidar and an Intel Realsense and they both combined make up to 35 publishers including the depth clouds from both the sensors and their respective IMU data. The incoming data is as I require viz IMU at 200Hz, OS cloud at 20, and Realsense Cloud at 30Hz. But when I start recording the bag file and play the recorded bag file my messages are not as per the frame rates I mentioned above and they are significantly less (almost 0 for a few seconds).

Previously I was using Cyclone DDS which I think is bottlenecking the data transport hence I want to shift to FastRTPS, but it seems I am unable to reproduce the same profile manner here i.e (Publish in unicast mode and only to the localhost and not any other connected PC, increase the incoming buffer size).

Steps to reproduce issue

Expected behavior

Actual behavior

[ros2-10]   /camera/camera/color/metadata: 329
[ros2-10]   /ouster/range_image: 219
[ros2-10]   /camera/camera/gyro/metadata: 2184
[ros2-10]   /ouster/nearir_image: 219
[ros2-10]   /ouster/reflec_image: 219
[ros2-10]   /ouster/signal_image: 219
[ros2-10]   /camera/camera/gyro/sample: 2184
[ros2-10]   /camera/camera/accel/metadata: 1110
[ros2-10]   /camera/camera/imu: 2184
[ros2-10]   /ouster/scan: 219
[ros2-10]   /camera/camera/color/image_raw: 329
[ros2-10]   /ouster/points: 727
[ros2-10]   /camera/camera/aligned_depth_to_color/image_raw: 327
[ros2-10]   /camera/camera/aligned_depth_to_color/camera_info: 327
[ros2-10]   /camera/camera/depth/image_rect_raw: 328
[ros2-10]   /camera/camera/depth/color/points: 327
[ros2-10]   /ouster/imu: 1935
[ros2-10]   /camera/camera/depth/camera_info: 328
[ros2-10]   /camera/camera/color/camera_info: 329
[ros2-10]   /camera/camera/accel/sample: 1110
[ros2-10]   /camera/camera/depth/metadata: 328
[ros2-10] Total lost: 15481

Thanks.

fujitatomoya commented 3 months ago

@yashmewada9618 out of curiosity.

i.e (Publish in unicast mode and only to the localhost and not any other connected PC, increase the incoming buffer size).

for localhost, ROS 2 provides ROS_LOCALHOST_ONLY environment variable to isolate the communication in the localhost. this is much easier and agnostic from rmw implementation.

besides that, why did you disable data sharing and shared memory transport? i think those are meant to be used for localhost low latency high throughput communication.

fujitatomoya commented 3 months ago

CC: @MiguelCompany @EduPonz

yashmewada9618 commented 3 months ago

@fujitatomoya I tried using the ROS_LOCALHOST_ONLY but it didn't work in my case, a more detailed problem can be found here. Regarding data sharing and shared memory transport, I was looking into the config on how fastrtps performs with my sensor suite, and no specific reason for it.

fujitatomoya commented 3 months ago

I tried using the ROS_LOCALHOST_ONLY but it didn't work in my case, a more detailed problem can be found https://github.com/ouster-lidar/ouster-ros/issues/278.

if it does not work only in localhost, that is the problem...

https://github.com/ouster-lidar/ouster-ros/issues/278 does not say any details either, just said didn't work in our case.

are you saying even with ROS_LOCALHOST_ONLY is enabled, endpoints can communicate each other beyond the network?