ros2 / rmw_fastrtps

Implementation of the ROS Middleware (rmw) Interface using eProsima's Fast RTPS.
Apache License 2.0

Messages get dropped when larger than 0.5MB - using shared memory - QoS is Best_effort #739

Open dk-teknologisk-lag opened 5 months ago

dk-teknologisk-lag commented 5 months ago

Bug report

Required Info:

Steps to reproduce issue

It all stems from transferring point cloud data from Ouster's ROS 2 driver to any subscriber (ros2 bag, ros2 topic echo/hz, etc.), all of which indicate dropped messages. The minimal example here can reproduce it, though. As far as I know, all sensors have their publishing QoS set to BEST_EFFORT, so that is also the case in this example.

To test it do:

1. clone https://github.com/dk-teknologisk-lag/ros2_test
2. open the project in VS-code
3. Build the project when the docker has launched
4. Launch the publisher with the scripts, i.e. **./pub_10MB.bash**, or **ros2 run cpp_pubsub talker --ros-args -p freq:=10 -p bytesize:=10000000**
5. Launch the listener with **ros2 run cpp_pubsub listener**

Expected behavior

Messages get sent and received with the required frequency

Actual behavior

Messages get dropped occasionally, getting worse at higher frequencies or larger message sizes. See the attached screenshot.

Additional information

I have searched everywhere for a solution, but most suggestions are to change buffer sizes, which doesn't seem applicable here, since it uses shared memory. As seen in the image, it seemingly only uses around 1.5 MB out of the 64 MB available.

fujitatomoya commented 5 months ago

CC: @Barry-Xu-2018

dk-teknologisk-lag commented 5 months ago

I have opened a discussion here as well as I could reproduce it using their helloworld example with a bit of modifications, fyi - https://github.com/eProsima/Fast-DDS/discussions/4276

Barry-Xu-2018 commented 5 months ago

I can reproduce this issue. But on the host (not in a container), message_lost_callback is never called. If QoS is set to RELIABLE, there is no problem. This issue is unrelated to the segment size of shared memory.

The Fast DDS shared memory example also uses RELIABLE QoS. I simply modified it (topic_qos) to BEST_EFFORT and did not see this issue (same Fast DDS version, 2.11.2). Maybe that is because the message size is small (only 1 MB).

dk-teknologisk-lag commented 5 months ago

I have updated the example linked in the other discussion, but after changing it to BEST_EFFORT and a message size of 10 MB I get the same behavior (see attached screenshot).

dk-teknologisk-lag commented 5 months ago

I just tried changing to RELIABLE, and here I also get dropped messages - even if I change back to just sending "Hello world", though a lot fewer (see attached screenshot).

I noticed earlier that with the RELIABLE QoS it didn't print the lost messages, but inspecting the message IDs, I could see there were gaps. In the image above it jumps from 10195 to 10214; that could be a limit of the console, which prints out of order, but still, there are jumps in the message IDs.

dk-teknologisk-lag commented 5 months ago

How come it transfers 1024 * 1024 bytes and not the 2 * 1024 * 1024 which the segment size is set to?

Also, what's the difference between the topic QoS and the DataWriter QoS - do they both require the same settings?

dk-teknologisk-lag commented 5 months ago

How come it transfers 1024 * 1024 bytes and not the 2 * 1024 * 1024 which the segment size is set to?

Also, what's the difference between the topic QoS and the DataWriter QoS - do they both require the same settings?

I figured out it was the buffer size of the data, rather than the size of the segment or the string. Got it working with a string of 10 MB.

dk-teknologisk-lag commented 5 months ago

So, if I increase the segment size to 10 * 1024 * 1024, so it can hold an entire message in shared memory, I can run at about 300-500 Hz (not the ~1000 Hz the 1 ms sleep would suggest), but there is seemingly no packet loss when sending 10 MB messages (see attached screenshot).

I guess it's the entire HelloWorld data struct that gets copied, so I should allocate 11 MB + 4 bytes, since it has its data array of chars, consuming 1024*1024 bytes, and its uint32_t m_index field?
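As a back-of-the-envelope check on the sizing reasoning above (a sketch only; the real per-message overhead depends on the Fast DDS version and CDR serialization, so the overhead figure here is an assumption, not an exact Fast DDS number):

```python
def required_segment_bytes(payload_bytes: int, overhead_bytes: int = 4096) -> int:
    """Rough lower bound on the SHM segment size needed to hold one
    serialized message. overhead_bytes is a guessed allowance for
    RTPS/serialization headers, not an exact Fast DDS figure."""
    return payload_bytes + overhead_bytes

# HelloWorld-style struct: a 10 * 1024 * 1024 char array plus a uint32_t index.
payload = 10 * 1024 * 1024 + 4
print(required_segment_bytes(payload))  # slightly above 10 MiB, well under a 12 MiB segment
```

The point is simply that the segment must be at least message-sized with some headroom, which is why a 2 MiB segment drops 10 MB messages while a 12 MiB one does not.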

Can I set this using an XML file? Force it to not use the builtin transport, but a specific shared memory transport with a larger segment size?

Barry-Xu-2018 commented 5 months ago

Can I set this using a XML file? Force it to not use builtin transport, but a specific shared memory with larger segment size?

Do you want to test it in a ROS 2 environment?
I have not used XML to configure the transport on ROS 2 before, but I think you can refer to section 6.4.3 in https://fast-dds.docs.eprosima.com/en/latest/fastdds/transport/shared_memory/shared_memory.html and prepare the XML as described in https://github.com/ros2/rmw_fastrtps/blob/rolling/README.md.

BTW, there is an easy way. Modify segment size at

https://github.com/ros2/rmw_fastrtps/blob/4d0be32e6c455edbf708003dffb67b11d512c5a6/rmw_fastrtps_shared_cpp/src/participant.cpp#L195-L197

        auto shm_transport =
          std::make_shared<eprosima::fastdds::rtps::SharedMemTransportDescriptor>();
        shm_transport->segment_size(xxxxxx);  // <== change the size of segment 
        domainParticipantQos.transport().user_transports.push_back(shm_transport);

And only rebuild rmw_fastrtps package.

dk-teknologisk-lag commented 5 months ago

Currently I'm using the binary package installation, so I would like to avoid having to deploy a custom-built rmw_fastrtps package.

Currently tried with:

<?xml version="1.0" encoding="UTF-8" ?>
<profiles xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
    <transport_descriptors>
        <!-- Create a descriptor for the new transport -->
        <transport_descriptor>
            <transport_id>shm_transport_only</transport_id>
            <type>SHM</type>
            <segment_size>12582912</segment_size>
        </transport_descriptor>
    </transport_descriptors>

    <participant profile_name="DisableBuiltinTransportsParticipant">
        <rtps>
            <!-- Link the Transport Layer to the Participant -->
            <userTransports>
                <transport_id>shm_transport_only</transport_id>
            </userTransports>
            <useBuiltinTransports>false</useBuiltinTransports>
        </rtps>
    </participant>
</profiles>

It doesn't complain about this, whereas when I tried with segmentSize it did. But it doesn't seem to have an effect (I commented out the shared memory setup in the HelloWorldSharedMem example).

fujitatomoya commented 5 months ago

Messages get dropped when larger than 0.5MB - using shared memory - QoS is Best_effort

this is expected. Shared memory or not, setting BEST_EFFORT means there is always the possibility of dropping messages.

https://github.com/dk-teknologisk-lag/ros2_test/blob/853dab56a842c373faf1f585e231511d6a262cb0/src/cpp_pubsub/src/publisher_member_function.cpp#L46

this is not a bounded data type, so it cannot use LoanedMessage or Data Sharing Delivery.

see also https://github.com/eProsima/Fast-DDS/discussions/4276

@dk-teknologisk-lag after all, I suggest you try LoanedMessage; the message data type must be bounded. (Underneath, rmw_fastrtps will use Data Sharing Delivery to achieve zero-copy data sharing.)

here is the demo code, https://github.com/ros2/demos/blob/rolling/demo_nodes_cpp/src/topics/talker_loaned_message.cpp

EduPonz commented 5 months ago

Currently I'm using the binary package installation, so I would like to avoid having to deploy a custom-built rmw_fastrtps package.

Currently tried with:

<?xml version="1.0" encoding="UTF-8" ?>
<profiles xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
    <transport_descriptors>
        <!-- Create a descriptor for the new transport -->
        <transport_descriptor>
            <transport_id>shm_transport_only</transport_id>
            <type>SHM</type>
            <segment_size>12582912</segment_size>
        </transport_descriptor>
    </transport_descriptors>

    <participant profile_name="DisableBuiltinTransportsParticipant">
        <rtps>
            <!-- Link the Transport Layer to the Participant -->
            <userTransports>
                <transport_id>shm_transport_only</transport_id>
            </userTransports>
            <useBuiltinTransports>false</useBuiltinTransports>
        </rtps>
    </participant>
</profiles>

It doesn't complain about this, whereas when I tried with segmentSize it did. But it doesn't seem to have an effect (I commented out the shared memory setup in the HelloWorldSharedMem example).

I'm afraid your participant profile is missing the is_default_profile="true" attribute, see for instance here.

dk-teknologisk-lag commented 5 months ago

Messages get dropped when larger than 0.5MB - using shared memory - QoS is Best_effort

this is expected. Shared memory or not, setting BEST_EFFORT means there is always the possibility of dropping messages.

https://github.com/dk-teknologisk-lag/ros2_test/blob/853dab56a842c373faf1f585e231511d6a262cb0/src/cpp_pubsub/src/publisher_member_function.cpp#L46

this is not a bounded data type, so it cannot use LoanedMessage or Data Sharing Delivery.

see also eProsima/Fast-DDS#4276

@dk-teknologisk-lag after all, I suggest you try LoanedMessage; the message data type must be bounded. (Underneath, rmw_fastrtps will use Data Sharing Delivery to achieve zero-copy data sharing.)

here is the demo code, https://github.com/ros2/demos/blob/rolling/demo_nodes_cpp/src/topics/talker_loaned_message.cpp

Yeah, I understand that. But since sending from an actual sensor to a PC can run at the full 20 Hz with a somewhat more compressed point cloud format - resulting in 16 MB/s for e.g. an Ouster OS1 lidar - it seems unfortunate that we can't get 20 Hz over IPC out of the box. But as I experienced, increasing the segment_size, i.e. the shared memory buffer, seems to alleviate the dropped messages.

Yes, it's an unbounded type and hence limited to the shared memory feature rather than loaned messages. That could for sure be interesting to look into, but it would require a change in the Ouster driver itself, which is a bit out of scope for our current project.

If we get into CPU overload or timing issues for lidar odometry or something similar, we might try the loaned message API.

dk-teknologisk-lag commented 5 months ago

I'm afraid your participant profile is missing the is_default_profile="true" attribute, see for instance here.

I think I did try that as well; currently I'm debugging to figure out when and how the XML files are parsed. But with is_default_profile set, the default profiles get set to those values?

So when this is executed: https://github.com/ros2/rmw_fastrtps/blob/4d0be32e6c455edbf708003dffb67b11d512c5a6/rmw_fastrtps_shared_cpp/src/participant.cpp#L163

I should get the xml default values here?

I tried to get it working with the modified examples (only HelloWorldSharedMem) from here: https://github.com/eProsima/Fast-DDS/compare/master...dk-teknologisk-lag:Fast-DDS:bestefforthelloworld

But looking more closely, it doesn't seem to use any default QoS, but creates its own - or should it work here as well?

But thanks for the suggestion, I will try again tomorrow.

Barry-Xu-2018 commented 5 months ago
<?xml version="1.0" encoding="UTF-8"?>
<dds xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
    <profiles>
        <transport_descriptors>
            <!-- Create a descriptor for the new transport -->
            <transport_descriptor>
                <transport_id>shm_transport</transport_id>
                <type>SHM</type>
                <segment_size>10485760</segment_size>                                                                                                                   
            </transport_descriptor>
        </transport_descriptors>
        <participant profile_name="SHMParticipant" is_default_profile="true">
            <rtps>
                <!-- Link the Transport Layer to the Participant -->
                <userTransports>
                    <transport_id>shm_transport</transport_id>
                </userTransports>
            </rtps>
        </participant>
    </profiles>
</dds>

Using this configuration works: it significantly reduces the packet loss rate. But even with an increased segment size (tested 30 MB), some packet loss still occurs.

RMW_FASTRTPS_USE_QOS_FROM_XML=1 FASTRTPS_DEFAULT_PROFILES_FILE=my_config.xml ros2 run cpp_pubsub talker --ros-args -p freq:=10 -p bytesize:=10000000
RMW_FASTRTPS_USE_QOS_FROM_XML=1 FASTRTPS_DEFAULT_PROFILES_FILE=pub_sub_config.xml ros2 run cpp_pubsub listener
dk-teknologisk-lag commented 5 months ago
<?xml version="1.0" encoding="UTF-8"?>
<dds xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
    <profiles>
        <transport_descriptors>
            <!-- Create a descriptor for the new transport -->
            <transport_descriptor>
                <transport_id>shm_transport</transport_id>
                <type>SHM</type>
                <segment_size>10485760</segment_size>                                                                                                                   
            </transport_descriptor>
        </transport_descriptors>
        <participant profile_name="SHMParticipant" is_default_profile="true">
            <rtps>
                <!-- Link the Transport Layer to the Participant -->
                <userTransports>
                    <transport_id>shm_transport</transport_id>
                </userTransports>
            </rtps>
        </participant>
    </profiles>
</dds>

Using this configuration works: it significantly reduces the packet loss rate. But even with an increased segment size (tested 30 MB), some packet loss still occurs.

RMW_FASTRTPS_USE_QOS_FROM_XML=1 FASTRTPS_DEFAULT_PROFILES_FILE=my_config.xml ros2 run cpp_pubsub talker --ros-args -p freq:=10 -p bytesize:=10000000
RMW_FASTRTPS_USE_QOS_FROM_XML=1 FASTRTPS_DEFAULT_PROFILES_FILE=pub_sub_config.xml ros2 run cpp_pubsub listener

It seems to somewhat work, yes. But it seems to add an additional buffer - take a look at the screenshot below.

The red square marks when I launched with the XML file you provided. It creates one buffer of 0.5 MB and one of 10.5 MB.

The blue is when launched with the XML but with segment_size commented out, which seems to create a default-sized buffer, i.e. there are two of 0.5 MB.

The green is when launched without an XML file, which then just creates a single buffer of 0.5 MB.

So it seems it doesn't use the buffer size supplied in the XML, and that's probably why we still see the packet loss.

EduPonz commented 5 months ago

Hi @dk-teknologisk-lag,

The second buffer is there because you did not disable the builtin SHM transport, so you're adding a second one. Please try with the following:

<?xml version="1.0" encoding="UTF-8"?>
<dds xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
    <profiles>
        <transport_descriptors>
            <!-- Create a descriptor for the new transport -->
            <transport_descriptor>
                <transport_id>shm_transport</transport_id>
                <type>SHM</type>
                <segment_size>10485760</segment_size>                                                                                                                   
            </transport_descriptor>
        </transport_descriptors>
        <participant profile_name="SHMParticipant" is_default_profile="true">
            <rtps>
                <!-- Link the Transport Layer to the Participant -->
                <userTransports>
                    <transport_id>shm_transport</transport_id>
                </userTransports>
                <useBuiltinTransports>false</useBuiltinTransports>
            </rtps>
        </participant>
    </profiles>
</dds>
dk-teknologisk-lag commented 5 months ago

I don't even seem to be able to disable the SHM transport, e.g. like this (borrowed from https://github.com/eProsima/Fast-DDS/issues/2287):

<?xml version="1.0" encoding="UTF-8"?>
<dds xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
<profiles>
    <transport_descriptors>
        <transport_descriptor>
            <transport_id>udp_transport</transport_id>
            <type>UDPv4</type>
        </transport_descriptor>
    </transport_descriptors>

    <participant profile_name="/topic">
        <rtps>
            <userTransports>
                <transport_id>udp_transport</transport_id>
            </userTransports>
            <useBuiltinTransports>false</useBuiltinTransports>
        </rtps>
    </participant>
</profiles>
</dds>
dk-teknologisk-lag commented 5 months ago

Hi @dk-teknologisk-lag,

The second buffer is there because you did not disable the builtin SHM transport, so you're adding a second one. Please try with the following:

<?xml version="1.0" encoding="UTF-8"?>
<dds xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
    <profiles>
        <transport_descriptors>
            <!-- Create a descriptor for the new transport -->
            <transport_descriptor>
                <transport_id>shm_transport</transport_id>
                <type>SHM</type>
                <segment_size>10485760</segment_size>                                                                                                                   
            </transport_descriptor>
        </transport_descriptors>
        <participant profile_name="SHMParticipant" is_default_profile="true">
            <rtps>
                <!-- Link the Transport Layer to the Participant -->
                <userTransports>
                    <transport_id>shm_transport</transport_id>
                </userTransports>
                <useBuiltinTransports>false</useBuiltinTransports>
            </rtps>
        </participant>
    </profiles>
</dds>

Ahh, thanks. It seems to work like this - I wonder why it didn't work with the previous profile, so that it just used UDP?

dk-teknologisk-lag commented 5 months ago

Ahh, I missed the is_default_profile="true" - it also seems to work with UDP, using 95 MB/s on the loopback interface.

Thanks a lot for the help. I think we can close this, unless the default should be something other than 0.5 MB, which seems quite low for a ROS application?

dk-teknologisk-lag commented 5 months ago

So this can't be configured per topic, since it requires is_default_profile="true" to be added? Is it only the QoS that can be configured per topic? https://fast-dds.docs.eprosima.com/en/latest/fastdds/ros2/ros2_configure.html#example

dk-teknologisk-lag commented 5 months ago

Never mind, I guess I can just omit the XML config file for those nodes that don't require a large amount of shared memory.

dk-teknologisk-lag commented 5 months ago

One more question: why do both the publisher and the subscriber create a shared memory buffer? According to this diagram https://fast-dds.docs.eprosima.com/en/latest/fastdds/transport/shared_memory/shared_memory.html#definition-of-concepts

the shared memory on the subscriber side is not used?

Barry-Xu-2018 commented 5 months ago

One more question: why do both the publisher and the subscriber create a shared memory buffer? According to this diagram https://fast-dds.docs.eprosima.com/en/latest/fastdds/transport/shared_memory/shared_memory.html#definition-of-concepts

the shared memory on the subscriber side is not used?

On the subscriber side, I think you don't need to set the segment size.

dk-teknologisk-lag commented 5 months ago

Yeah, I just tried making another config with the defaults, and it works well, but it still creates the smaller shared memory buffer - I guess some of that is used for discovery?

On a side note, I can't get ros2 topic list to show the topic if I run it with the custom XML profile, even if I launch ros2 topic list with the same XML file (see attached screenshot).

ros2 topic echo doesn't work either.

And ros2 topic hz works, but shows only 15 Hz when I published at 50 - that might be the transition to Python; at least one core is maxed out, which seems to be the bottleneck.

dk-teknologisk-lag commented 5 months ago

If I disable all shared memory and run over UDP it works fine - even though ros2 topic hz still only shows about 15 Hz...

Seems to be what I will go for, for now.

EduPonz commented 5 months ago

Yeah, I just tried making another config with the defaults, and it works well, but it still creates the smaller shared memory buffer - I guess some of that is used for discovery?

On a side note, I can't get ros2 topic list to show the topic if I run it with the custom XML profile, even if I launch ros2 topic list with the same XML file (see attached screenshot).

ros2 topic echo doesn't work either.

And ros2 topic hz works, but shows only 15 Hz when I published at 50 - that might be the transition to Python; at least one core is maxed out, which seems to be the bottleneck.

This is because you'd need to run ros2 daemon stop before calling ros2 topic list again. You probably had a daemon running with the default transports, which means discovery over UDP only. In any case, another thing you can do is add a transport descriptor for a UDP transport to your XML and let participants have both. That way you'd have the same as the default, but with larger segments in the SHM transport.

Regarding the reader side segment:

  1. ROS 2 nodes always have writers (actually 10 of them out of the box) for different things such as parameter service, ros_discovery_info, etc.
  2. If you only have a SHM transport, the discovery traffic uses it, so there are discovery writers as well
  3. The reliability meta-traffic goes from reader to writer
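A combined-transports profile along the lines suggested above might look like this (a sketch; the profile and transport IDs are arbitrary names, and the segment size mirrors the 10 MiB used earlier in the thread):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<dds xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
    <profiles>
        <transport_descriptors>
            <transport_descriptor>
                <transport_id>shm_transport</transport_id>
                <type>SHM</type>
                <segment_size>10485760</segment_size>
            </transport_descriptor>
            <transport_descriptor>
                <transport_id>udp_transport</transport_id>
                <type>UDPv4</type>
            </transport_descriptor>
        </transport_descriptors>
        <participant profile_name="SHMAndUDPParticipant" is_default_profile="true">
            <rtps>
                <userTransports>
                    <transport_id>shm_transport</transport_id>
                    <transport_id>udp_transport</transport_id>
                </userTransports>
                <useBuiltinTransports>false</useBuiltinTransports>
            </rtps>
        </participant>
    </profiles>
</dds>
```

With both user transports listed and the builtin ones disabled, discovery can still run over UDP while large samples go through the enlarged SHM segment.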
dk-teknologisk-lag commented 5 months ago

Ah, yeah okay. It works fine now that I've restarted the Docker container, but ros2 daemon stop would probably be fine too! Thanks for the info.

fujitatomoya commented 5 months ago

I think we can close this, unless the default should be something other than 0.5 MB, which seems quite low for a ROS application?

@EduPonz do you think this is something we should adjust in rmw_fastrtps? As far as I know we do not have this kind of setting in rmw_fastrtps, right? So should this be moved to https://github.com/eProsima/Fast-DDS? I am not sure we want to set or change the default for ROS 2; small or big really depends on the application.

If we are not changing any default, I think we can close this issue.

dk-teknologisk-lag commented 5 months ago

I think we can close this, unless the default should be something other than 0.5 MB, which seems quite low for a ROS application?

@EduPonz do you think this is something we should adjust in rmw_fastrtps? As far as I know we do not have this kind of setting in rmw_fastrtps, right? So should this be moved to https://github.com/eProsima/Fast-DDS? I am not sure we want to set or change the default for ROS 2; small or big really depends on the application.

If we are not changing any default, I think we can close this issue.

From my viewpoint, things should work out of the box. Generally, you should be able to send small messages even if you have allocated a "large" shared memory pool, but the other way around leads to packet drops, hence this issue.

How large the default should be is of course difficult to guess, but to cover most cases one could look at large point clouds or 8K-resolution camera images and set that as the target.
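To put rough numbers on that target (a sketch; actual serialized message sizes vary with fields and encoding, and the bytes-per-point figure is an assumption):

```python
def raw_image_bytes(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Uncompressed image payload size (e.g. RGB8)."""
    return width * height * bytes_per_pixel

def point_cloud_bytes(points: int, bytes_per_point: int = 16) -> int:
    """Assumes e.g. x/y/z/intensity as four 32-bit floats per point."""
    return points * bytes_per_point

print(raw_image_bytes(7680, 4320))   # 8K RGB: ~100 MB uncompressed
print(point_cloud_bytes(128 * 1024)) # 128x1024 lidar scan: ~2 MB
```

So a default sized for raw 8K frames would need on the order of 100 MB, while typical dense lidar clouds fit comfortably within a 10-20 MB segment.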

The only downside is that you can run out of shared memory. In a default Docker container it's only 64 MB, but you do get a nice error message that space could not be allocated if you run short of it.

In comparison, we have a NUC PC with about 7 GB of shared memory, and my laptop has 32 GB. So a default of 10 or 20 MB would only be a small subset of those. Double the size if it's not easy to set a lower value for subscribers (see below).

If the messages you try to send are larger than the shared memory available, you get no warning or error - just a lower rate / dropped messages.

One more question: is it possible to configure one SHM setting for publishers and a second for subscribers? I find it quite unfortunate that I have to prefix all ros2 commands with RMW_FASTRTPS_USE_QOS_FROM_XML=1 FASTRTPS_DEFAULT_PROFILES_FILE=my_config.xml. I think QOS_FROM_XML can be omitted in my case, but still, it would be nice to set it for the entire system instead of for X sensor nodes.

An alternative to increasing the default value could be to parameterize it, so that when you create a publisher you can specify the amount of shared memory, which the driver maintainer can then estimate based on what's optimal for each of their drivers/sensors.
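One way to avoid the per-command prefix (a sketch; the profile path is hypothetical) is to export the variables once, e.g. from ~/.bashrc, so every ros2 process started from that shell inherits them:

```shell
# Hypothetical location of the profile; adjust to your setup.
export FASTRTPS_DEFAULT_PROFILES_FILE="$HOME/ros2/my_config.xml"
export RMW_FASTRTPS_USE_QOS_FROM_XML=1

# Subsequent commands no longer need the prefix, e.g.:
# ros2 run cpp_pubsub talker --ros-args -p freq:=10 -p bytesize:=10000000
```

This makes the setting effectively system-wide for that user, at the cost of applying it to every node, including ones that don't need a large segment.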

Pleune commented 3 weeks ago

I would like to add that I have run into this exact issue trying to view (I think) small images in rqt, where anything over 420x420 resolution plays extremely poorly. This happens when rqt can no longer get the entire message through SHM and, I guess, struggles with the UDP path. I absolutely believe the ROS defaults should be changed to a SHM pool large enough for rqt to work with an average webcam.

Mario-DL commented 3 weeks ago

Just to add some more insight into the configuration options: for large data transmissions we have max_msg_size and sockets_size to adjust, among other things, the sizes of the SHM segments.

fujitatomoya commented 3 weeks ago

I think it would probably be better to have documentation for these kinds of special rmw_fastrtps configurations and settings in https://docs.ros.org/en/rolling/. We already have some information in the rmw_fastrtps repo, e.g. https://github.com/ros2/rmw_fastrtps?tab=readme-ov-file#large-data-transfer-over-lossy-network, but that is not where users would check.