Open christophebedard opened 2 years ago
Note that I don't think this behaviour is tested anywhere in ROS 2.
I added a test to test_rmw_implementation
to cover this: https://github.com/christophebedard/rmw_implementation/commit/76054846fa88a82da711d043322d15219f0f1311. However, the test passed on all RMW implementations in my workspace (except rmw_email_cpp
:laughing:). To properly test and compare interprocess vs intraprocess, I've modified/created simple ping-pong test cases and was able to reproduce the issue:
The publisher's GID is printed after it's created. The publisher GID from the message info data is printed in the subscription callback. These two should match. We run each test-case with rmw_fastrtps_cpp
and rmw_cyclonedds_cpp
.
Steps:
test-tracetools-pingpong-print-publisher-gid
branch from ros2_tracing
: https://gitlab.com/ros-tracing/ros2_tracing/-/commits/test-tracetools-pingpong-print-publisher-gidtest_tracetools
rmw_fastrtps_cpp
: GIDs match
# Terminal 1
$ RMW_IMPLEMENTATION=rmw_fastrtps_cpp ./build/test_tracetools/test_ping
original: pub gid: 1 15 138 60 134 102 14 96 1 0 0 0 0 0 19 3 0 0 0 0 0 0 0 0
# Terminal 2
$ RMW_IMPLEMENTATION=rmw_fastrtps_cpp ./build/test_tracetools/test_pong
sub msg : pub gid: 1 15 138 60 134 102 14 96 1 0 0 0 0 0 19 3 0 0 0 0 0 0 0 0
rmw_cyclonedds_cpp
: GIDs do not match
# Terminal 1
$ RMW_IMPLEMENTATION=rmw_cyclonedds_cpp ./build/test_tracetools/test_ping
original: pub gid: 255 131 30 82 99 164 26 64 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# Terminal 2
$ RMW_IMPLEMENTATION=rmw_cyclonedds_cpp ./build/test_tracetools/test_pong
sub msg : pub gid: 239 202 37 164 203 102 85 109 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
rmw_fastrtps_cpp
: GIDs match
# Terminal 1
$ RMW_IMPLEMENTATION=rmw_fastrtps_cpp ./build/test_tracetools/test_pingpong
original: pub gid: 1 15 138 60 54 105 11 201 1 0 0 0 0 0 19 3 0 0 0 0 0 0 0 0
sub msg : pub gid: 1 15 138 60 54 105 11 201 1 0 0 0 0 0 19 3 0 0 0 0 0 0 0 0
rmw_cyclonedds_cpp
: GIDs match
# Terminal 1
$ RMW_IMPLEMENTATION=rmw_cyclonedds_cpp ./build/test_tracetools/test_pingpong
original: pub gid: 146 105 131 154 211 166 10 147 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
sub msg : pub gid: 146 105 131 154 211 166 10 147 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Conclusion: it's fine when pub/sub is within the same process (or node), but the publisher GIDs do not match in normal interprocess communications.
I agree it is fair to call this discrepancy a bug, though prior to Feb 10 2020 you wouldn't have been able to observe this because originally there was nothing in the ROS 2 interface that required a globally unique identifier and everything in the Cyclone DDS RMW layer used the same identifier that you see here.
That identifier is what is called the "instance handle" for the DDS DataWriter as it is visible in the DDS discovery topics by taking a subscription to to the DCPSPublication topic, and what is made available to the data reader in the "publication_handle" of the SampleInfo
(2.2.2.5.5 "SampleInfo Class" in the DDS spec).
The GUID is not in the standardised fields of this SampleInfo
, and it is also not in Cyclone's. Adding it would make the sample info even more obese (IMHO) and assuming I find the time to restructure a few things in the innards of Cyclone, it won't be easily available internally for inclusion in the sample info either. The standards-conforming way to get it is to convert the "publication_handle" into the publisher's GUID and to that efficiently you'll need to maintain a mapping. Either in the Cyclone DDS RMW layer itself, or, more elegantly, in the dds_common
package.
Thanks for the explanation!
That identifier is what is called the "instance handle" for the DDS DataWriter as it is visible in the DDS discovery topics by taking a subscription to to the DCPSPublication topic, and what is made available to the data reader in the "publication_handle" of the SampleInfo (2.2.2.5.5 "SampleInfo Class" in the DDS spec).
The standards-conforming way to get it is to convert the "publication_handle" into the publisher's GUID and to that efficiently you'll need to maintain a mapping. Either in the Cyclone DDS RMW layer itself, or, more elegantly, in the dds_common package.
Looks like most of this exists already: https://github.com/ros2/rmw_cyclonedds/blob/e1a1c62fbb14eda24cd5269ac789217a8d325d3d/rmw_cyclonedds_cpp/src/rmw_node.cpp#L876
Implementation-wise, is the mapping from dds_builtintopic_endpoint_t
's participant_instance_handle
to the GID obtained with convert_guid_to_gid()
using dds_builtintopic_endpoint_t
's participant_key
? Like this here: https://github.com/ros2/rmw_cyclonedds/blob/e1a1c62fbb14eda24cd5269ac789217a8d325d3d/rmw_cyclonedds_cpp/src/rmw_node.cpp#L892
Bug report
Required Info:
ros2.repos
file atmaster
Steps to reproduce issue
Have a publisher that publishes a message, and have a subscription that receives that message.
Normal inter-process setup, no shared memory or anything.
Expected behavior
These two are equal:
rmw
message info struct when the subscription takes a messagermw
publisher (and underlying DDS writer GID/GUID; the conversion is trivial) that sent the message that gets taken by the subscriptionwhen it's the same publisher.
Actual behavior
Publisher GIDs do not match.
Additional information
The GID of a publisher is recorded here in
rmw_create_publisher()
: https://github.com/ros2/rmw_cyclonedds/blob/35bdd1ea0805fc16dccdcf6d6f07e587135e9149/rmw_cyclonedds_cpp/src/rmw_node.cpp#L2457I'm recording the source publisher GID here in
rmw_take()
: https://github.com/ros2/rmw_cyclonedds/blob/35bdd1ea0805fc16dccdcf6d6f07e587135e9149/rmw_cyclonedds_cpp/src/rmw_node.cpp#L3132-L3137 using this addition:Less than 10 lines above that, it's copying
publication_handle
from thedds_sample_info_t
struct into intomessage_info->publisher_gid
. I'm not sure whether that's actually a GID: https://github.com/ros2/rmw_cyclonedds/blob/35bdd1ea0805fc16dccdcf6d6f07e587135e9149/rmw_cyclonedds_cpp/src/rmw_node.cpp#L3116I tried to do
get_entity_gid(info.publication_handle, message_info->publisher_gid);
instead of simply copying the values, but the GIDs don't seem to match either.Note that I don't think this behaviour is tested anywhere in ROS 2. However, publisher GIDs seem to match correctly with
rmw_fastrtps_cpp
.