Closed gavanderhoorn closed 11 months ago
Friendly ping?
Friendly ping.
@pablogs9 @Acuadros95 would you perhaps have any suggestions on a possible direction for debugging?
Hello @gavanderhoorn, we should fit a time slot to take a look at that, but currently, we are quite busy. I'm going to add this to our internal roadmap we will take a look at some point.
Probably this error is an issue at Micro XRCE-DDS Client level not serializing/copying some header/payload properly when BE is configured, let us know if you find something. If not, we will check it later.
BTW if this issue is critical for your use case, feel free to ask for commercial support at info@eProsima.com or just ask me and we can schedule a meeting with our sales team.
Ok, I'll take that into consideration.
But you can confirm that setting UCDR_MACHINE_ENDIANNESS
should be sufficient in this case?
Yes, it should be enought
@gavanderhoorn We found some issues on the Agent deserialization process with big-endian clients.
Please try out this PR with the fixes https://github.com/eProsima/Micro-XRCE-DDS-Agent/pull/336, will merge when your feedback is OK.
@Acuadros95: absolutely fantastic. I had not yet found time to look into this myself -- and I also didn't expect you guys to be so quick to work on this.
Purely for effort already :100: :+1:
I'll test these changes and update you here -- or would you prefer I comment on the PR?
No problem!
I'll test these changes and update you here -- or would you prefer I comment on the PR?
Comment on the PR and we will close the issue when merged.
Closing as https://github.com/eProsima/Micro-XRCE-DDS-Agent/pull/336 is merged. Reopen if other issues are found
@Acuadros95 @pablogs9: with more time to test, I'm getting to the point of actually sending data between the Client and the Agent and I believe I'm running into another endianness issue.
For context: same systems as described earlier in this issue (ie: BE Client, LE Agent, all Fast-DDS on the ROS 2 side), Humble on both sides. To rule out issues with our use of the micro-ROS API, I've ported the addtwoints_server
and int32_publisher
demos from micro-ROS-demos to run on the client hw. Main difference: we don't use Agent auto-detection, so some code from the configured_publisher
was copied.
Registration of Client to Agent works immediately.
The int32_publisher
starts publishing data, and Wireshark shows the data is as I would expect it. Example:
(this is when the publisher publishes the value 11 -- hence the Not set
for Endianness bit
, which is correct IIUC the DDS-XRCE standard)
On the ROS 2 side, ros2 topic echo /std_msgs_msg_Int32
however shows:
Initially I thought this was garbage data (ie: unitialised memory, misaligned reads, etc), but actually:
Unfortunately I didn't capture the DDS side of the traffic (ie: between the Agent and the ros2 topic echo
node).
Technically I believe this should be OK (the values do represent little-endian integers in some way). Not sure what's making ros2 topic echo
print those 'strange' values.
Is this still a problem in our/the application code, or should payload data have been swapped automatically?
I can provide a Wireshark capture, but it would basically show what I explained above. The application code is int32_publisher
from the micro-ROS-demos
, but I could make it available if it'd help.
Edit: I've searched for mentions of endian
, ENDIAN
, swap
, etc in rcl
, rclc
, rosidl_typesupport_microxrcedds
, rcutils
, rmw_microxrcedds
and similar packages, but don't seem to find any. My assumption so far is that configuring the Client and Micro-CDR should suffice (as discussed earlier in this issue).
Edit 2: could it be the endianess information is lost when the Agent forwards the DDS-XRCE message? If ROS 2 'sees' a LE flag while the payload is BE, that would result in what ros2 topic echo
shows me.
Just captured the ROS 2 (Fast-)RTPS traffic for the same test (repeated it).
I can't be sure, but I believe something is not entirely correct wrt the endianness encapsulation in the messages sent by the Agent.
From the capture, an RTPS packet containing a DDS-XRCE payload forwarded by the Agent:
submessageId: DATA (0x15)
Flags: 0x05, Data present, Endianness bit
[...]
[...]
[Topic Information (from Discovery)]
[topic: rt/std_msgs_msg_Int32]
[typeName: std_msgs::msg::dds_::Int32_]
[DCPSPublicationData In: 185]
writerSeqNumber: 19
serializedData
encapsulation kind: CDR_LE (0x0001)
encapsulation options: 0x0000
serializedData: 00000012
Notice how encapculation kind
is set to CDR_LE
, while serializedData
seems to contain a big-endian 0x12
. Interpreting that as little-endian would result in 301989888
, not 18
.
Edit: looks like services have a similar problem. Payloads coming from an LE client are not swapped. The addtwoints_server
will still add a
and b
, but it won't see swapped BE ints.
I've spent some time trying to find where the encapsulation of specific (sub)messages is set, but I must admit that without some (architecture) documentation, there are a few too many 'moving parts' to quickly & easily determine what's going on exactly.
Hello @gavanderhoorn, I'm back from holidays and I have this as TODO for today, let me check and I will report here probably today.
Hello @gavanderhoorn I have drafted two approaches to solve this issue.
The main point is that the Micro XRCE-DDS "Generic" Typesupport always writes the Representation Header (inside the RTPS Data Payload) as Little Endian:
This line makes your Wireshark packet to decode Representation Header as:
encapsulation kind: CDR_LE (0x0001)
encapsulation options: 0x0000
Note that this "Generic" Typesupport is a Typesupport that serializes a byte array in order to allow the Micro XRCE-DDS Agent type agnostic. Using it, the Agent just forwards the Client's messages to the DDS Dataspace, with no ser/des procedure.
So we have two options here, they are independent.
Branch:
This branch modifies the micro-ROS RMW to prepare the writing buffers and then ensure that the buffer endianness is LE. By default, the buffer preparation method returns a buffer that has the machine's endianness.
Note that after this endianness enforcement, the cdr_serialize
is called, this call will use the Micro-CDR API. And the default calls to the Micro-CDR API will use the endianness configured in the ucdrBuffer
.
Note that:
Branch:
As you know, the Micro XRCE-DDS Client library can be configured to use UCLIENT_BIG_ENDIANNESS
. In general, this flag will make the XRCE sub-messages have the endianness flag enabled, indicating the XRCE communication is BE.
This does not mean that the payload (that is completely up to the application user) is not in BE nor in LE. But as I have mentioned in the beginning, the Micro XRCE-DDS Agent assumes LE.
This approach, adds a configuration parameter in the XRCE Session Creation procedure named uxr_be_pl
. When a Micro XRCE-DDS Client starts a session with this parameter, it indicates that the client intends to serialize its payloads in BE.
In this draft approach, I assume that if the library is configured to use BE wire protocol (by means of UCLIENT_BIG_ENDIANNESS
) it will also serialize the data in BE. This assumption may not be true.
So, when the Client sends the session's property uxr_be_pl
, the Agent, will configure this client session as BE and will configure the Representation Header correctly.
Note that:
In general, those are the approaches I would take, what is your opinion? The first option can be merged directly, the second one will have more test/CI work.
Let me know your thoughts and your test results, so we can choose an option and go ahead.
Thanks @pablogs9. You've confirmed my analysis, and also the two options I saw.
If I understand you correctly, approach two would basically come down to:
encapsulation kind
to CDR_BE
action 4 would inform receiving entities data is in BE format, so they would use an appropriate deserialisation strategy to get data into the endianness they'd prefer/require (action 5).
Correct?
If yes, then my initial reaction would be that I'd find this approach 'cleaner'.
It would also seem to put most of the work (ie: swapping) on the consumer's side, which in the case of Micro-XRCE DDS seems to make sense. Consumers are most likely more powerful systems for which swapping operations might be less costly than for producers.
(and minor, but traditionally all network traffic was always BE)
In this draft approach, I assume that if the library is configured to use BE wire protocol (by means of
UCLIENT_BIG_ENDIANNESS
) it will also serialize the data in BE. This assumption may not be true.
Are there any platforms you support today which would require UCLIENT_BIG_ENDIANNESS to be ON
, but then serialise data with a different endianness?
IIRC, there are systems that are BE for binaries (ie: opcodes are encoded BE) and LE for data, but again IIRC those are rather rare.
Just so IIUC: with wire protocol, you mean the DDS XRCE binary representation et al. defined by the DDS XRCE standard, correct? Not a custom one?
Btw: the Client I can easily incorporate in the application. So approach 1 is easily tested.
Approach 2 also requires changes to the Agent. Would you have a workflow for building that from a different branch quickly/easily?
Thanks @pablogs9. You've confirmed my analysis, and also the two options I saw.
If I understand you correctly, approach two would basically come down to:
- XRCE Client sets BE session property
- XRCE Client sends BE data
- XRCE Agent receives BE data
- XRCE Agent sets
encapsulation kind
toCDR_BE
- ROS/regular DDS subscriber receives BE data, deserialises to desired endianness
action 4 would inform receiving entities data is in BE format, so they would use an appropriate deserialisation strategy to get data into the endianness they'd prefer/require (action 5).
Correct?
That's it.
If yes, then my initial reaction would be that I'd find this approach 'cleaner'.
It would also seem to put most of the work (ie: swapping) on the consumer's side, which in the case of Micro-XRCE DDS seems to make sense. Consumers are most likely more powerful systems for which swapping operations might be less costly than for producers.
Agree with this.
(and minor, but traditionally all network traffic was always BE)
In this draft approach, I assume that if the library is configured to use BE wire protocol (by means of
UCLIENT_BIG_ENDIANNESS
) it will also serialize the data in BE. This assumption may not be true.Are there any platforms you support today which would require UCLIENT_BIG_ENDIANNESS to be
ON
, but then serialise data with a different endianness?
I have never seen this case, but also it is true that AFAIK you are the only ones that use XRCE/micro-ROS in BE.
Just so IIUC: with wire protocol, you mean the DDS XRCE binary representation et al. defined by the DDS XRCE standard, correct? Not a custom one?
Yes, when UCLIENT_BIG_ENDIANNESS
is set means that the DDS XRCE Wire Protocol (implemented in Micro XRCE-DDS). This makes no assumption on the transmitted payload as far as our implementation is type agnostic.
Btw: the Client I can easily incorporate in the application. So approach 1 is easily tested.
Let me know if it works.
Approach 2 also requires changes to the Agent. Would you have a workflow for building that from a different branch quickly/easily?
I would just checkout the branch and follow the classic CMake procedure of mkdir build; cd build; cmake ..; make
. It shall be enough for testing this fix, you won't have micro-ROS Agent features (ROS graph handling, etc) but it should be enought for testing if your remote data consumers works well with the BE issue fix.
As far as you provide positive feedback on the solution, I will proceed with creating and merging a PR.
Btw: the Client I can easily incorporate in the application. So approach 1 is easily tested.
Let me know if it works.
Just performed a quick test (with approach 1): I've cherry-pick
ed https://github.com/micro-ROS/rmw_microxrcedds/commit/6c6962e2e8f77fc985c19bd50e47f134d57190a6 onto the humble
branch (we're using Humble for now), rebuilt our libmicroros
and linked that against int32_publisher
and addtwoints_server
.
As expected, all payloads coming from the client are now marked CDR_LE
.
The int32_publisher
worked: publishes LE integers now and ros2 topic echo
also prints them correctly.
addtwoints_server
didn't want to cooperate though: invoking the service causes the client to hang, the server just printing Service request value: 0 + 0.
, irrespective of what values I actually ask it to add.
Wireshark shows two LE uint64_t
s being serialised by the Agent and sent to the client.
Client then stays 'silent' and does not send back any traffic any more.
I'll see if I can make the capture available tomorrow.
Regarding the services you mention, we have the very same problem here: the transparent forwarding made by the Agent.
As you can see here, the Agent deserializes the received payload (DDS Dataspace -> XRCE path) skipping the representation header. This way, when the raw serialized data (the byte array) is forwarded to the XRCE Client, there is no endianness information.
This drives me to another solution
Micro XRCE-DDS Agent shall not handle the representation header here and here and shall perfect forward what XRCE Client or remote DDS Writers sends.
This way the XRCE Client uses who is in charge of writing or reading the Representation Header and handling the payload correctly. This implies the following changes:
Considerations:
I'm sure I haven't had sufficient :coffee: but would letting the Agent set the encapsulation based on the value set by the Client on the XRCE DDS traffic not be an option?
The current implementation may skip the bytes, but just checking whether the Client has set endianness bit
to 0
and then settings the encapsulation kind
to CDR_BE
-- and vice versa -- doesn't seem like it would be as invasive as what you describe.
That would keep the Agent transparent I'd say, as it doesn't touch the payload, it just peeks at meta-data.
Of course this would mean that BE Clients might receive LE data, which would mean they'd have to swap, but there's no way around that. Similar for BE Clients sending BE data to LE Agents -- but we've discussed that earlier.
I'm sure I'm overlooking something important though.
Edit: and if all else fails, standardising everything on LE (or BE) and incurring the swap somewhere would also seem to solve this.
Perhaps support this as a CMake option, next to UCLIENT_BIG_ENDIANNESS
. Then the choice is up to your clients (as in: customers, users), who could probably weigh the costs of swaps on their platform against performance loss better than we can (as in: an ad-hoc group of developers that's trying to make a best-effort guess for the unknown future).
Issue template
micro_ros_setup
(client), Docker images (Agent)2.0.0
(client), Humble (Agent)Steps to reproduce the issue
This isn't going to be an sscce, but at a high-level:
-DCONFIG_BIG_ENDIANNESS=ON
-DUCLIENT_BIG_ENDIANNESS=ON
(although super build is not used)amd64
)Expected behavior
Successful Client<->Agent connection.
Actual behavior
Agent prints:
and so on.
Client fails to connect (
rclc_support_init_with_options (..)
returns1
).Additional information
Comparing the second packet exchanged with the Agent with a working little-endian Client (different system), it looks like some of the fields are swapped (from a Wireshark capture,
be: big endian
,le: little endian
):Note how the 6th, 7th, 10th and 13th groups are mirror images.
The first packet also seems to have the last 4 bytes swapped (
0x00000002
vs0x02000000
).Perhaps I misunderstood, but I expected
CONFIG_BIG_ENDIANNESS=ON
at the Micro-CDR level to take care of byte-swapping.Is more configuration needed? Should the client be setup differently perhaps?
Edit: I've checked and Micro-CDR appears to be correctly configured/built by CMake. Relevant part of the generated
config.h
: