Messages (and svcs and action goals) larger than a certain size seem to 'disappear'

gavanderhoorn commented 2 years ago

Describe the bug

Agents do not appear to forward messages coming in from the "ROS 2 side" that are larger than a certain size.

With the Agent's verbosity increased, DDS messages with len: 67776, for instance, just disappear. There are no warnings or errors printed by the Agent, and the Client doesn't appear to receive anything, nor does it notify the Agent of any error.

For reference, this was also discussed in https://github.com/micro-ROS/rmw_microxrcedds/issues/258.

To Reproduce

Steps to reproduce the behaviour:

Not exactly steps, but at a high level:

implement a Client capable of subscribing to a message with a variable sized payload
create a rclcpp or rclpy publisher which publishes instances of this message
gradually increase size of the payload from something small to 'something large'
notice how at a certain point, the Client no longer receives any messages

Alternatively: create a service server and client, or an action server and client, which exchange the described message (as one of the fields, for instance).

Service invocations and action goal submissions will silently start to fail for messages that are "too large".

Expected behaviour

Ideally: messages of all sizes (even large ones) get forwarded successfully.

Realistically (as unlimited memory doesn't exist): the Agent prints a warning or error, clearly showing it's impossible to forward a message.

If possible, in the case of svcs and actions: notify the client that the invocation failed due to an RMW-level failure (so not a failure of the invocation itself: the request never even arrived at the server).

System information (please complete the following information):

OS: observed under Ubuntu 20.04, but same happens on other OS
ROS 2 Foxy, Galactic and Rolling
Version: using the "most recent" Docker image (microros/micro-ros-agent / 5e93b1fa5608)

gavanderhoorn commented 2 years ago

And just to give it the visibility it deserves, @pablogs9's https://github.com/micro-ROS/rmw_microxrcedds/issues/258#issuecomment-1132792875 on https://github.com/micro-ROS/rmw_microxrcedds/issues/258:

Some details about DDS-XRCE.

DATA sub-messages are exchanged between the client and the agent in order to transmit DDS payloads from the client to the agent (in publishers, requesters, and repliers) or from the agent to the client (when dealing with subscriptions, requesters, and repliers).

This DDS-XRCE DATA sub-message has a 16-bit length field in its subheader, which means that the maximum payload is about 65 kB (with an MTU that allows that).

The same applies when fragmentation is used, because the whole DDS-XRCE DATA sub-message (subheader + payload) is chopped into smaller fragments. That means that the maximum payload is still about 65 kB, even with fragmentation.
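As a rough illustration of the 16-bit limit described above, here is a minimal Python sketch. It assumes the length field is an unsigned 16-bit integer and ignores headers; `MAX_DATA_LENGTH` and `fits_length_field` are hypothetical names used only for this example, not part of any micro-ROS or Micro XRCE-DDS API.

```python
# Assumption: the DATA subheader length field is an unsigned 16-bit integer.
LENGTH_FIELD_BITS = 16
MAX_DATA_LENGTH = (1 << LENGTH_FIELD_BITS) - 1  # largest encodable length


def fits_length_field(payload_len: int) -> bool:
    """Return True if payload_len can be encoded in the 16-bit length field.

    Headers are ignored here, so this is only an approximation.
    """
    return 0 <= payload_len <= MAX_DATA_LENGTH


print(MAX_DATA_LENGTH)           # 65535
print(fits_length_field(60000))  # True
print(fits_length_field(67776))  # False (the len reported in this issue)
```

The 67776-byte message from the original report falls just above this limit, which is consistent with it never reaching the Client.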

What happens when a micro-ROS client tries to publish a big message? It depends:

If size < MTU, it fits in an XRCE message and will be sent as is.

If size > MTU && size < (MTU*UXRCE_OUTPUT_STREAM_HISTORY), the message will be chopped into fragments, with each fragment allocated in a different XRCE message. The whole buffer (all XRCE messages) will then be flushed to the agent, and the agent will reconstruct the original DATA submessage from the FRAG submessages.

If size > MTU && size > (MTU*UXRCE_OUTPUT_STREAM_HISTORY) && size < 2^16 B, the message will be chopped into fragments, but the fragments do not all fit in the buffer. So fragments will be prepared, the serializer will start serializing data, and every time the XRCE output buffer is full the FRAG messages will be flushed to the agent. The agent knows when it should stop saving fragments and regenerate the DATA message from the FRAGs, because the final FRAG has a bit enabled in its header. This is known as continuous serialization mode, and you can see the implementation here and here. The agent knows the regenerated DATA submessage is OK if the combined FRAG payloads match the DATA length in the DATA subheader.

If size > MTU && size > (MTU*UXRCE_OUTPUT_STREAM_HISTORY) && size > 2^16 B, the same as above happens, but the DATA length is set to 0, so the agent needs to rely on receiving all fragments correctly (which should be the case because of XRCE reliability) and will assume that the regenerated DATA submessage is correct. The final FRAG will have the last_fragment_bit enabled. This is the UCLIENT_TWEAK_XRCE_WRITE_LIMIT, and it is non-standard.
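The four client-to-agent cases above can be sketched as a small decision function. This is only an illustration of the list as written, not the Micro XRCE-DDS Client implementation: headers are ignored, the handling of exact boundary values is an assumption, and `ClientSendPath`, `client_send_path`, and the example values `mtu=512` and `output_history=4` are hypothetical.

```python
from enum import Enum


class ClientSendPath(Enum):
    SINGLE_MESSAGE = "fits in one XRCE message"
    BUFFERED_FRAGMENTS = "fragmented, all fragments fit in the output buffer"
    CONTINUOUS_SERIALIZATION = "fragmented, flushed whenever the buffer fills"
    WRITE_LIMIT_TWEAK = "as above, DATA length set to 0 (non-standard tweak)"


XRCE_LENGTH_LIMIT = 1 << 16  # 2^16 B, from the 16-bit DATA length field


def client_send_path(size: int, mtu: int, output_history: int) -> ClientSendPath:
    """Pick the case from the list above, checked in the same order.

    output_history stands in for UXRCE_OUTPUT_STREAM_HISTORY.
    """
    if size < mtu:
        return ClientSendPath.SINGLE_MESSAGE
    if size < mtu * output_history:
        return ClientSendPath.BUFFERED_FRAGMENTS
    if size < XRCE_LENGTH_LIMIT:
        return ClientSendPath.CONTINUOUS_SERIALIZATION
    # Per the text, this case relies on UCLIENT_TWEAK_XRCE_WRITE_LIMIT.
    return ClientSendPath.WRITE_LIMIT_TWEAK


for size in (100, 1500, 10000, 67776):
    print(size, client_send_path(size, mtu=512, output_history=4).name)
```

With the hypothetical 512-byte MTU and a history of 4, a 1500-byte message still fits in the output buffer as fragments, while a 10000-byte message needs continuous serialization.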

What happens when a micro-ROS Agent tries to send a DATA payload to a micro-ROS client?

If size < MTU, it fits in an XRCE message and will be sent as is.

If size > MTU && size < (MTU*UXRCE_INPUT_STREAM_HISTORY), the message will be chopped into fragments, with each fragment allocated in a different XRCE message. The whole buffer (all XRCE messages) will then be flushed to the client, and the client will reconstruct the original DATA submessage from the FRAG submessages.

If size > MTU && size > (MTU*UXRCE_INPUT_STREAM_HISTORY), the same as above happens, but the client will fill its buffers with the first UXRCE_INPUT_STREAM_HISTORY FRAGs, and FRAG number UXRCE_INPUT_STREAM_HISTORY+1 will never be acknowledged because it does not fit in the client's memory. So the agent will keep waiting for the client's acknowledgement of FRAG UXRCE_INPUT_STREAM_HISTORY+1, but it will never be received.

Headers are not taken into account in the above formulas, so they are not literal. Just a reference.

As we can see, there are a couple of problems here:

1. The client does not have dynamic memory, so it won't be able to accept DATA messages split into more FRAGs than UXRCE_INPUT_STREAM_HISTORY.

2. If the agent tries to send more FRAGs than UXRCE_INPUT_STREAM_HISTORY, the client buffer will be blocked. @Acuadros95 we should take a look at that.

Answering your question @gavanderhoorn, the agent won't send a DATA payload (fragmented or not) to the client if it is greater than 2^16 B, because of a limitation in the standard.
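The agent-to-client direction described above can be sketched the same way. Again this is a hedged illustration rather than the Agent's actual logic: headers are ignored, boundary handling is an assumption, and `AgentSendOutcome`, `agent_send_outcome`, and the example values `mtu=512` and `input_history=4` are hypothetical.

```python
from enum import Enum


class AgentSendOutcome(Enum):
    NOT_SENT = "larger than the 16-bit DATA length allows; never sent"
    SINGLE_MESSAGE = "fits in one XRCE message"
    REASSEMBLED = "fragmented and reassembled by the client"
    STALLED = "more FRAGs than the client can buffer; never acknowledged"


XRCE_LENGTH_LIMIT = 1 << 16  # 2^16 B, from the 16-bit DATA length field


def agent_send_outcome(size: int, mtu: int, input_history: int) -> AgentSendOutcome:
    """Classify a payload going from the agent to the client.

    input_history stands in for UXRCE_INPUT_STREAM_HISTORY.
    """
    if size >= XRCE_LENGTH_LIMIT:
        return AgentSendOutcome.NOT_SENT
    if size < mtu:
        return AgentSendOutcome.SINGLE_MESSAGE
    if size < mtu * input_history:
        return AgentSendOutcome.REASSEMBLED
    return AgentSendOutcome.STALLED


for size in (100, 1500, 10000, 67776):
    print(size, agent_send_outcome(size, mtu=512, input_history=4).name)
```

Only the first two sizes in this example would reach the client under these assumed settings; the other two correspond to the two failure modes discussed in this thread.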

gavanderhoorn commented 2 years ago

Friendly ping.

Would you already have some idea on how to address:

2. If the agent tries to send more FRAGs than UXRCE_INPUT_STREAM_HISTORY, the client buffer will be blocked. @Acuadros95 we should take a look at that.

perhaps?

It's more than logical that a resource-constrained device is resource constrained, but without a way to detect that messages which are too large don't get transferred, making robust applications is rather difficult.

pablogs9 commented 2 years ago

Sorry @gavanderhoorn, we do not have enough bandwidth to look at this. I'll keep it open and check it later.

gavanderhoorn commented 2 years ago

As a data point (perhaps as input for prioritising):

a Micro-ROS application doesn't have (complete) control over the sizes of messages potential ROS 2 publishers/clients send it. IIUC, right now, messages which are too large will disappear, without the sending entity being notified of this.

The best we can do is to ask users to "not send messages which are 'too large'".

This makes it a rather brittle setup, as users are bound to send messages which are "too large" sooner or later (even if by mistake).

JointTrajectorys might be on the edge of what is reasonable for "extremely resource constrained devices", but technically, any message larger than what is described in your https://github.com/micro-ROS/micro-ROS-Agent/issues/143#issuecomment-1135647918 would trigger this behaviour.

gavanderhoorn commented 2 years ago

@pablogs9: just remembered this (UCLIENT_TWEAK_XRCE_WRITE_LIMIT).

Does that tweak allow a way around the standard limitation you mentioned earlier, or am I misinterpreting the comment there?

pablogs9 commented 2 years ago

This tweak is for sending payloads bigger than 64 kB from the client to the agent, not for receiving them.

gavanderhoorn commented 2 years ago

Ah, ok.

Could the same tweak be used in the other direction?

pablogs9 commented 2 years ago

Not really, because it relies on dynamic memory on the agent side to store an arbitrary number of fragments. We cannot do such a thing on the client side...

gavanderhoorn commented 2 years ago

Could making clients responsible for allocating memory and then registering that buffer with the RMW be an option? That would remove the responsibility from the XRCE-DDS Client library.

This might only delay the problem described in the OP and your https://github.com/micro-ROS/micro-ROS-Agent/issues/143#issuecomment-1135647918, but could allow for messages larger than the 16-bit limit if I understand you correctly?

pablogs9 commented 2 years ago

That should be possible, but it implies rearchitecting the Micro XRCE-DDS Client.
