net: coap_client: truncation of packets longer than CONFIG_COAP_CLIENT_MESSAGE_SIZE

mrodgers-witekio commented 3 months ago

Describe the bug When making requests using coap_client_req, responses longer than CONFIG_COAP_CLIENT_MESSAGE_SIZE can be truncated, or result in incorrect handling of the subsequent block transfer (depending on the total length of the resource).

This issue can manifest itself in three different ways (that I can see), but all are related:

1) Response is longer than CONFIG_COAP_CLIENT_MESSAGE_SIZE but shorter than MAX_COAP_MSG_LEN (because the CoAP header is shorter than the maximum value).

In this case no truncation happens, the whole payload is delivered to the application callback. This is OK although perhaps slightly counter-intuitive: the application developer might have assumed that the callback can't deliver a message payload longer than CONFIG_COAP_CLIENT_MESSAGE_SIZE and could have sized downstream buffers etc. accordingly.

For example with CONFIG_COAP_CLIENT_MESSAGE_SIZE=256 and a resource length of 270 bytes:

[00:00:00.110,000] <inf> coap_download: Starting CoAP download using IPv6
[00:00:00.180,000] <inf> coap_download: CoAP response, result_code=69, offset=0, len=270
[00:00:00.180,000] <inf> coap_download: CoAP download done, got 270 bytes in 70 ms

2) Response is longer than CONFIG_COAP_CLIENT_MESSAGE_SIZE, longer than MAX_COAP_MSG_LEN, but from the server's point of view still fits in a single block (because the server defaults to a larger block size than the client).

In this case the application receives a single callback with a truncated response, and no indication that the response is truncated. The CoAP client doesn't send any further block requests to retrieve the rest of the data.

For example with CONFIG_COAP_CLIENT_MESSAGE_SIZE=256 and a resource length of 512 bytes:

[00:00:00.110,000] <inf> coap_download: Starting CoAP download using IPv6
[00:00:00.180,000] <inf> coap_download: CoAP response, result_code=69, offset=0, len=279
[00:00:00.180,000] <inf> coap_download: CoAP download done, got 279 bytes in 70 ms

But in wireshark, we can see that the server sent a payload of 512 bytes:

3) Response is longer than CONFIG_COAP_CLIENT_MESSAGE_SIZE, longer than MAX_COAP_MSG_LEN, and longer than the server's default block size.

In this case, the CoAP client receives the first response, parses the Block2 option sent by the server, and sends requests for the remaining blocks that it is missing. But because the first packet was longer than CONFIG_COAP_CLIENT_MESSAGE_SIZE, the offsets passed to the application callback for all subsequent blocks are wrong - they are not a multiple of the block size. So it appears to the application as though more data has been received than the actual size of the resource (since some data has been received twice).

For example with CONFIG_COAP_CLIENT_MESSAGE_SIZE=256 and a resource length of 2048 bytes:

[00:00:00.110,000] <inf> coap_download: Starting CoAP download using IPv6
[00:00:00.180,000] <inf> coap_download: CoAP response, result_code=69, offset=0, len=279
[00:00:00.240,000] <inf> coap_download: CoAP response, result_code=69, offset=279, len=256
[00:00:00.300,000] <inf> coap_download: CoAP response, result_code=69, offset=535, len=256
[00:00:00.360,000] <inf> coap_download: CoAP response, result_code=69, offset=791, len=256
[00:00:00.420,000] <inf> coap_download: CoAP response, result_code=69, offset=1047, len=256
[00:00:00.480,000] <inf> coap_download: CoAP response, result_code=69, offset=1303, len=256
[00:00:00.540,000] <inf> coap_download: CoAP response, result_code=69, offset=1559, len=256
[00:00:00.600,000] <inf> coap_download: CoAP response, result_code=69, offset=1815, len=256
[00:00:00.600,000] <inf> coap_download: CoAP download done, got 2071 bytes in 490 ms

and in Wireshark:

To Reproduce Check out the PR: https://github.com/zephyrproject-rtos/zephyr/pull/76069

Build and run using the instructions in the README.rst, using native_sim is easiest.

Change the size of the resource sent by the server to see each of the different issues observed above.

Expected behavior Any data that can't fit in the client's receive buffer should be retrieved by subsequent block transfer requests. The offset of each block passed to the application should be correct (ie. a multiple of the block size).

Impact The application can't reliably receive CoAP data of any arbitrary length, unless the server's default block size happens to be less than or equal to the client's max message size.

Environment (please complete the following information):

OS: Linux
Toolchain: Zephyr SDK
Commit SHA or Version used: latest main at the time of writing (b8ae3a416b9).

Additional context

I'd be happy to help with a fix for this if that's useful, I can see a few changes that might be needed but could do with some guidance on the best way to go:

1) Use the MSG_TRUNC flag when receiving from the socket, so that we know the true length of the packet and whether it was truncated. Use this info when deciding if a block transfer is needed, not just the Block2 option in the response packet. 2) Ensure that the block offset is only ever incremented by a multiple of the block size 3) (Not necessary but might be helpful): always send a Block2 option in the first request packet, to ensure that the server doesn't respond with more data than the client can handle - ie. Figure 3: Block-Wise GET with Early Negotiation from https://datatracker.ietf.org/doc/html/rfc7959#section-3.1

rlubos commented 3 months ago

Thanks for the detailed report. If you're able to work on this, it'd be great, otherwise just let me know.

Use the MSG_TRUNC flag when receiving from the socket, so that we know the true length of the packet and whether it was truncated. Use this info when deciding if a block transfer is needed, not just the Block2 option in the response packet.

Detecting that the payload is incomplete would definitely be useful, the application should not receive incomplete payload w/o knowing. Although, perhaps we should combine this with the proposal in point 3, use early negotiation for GET requests, and if the reply doesn't fit the buffer, report an error. If the server ignored the suggestion, there's really not much that can be done with statically allocated buffers.

Ensure that the block offset is only ever incremented by a multiple of the block size

Definitely.

(Not necessary but might be helpful): always send a Block2 option in the first request packet, to ensure that the server doesn't respond with more data than the client can handle - ie. Figure 3: Block-Wise GET with Early Negotiation from https://datatracker.ietf.org/doc/html/rfc7959#section-3.1

See 1, IMO it's a good idea to use this mechanism given the buffer size on our side is limited. I hope this is well supported in server implementations.

@SeppoTakalo @pdgendt FYI in case you have an opinion on this.

mrodgers-witekio commented 3 months ago

OK, I should have some time this week to put something together

SeppoTakalo commented 3 months ago

I think the CoAP client should be fixed so that it rejects any incoming data that would be truncated.

On most cases, the server do not add any Etag option for the data, so we don't exactly know that the new request with modified Block2 options would receive the same payload. So safest option is that client do not silently start a new Block-Wise request out of normal GET request.

But instead, we should properly document how to start the GET with Block2 option already in the first request. I think that should be the application developers responsibility to add that option in the query. However, I'm unsure if this is even working correctly now, I just checked that the struct coap_client_request has option fields.

Another fix should be that we inform the callback the we received too much data and rejected the request. Maybe we should give the "4.13 Request Entity Too Large" code to callback, even though we don't send anything to server (or can we respond Ack with RESET?).

SeppoTakalo commented 3 months ago

On a quick look, coap_client.c seem to add all extra options to subsequent block requests. This means that if application starts a Block-Wise GET, on a following requests there will be duplicate Block2 options.

So there is a bug that prevents application requesting Block-Wise.

mrodgers-witekio commented 3 months ago

But instead, we should properly document how to start the GET with Block2 option already in the first request. I think that should be the application developers responsibility to add that option in the query. However, I'm unsure if this is even working correctly now, I just checked that the struct coap_client_request has option fields.

I actually tried this at the application level and it works for the first request. However any subsequent requests then have both:

The original Block2 option added by the application, with NUM=0
An extra Block2 option added by the coap_client, with the correct value of NUM and potentially a different SZX to the first Block2 option (e.g. in case the server responded with a smaller block size than the application requested).

To me it seems cleaner to add the Block2 option to the first request inside the coap_client instead of parse the application-added options for an existing Block2 option and then modify it. But I'd be open to doing it either way.

SeppoTakalo commented 3 months ago

To me it seems cleaner to add the Block2 option to the first request inside the coap_client instead of parse the application-added options for an existing Block2 option and then modify it. But I'd be open to doing it either way.

I was thinking that only the application developer has the knowledge that this exact resource would lead to bigger responses. Therefore I would push the responsibility to application. If coap_client would add it, then it would be an extra option on all of the requests. That might cause issues.

But the coap_client should be fixed so that if user submitted options have same code, than what the coap_client already used, then it should be dropped. So that Block2 would only be added in the first request by the user, but subsequent requests would use the one calculated from server provided values.

boaks commented 3 months ago

If the server ignored the suggestion,

Then it violates a MUST RFC 7959, 2.4. Using the Block2 Option

To influence the block size used in a response, the requester MAY
   also use the Block2 Option on the initial request, giving the desired
   size, a block number of zero and an M bit of zero.  A server MUST use
   the block size indicated or a smaller size.

The good point on having not that many implementations is, that it's also possible to check them ;-).

AFAIK:

C:
- zephyr coap-server
- libcoap (I may check that)
Java:
- Eclipse/Californium (I may check that, hope it's compliant)
- nCoAP (Nordic)
Pyhton
- aiocoap
Go:
- go-coap (I guess we need to ask golioth)

Not sure about C# and Rust.

boaks commented 3 months ago

Anyway, I'm not sure, how many applications, which are using two buffers

    uint8_t send_buf[MAX_COAP_MSG_LEN];
    uint8_t recv_buf[MAX_COAP_MSG_LEN];

will be really that constraint, the MAX_COAP_MSG_LEN is smaller than 1280.

I don't use the coap_client, I use the coap message functions only and handle the sending and receiving on my app. There it's not too hard to use one, but large, receive buffer.

zephyrproject-rtos / zephyr

net: coap_client: truncation of packets longer than CONFIG_COAP_CLIENT_MESSAGE_SIZE #76089