shamblett / coap

A Coap package for dart
Other
16 stars 13 forks source link

Problem with error handling on timeout or cancel #138

Closed Zielscheibe closed 1 year ago

Zielscheibe commented 1 year ago

Hi, I currently implement a flutter app that communicates with IoT devices using coap. Experimenting with this package, I found some unexpected behaviour. However, I have to point out that I'm new to the coap protocol.

The use case

(Tested with the latest release and also checked out your current master to test)

When working with the CoapClient class everything works like described in your examples, as long as my coap-server is reachable. I use a simple server implementation based on this example.

The problem I face is when I shut down the server to simulate a device that is temporarily not available, something which I expect in a typical IoT use case. From My understanding, a confirmable get request like

final test = client.get('/hello').then(print).catchError(print);

should print "CoapRequestTimeoutException: Request timed out after 0 retransmits." after timeout, such that I can handle the case accordingly.

The problem

The given example never exits. If I await the future, it never continues. This I would expect for a request with confirmable = false since this is, in my understanding a simple UDP message that never waits for an answer.

The following example leads at least in a timeout:

  final request = CoapRequest.newGet();
  request.ackTimeout = 1000;
  _build(
    request,
    '/hello',
    CoapMediaType.applicationJson,
    false,
    0,
  );
  request.timedOutHook = () {
    print('timeout');
    client.close();
  };
  unawaited(
    client.send(request).catchError(print),
  );
  client.cancel(request);

results into: timeout CoapRequestTimeoutException: Request timed out after 0 retransmits.

Summary

So I hope I didn't get the intentions of your package wrong. If so, I would like to know how the usage, in my case, is intended. Thanks in advance.

shamblett commented 1 year ago

Interestingly #137 has just been added to fix a problem with the get_max_retransmit example not working, the example was hanging if no responses at all are returned as the server is non existent. This was applied on 24/10, you say you tested against the master branch but not when, if this was before this date could you please re test.

This may be different to your case, this test never receives a response, there may still be an issue if responses are received then the server becomes unreachable, the timeout should still be expected after the re transmits expire.

Zielscheibe commented 1 year ago

Hi, thanks for the fast reply, I tested yesterday with commit [29b9465]

shamblett commented 1 year ago

OK thanks for checking, the package is due to be re published at version 6.0.0 over the next few days, however I will look at this problems and try and get a fix out as soon as.

JKRhb commented 1 year ago

Hi @Zielscheibe, thanks for reporting this issue! I just tested the client, and it seems to me as if the timeout functionality does work in general. However, a request is only considered a timeout after 8 retransmits (by default), so that might be the reason why the client appears to hang indefinitely :/ Could you maybe try again with a config with a lower maximum number of retransmits or reduce the number on the request itself?

In general, I think we should revisit the timeout logic and check if it is consistent with the RFCs.

Zielscheibe commented 1 year ago

Hi, @JKRhb also thanks for checking my problem so fast. After altering the default config (that I used in the above case) to maxRetransmit = 2, I can confirm that the request ended in the timeout. So that's fine, and I will further experiment with my use case. Also looking forward to the next release. :)

However, I'm still a bit puzzled by the behaviour of the default config. So I experimented with different maxRetransit settings that resulted into:

In my understanding, this makes sense from the default settings where ackTimeout = 3000ms and ackTimeoutScale = 2.0. So roughly 6 seconds for every transmit. Is 8 as default a good default then? I ask this, because I also checked my server with a default rust implementation of a coap client. This behaves more as I would expect, timing out after a few seconds per default.

So, in conclusion, I agree the client works as it should. I didn't get the parameters right. But maybe verbalizing this may help future devs like me. :)

JKRhb commented 1 year ago

Thank you for investigating this! Having another look into RFC 7252 (where CoAP is specified), the maxRetransit parameter should probably be changed to 4. The current default for ackTimeout also seems a little bit too high. Using the defaults from the RFC should reduce the default timeout to 247 seconds, which is still relatively long, but definitely better than the current behavior. In any case, you are right that the documentation should be improved in order to make it easier for users to adjust the parameters to values that fit their use case :)

shamblett commented 1 year ago

@Zielscheibe version 6.0.0 of the package now published, please update to this, if you find any problems or areas where you think the package could be improved in any way please raise issues here and we will look at them, thanks.

Zielscheibe commented 1 year ago

Hi, thanks for the fast update! Tested my use case with the new version and the hints you gave me. Everything works as expected. So I close this issue now.