zulip / zulip-flutter

Upcoming Zulip mobile apps for Android and iOS, using Flutter
Apache License 2.0
199 stars 193 forks source link

Consider overall timeout for long-poll requests #514

Open gnprice opened 9 months ago

gnprice commented 9 months ago

In #184 I wrote:

We should also ensure that if the request doesn't come back within an appropriate timeout, then we treat it as having failed and retry. The Zulip server should always respond to a long-poll within at most about 60 seconds, with a heartbeat event if there's no information to convey. So if it's been much longer than that and we haven't gotten a response, then we should assume none is coming — the request or response must have been lost somewhere. (This part might get split out as a separate issue.)

This is something we don't do in zulip-mobile, so leaving it for post-launch before we spend time pinning down the right behavior here.

One piece of context is that the underlying HTTP client library (the HttpClient of dart:io, which underlies package:http) already has a timeout for idle connections (with a default of 15 seconds), and presumes the OS has a timeout for making new connections. So as long as that OS-level timeout has an appropriate value, adding an overall timeout may only really change things if we've successfully made a connection and are more or less continuously getting data on it, just very slowly… and in that case the reasoning above for a timeout doesn't really apply and it seems likely best to just have patience.

chrisbobbe commented 1 month ago

Greg and I have both seen a symptom where the app stops live-updating because the Future for the polling request stays unsettled, neither resolving nor rejecting for a long time (hours). There may be a bug in HttpClient or some other layer beneath our app code, but we don't see a clear match in https://github.com/dart-lang/sdk . In my case, this happened on an iOS simulator that went to sleep in some way; I left the simulator on while I was away from my computer for some hours.

The timeout described in this issue would be an effective workaround.

gnprice commented 1 month ago

Bumping this up to the "Launch" milestone, given that additional reason it would be helpful.

The original use case in the issue description is fairly hypothetical; I'm not aware of anyone encountering while interacting with a Zulip server in practice the sorts of conditions that would make a timeout like this useful. But given the investigation Chris and I made of the symptoms he saw today, I think this timeout would effectively work around that apparent issue in the Dart stdlib's HTTP stack. And I suspect the same story explains #809, too — which, coincidence or not, I ran into again just this afternoon (though I think I hadn't since the original case, in July).

Some further details of that investigation here: