@alexmv reported the app on his device was sending a rapid series of getMessages requests to the server — around 6 such requests a second.
That rate works out to about 360 requests per minute. Because the server rate-limits requests (by default to 200 per minute for each user, total across all types of requests), the user-facing effect was that the requests all failed, with the symptom that scrolling up further in the message list didn't work. The requests also put load on the server, though mitigated by the rate-limiting.
We should fix this.
Diagnosis
From looking at the code, what's happening is:
There's no overt retry logic for that fetch.
But: the thing that triggers that fetch is a listener for the scroll state. It can get called frequently by Flutter's scrolling logic; it's supposed to be lightweight to call repeatedly.
So naturally if we already have one of these fetches in flight, that listener promptly returns. Otherwise if you're near the top of the data we have, and we haven't heard from the server that that's the beginning of history, we make a new /api/get-messages request to fetch more.
And when the fetches are already failing, that amounts to a retry with no backoff.
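To make the failure mode concrete, here is a minimal sketch of the behavior described above. It is a simplified model, not the app's actual code: the class `UnguardedFetcher`, the `request` callback, the `haveOldest` flag, and the `requestsSent` counter are all hypothetical names for illustration, and the stand-in request is assumed to throw on a server error.

```dart
/// Hypothetical, simplified model of the current behavior.
class UnguardedFetcher {
  UnguardedFetcher(this.request);

  /// Stand-in for the API call; assumed to throw on a server error.
  final Future<void> Function() request;

  bool fetchingOlder = false;
  bool haveOldest = false; // true once the server says we have the start of history
  int requestsSent = 0;

  Future<void> fetchOlder() async {
    // The only guards: nothing remembers that the last request failed.
    if (haveOldest || fetchingOlder) return;
    fetchingOlder = true;
    requestsSent++;
    try {
      await request();
    } catch (_) {
      // Ignored in this sketch; the failure leaves no trace in the state.
    } finally {
      fetchingOlder = false;
    }
  }
}

Future<void> main() async {
  final fetcher = UnguardedFetcher(() async => throw Exception('429'));
  // Each call models a scroll-state notification that arrives after the
  // previous request has already failed, so each one sends a new request.
  for (var i = 0; i < 6; i++) {
    await fetcher.fetchOlder();
  }
  print(fetcher.requestsSent); // 6
}
```

Because the in-flight flag is cleared as soon as a request fails, every later notification passes the guards and sends another request immediately.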
Implementation
To fix the issue, we'll add backoff to MessageListView.fetchOlder, which is the method that invokes the API request.
When fetchOlder gets an error from the server, it should start a backoff timer (using our BackoffMachine so the intervals escalate) and ignore any further calls until the backoff expires, just like it ignores them when fetchingOlder is true.
We won't add any new retry logic. The fetchOlder call site, in _MessageListState._handleScrollMetrics, already provides plenty of retrying — all the user has to do is touch the screen to scroll slightly up or down, and fetchOlder will get called again. As Alex saw, it can also get called even with no interaction when nothing else seems to be happening.
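A rough sketch of the proposed shape follows, under several assumptions. `SketchBackoff` is a hypothetical stand-in for our BackoffMachine, whose real API and intervals may differ (this version has no jitter); `BackoffFetcher`, `request`, `now`, and `requestsSent` are likewise made-up names. The sketch also records a "not before" timestamp instead of starting a timer, purely so that it can be exercised with a fake clock.

```dart
/// Hypothetical escalating-delay helper; not the real BackoffMachine.
class SketchBackoff {
  SketchBackoff({
    this.first = const Duration(milliseconds: 100),
    this.max = const Duration(seconds: 10),
  });

  final Duration first;
  final Duration max;
  int _failures = 0;

  /// Doubles the delay after each failure, capped at [max].
  Duration next() {
    final delay = first * (1 << _failures);
    if (delay < max) _failures++;
    return delay < max ? delay : max;
  }
}

class BackoffFetcher {
  BackoffFetcher(this.request, this.now);

  /// Stand-in for the API call; assumed to throw on a server error.
  final Future<void> Function() request;

  /// Injectable clock, so the sketch can be driven deterministically.
  final DateTime Function() now;

  bool fetchingOlder = false;
  SketchBackoff? _backoff;
  DateTime? _retryNotBefore;
  int requestsSent = 0;

  Future<void> fetchOlder() async {
    if (fetchingOlder) return;
    // New guard: ignore calls until the backoff has expired.
    final notBefore = _retryNotBefore;
    if (notBefore != null && now().isBefore(notBefore)) return;

    fetchingOlder = true;
    requestsSent++;
    try {
      await request();
      // Success: forget any backoff state.
      _backoff = null;
      _retryNotBefore = null;
    } catch (_) {
      // Failure: escalate the delay before the next attempt is allowed.
      _retryNotBefore = now().add((_backoff ??= SketchBackoff()).next());
    } finally {
      fetchingOlder = false;
    }
  }
}

Future<void> main() async {
  var clock = DateTime.utc(2024);
  final fetcher =
      BackoffFetcher(() async => throw Exception('429'), () => clock);

  await fetcher.fetchOlder(); // fails; a 100 ms backoff begins
  await fetcher.fetchOlder(); // ignored: still backing off
  print(fetcher.requestsSent); // 1

  clock = clock.add(const Duration(milliseconds: 100));
  await fetcher.fetchOlder(); // backoff expired; fails, 200 ms backoff begins
  clock = clock.add(const Duration(milliseconds: 100));
  await fetcher.fetchOlder(); // ignored: only 100 of the 200 ms have passed
  print(fetcher.requestsSent); // 2
}
```

Note that the sketch schedules nothing on its own: the next attempt happens only when the caller invokes `fetchOlder` again after the backoff has expired, which matches the plan to rely on the existing call site for retrying.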
Out of scope
A further bonus refinement would be for us to understand 429 responses, and wait for the time that the response says.
But doing that right probably belongs at a different layer, in ApiConnection, so that the information is shared across all the different requests we make on behalf of a given account.
And conversely I think if we fix the rapid retries, that will largely solve the problem. I suspect we ended up with 429s here in the first place because of too-rapid retries after some other source of error. Even if not, exponential backoff should let us get out of the situation.