mozilla-services / autopush-rs

Push Server in Rust
Mozilla Public License 2.0

Router: Ensure FCM 502 Errors are being properly handled. #444

Open data-sync-user opened 1 year ago

data-sync-user commented 1 year ago

FCM can return a 502 (Bad Gateway) error, which we currently log as a Sentry error. If the body of the 502 response is not valid JSON (which appears to happen frequently), it also triggers a subsequent JSON decoding error.

https://mozilla.sentry.io/issues/4552231261/?environment=prod-gcp&environment=prod&query=is%3Aunresolved+rust.name%3Arustc&referrer=issue-stream&stream_index=2

We should isolate the BAD_GATEWAY response, skip decoding its payload, and report the error back to the endpoint as a 502 with RETRY.
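A minimal sketch of the proposed ordering, written in Rust since that is the project's language. It assumes a hypothetical `handle_fcm_response` function that receives the FCM status code and raw response body, and a hypothetical `RouterError` enum; neither is autopush-rs's actual API. The decoder is passed in as a closure so the sketch does not depend on any particular JSON crate.

```rust
use std::fmt;

const BAD_GATEWAY: u16 = 502;

/// Hypothetical router error type for this sketch only.
#[derive(Debug, PartialEq)]
pub enum RouterError {
    /// FCM answered 502; the endpoint should reply 502 and ask the sender to retry.
    UpstreamRetry { status: u16 },
    /// The response body could not be decoded.
    Decode(String),
}

impl RouterError {
    /// Whether the sender should be told to retry the request later.
    pub fn is_retryable(&self) -> bool {
        matches!(self, RouterError::UpstreamRetry { .. })
    }
}

impl fmt::Display for RouterError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            RouterError::UpstreamRetry { status } => write!(f, "FCM returned {status}; retry later"),
            RouterError::Decode(e) => write!(f, "Unable to deserialize FCM response: {e}"),
        }
    }
}

impl std::error::Error for RouterError {}

/// Check for BAD_GATEWAY *before* decoding, so a malformed 502 body
/// never reaches the decoder.
pub fn handle_fcm_response<T, F>(status: u16, body: &str, decode: F) -> Result<T, RouterError>
where
    F: FnOnce(&str) -> Result<T, String>,
{
    if status == BAD_GATEWAY {
        // Skip decoding entirely: the payload is often not valid JSON.
        return Err(RouterError::UpstreamRetry { status });
    }
    // Any other response is decoded as before.
    decode(body).map_err(RouterError::Decode)
}
```

Because the status check runs first, a non-JSON 502 body is never parsed: it surfaces as a retryable 502 rather than as a deserialization error, while other responses still go through the normal decoding path.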

Sentry Issue: AUTOPUSH-RS-3X

Issue is synchronized with this Jira Bug

froodian commented 11 months ago

Is there any update on this issue? We've seen an ongoing, significant rise in persistent Web Push 502s (ones that do not resolve on retry) to Firefox for Android users, with response bodies like:

{"code":502,"errno":null,"error":"Bad Gateway","message":"Unable to deserialize FCM response","more_info":"http://autopush.readthedocs.io/en/latest/http.html#error-codes"}

jrconlin commented 11 months ago

Sorry for the late reply.

Starting Sept. 22, we had an incident where we could no longer send messages to Android users via the old Google Cloud Messaging (GCM) network. This affected users who may have created very old endpoints using Firefox for Android (Fennec), which was discontinued several years ago.

On Oct. 5, we deployed a "canary" fix for the issue, which resolved the bulk of it. Because of the weekend and the holiday, we held off on deploying the fix to the larger server population, but that version should now be widely deployed (as of Oct. 10).