Closed waisingyiu closed 6 months ago
mobile-n10n:notificationworkerlambda to CODE
mobile-n10n:eventconsumer to CODE
mobile-n10n:schedule to CODE
mobile-n10n:fakebreakingnewslambda to CODE
mobile-n10n:football to CODE
mobile-n10n:reportextractor to CODE
mobile-n10n:report to CODE
mobile-n10n:notification to CODE
mobile-n10n:slomonitor to CODE
mobile-n10n:registration to CODE
What does this change?
The Firebase Cloud Messaging legacy HTTP endpoint is going to be discontinued this June, and the HTTP v1 API has no replacement for the multicast API. We have to explore alternatives that improve the throughput of our notification service so that we keep meeting the 90in2 SLA we achieve today with the soon-to-be-expired multicast API.
This PR implements direct Firebase HTTP calls over a multiplexed HTTP/2 connection. As this Firebase FAQ suggests,
The HTTP v1 API over HTTP/2 performs similarly for 99.9% of multicast requests (sending < 100 tokens). For outlier use cases (sending 1000 tokens), it achieves up to a third of the throughput rate, so additional concurrency is needed to optimize for this atypical use case.
We are going to validate this option on small notifications and apply it to larger notifications once we have more confidence in its throughput.
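To illustrate the idea of per-token sends with added concurrency, here is a minimal Python sketch. It is not the code in this PR. The `messages:send` URL and the `message.token` payload shape follow the public HTTP v1 API, while `build_v1_request`, `send_individually`, `send_one` and `max_in_flight` are hypothetical names introduced for illustration. The actual HTTP/2 transport is assumed to live inside `send_one`, which would reuse one multiplexed connection.

```python
import asyncio
import json
from typing import Awaitable, Callable, Iterable, Tuple

FCM_V1_URL = "https://fcm.googleapis.com/v1/projects/{project_id}/messages:send"


def build_v1_request(project_id: str, token: str, title: str, body: str) -> Tuple[str, str]:
    """Return the (url, JSON body) for one HTTP v1 send to a single device token."""
    payload = {
        "message": {
            "token": token,
            "notification": {"title": title, "body": body},
        }
    }
    return FCM_V1_URL.format(project_id=project_id), json.dumps(payload)


async def send_individually(
    tokens: Iterable[str],
    send_one: Callable[[str], Awaitable[bool]],
    max_in_flight: int = 100,
) -> int:
    """Send to each token separately, with at most max_in_flight requests open.

    send_one is a hypothetical coroutine that performs one request over a
    shared HTTP/2 connection and returns True on success.
    Returns the number of successful sends.
    """
    limit = asyncio.Semaphore(max_in_flight)

    async def guarded(token: str) -> bool:
        # The semaphore bounds concurrency so a large notification cannot
        # open an unbounded number of streams at once.
        async with limit:
            return await send_one(token)

    results = await asyncio.gather(*(guarded(t) for t in tokens))
    return sum(results)
```

The `max_in_flight` bound is the kind of parameter that would likely need tuning during the experiment, since it trades throughput against resource use.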
We have made the following changes:
How to test
The changes were tested on CODE, and notifications can be sent to my simulator. However, the CODE database holds only a very small number of device tokens (compared to PROD), so we need to run this change as an experiment on PROD.
After the changes are deployed to PROD, we may first enable individual send for content notifications (by setting `tag/` and `contributor/` in the selected topics). Once it is shown to send notifications correctly with reasonable throughput, we may extend the topics to breaking sport notifications (such as `breaking/international-sport`) and monitor throughput throughout. As we gain confidence, we may gradually extend it to cover other breaking news topics.
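The topic-based rollout described here could be gated by a simple prefix check. The sketch below is only an illustration: `SELECTED_TOPIC_PREFIXES` and `use_individual_send` are hypothetical names, and requiring every topic to match (rather than any one) is an assumption, not something this PR states.

```python
from typing import Iterable, Sequence

# Hypothetical setting: topics (or topic prefixes) opted in to individual send.
SELECTED_TOPIC_PREFIXES = ("tag/", "contributor/")


def use_individual_send(
    topics: Iterable[str],
    selected: Sequence[str] = SELECTED_TOPIC_PREFIXES,
) -> bool:
    """True when the notification has topics and all of them are selected.

    Assumption: a notification with any unselected topic stays on the
    existing send path.
    """
    topic_list = list(topics)
    prefixes = tuple(selected)
    return bool(topic_list) and all(t.startswith(prefixes) for t in topic_list)
```

Extending the experiment then amounts to adding an entry such as `breaking/international-sport` to the selected list.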
Some parameter tuning may be needed during the experiment.
How can we measure success?
Notifications can be sent with throughput comparable to what we have achieved with the multicast API.
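One way to make that comparison concrete is sketched below. It assumes that "90in2" means 90% of deliveries complete within two minutes, which this description does not spell out. The function names and the input (per-device delivery latencies in seconds) are hypothetical.

```python
from typing import Sequence


def fraction_within(latencies_seconds: Sequence[float], window_seconds: float = 120.0) -> float:
    """Share of deliveries that completed within the window (0.0 if none recorded)."""
    if not latencies_seconds:
        return 0.0
    on_time = sum(1 for s in latencies_seconds if s <= window_seconds)
    return on_time / len(latencies_seconds)


def throughput_per_second(sent: int, elapsed_seconds: float) -> float:
    """Average sends per second over the whole notification."""
    return sent / elapsed_seconds if elapsed_seconds > 0 else 0.0
```

Under that assumption, a notification would meet the target when `fraction_within(latencies) >= 0.9`.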
Have we considered potential risks?
It may hit resource limits or have latency issues. That is why we are starting with small notifications.