
Uniqush Blog
https://uniqush.github.io/uniqush-blog/

Migrate "How We Solved Push Notifications at if(we)" article #5

Open TysonAndre opened 4 years ago

TysonAndre commented 4 years ago

I have an old draft (version 2) from @mishan - the original blog that the blog post linked to no longer exists. Incompletely converted from ODT to markdown with pandoc below:


How We Solved Push Notifications at if(we)

By Misha Nasledov (@mishan)

The Early Days of Push

In the early days of mobile app development, back when the first versions of the Tagged mobile application were being developed, a very scrappy mobile push notification system was put together. The original push code was written in PHP without using any sort of library. It supported GCM (Google Cloud Messaging), C2DM (prior to GCM’s existence), and APNs (Apple Push Notification Service). We had a very lame subscriber database -- only the most recently used device would receive push notifications. We did not handle every exception from the service properly, such as an unsubscribe after uninstalling the app, or follow certain best practices. In particular, APNs is a bit more involved to use as it requires calling a feedback service to get the result of the push.*1

*1 = Latest APNs HTTP/2 spec obviates this.

The Search For A Better Way

We wanted to revamp our push notification system in order to get more out of it, and decided the best place to start was to improve the push notification engine and the interface to it. Not being particularly married to our in-house GCM and APNs push code, we looked at various off-the-shelf alternatives in lieu of trying to improve the old system.

We wanted a system that would let us better abstract away different push service provider APIs. The ability to push to more than one device per user was also something we desired. The PHP push code gave us enough trouble with the lack of persistent sockets -- there was already a lot of opening and closing of connections with APNs, and sending more notifications per person meant even more connection churn. The new system needed to use sockets efficiently, and handle errors more gracefully.

We didn’t particularly want to go with some vendor for sending push notifications. When dealing with users’ personal data, having one less party involved keeps our users’ information safer. A third-party service would not have offered us the same tightly integrated control and flexibility. We also had plenty of spare servers to run a service.

Uniqush

While researching potential solutions, we discovered an open source project known as Uniqush (https://github.com/uniqush/uniqush-push). It had some users, and the source looked simple enough that we could give it a shot and work with it. The only dependency was a persistent Redis server, which we had already set up for another, unrelated project before considering Uniqush. It’s noteworthy that the project was structured so that one could write a different database module to use an RDBMS such as MySQL or PostgreSQL, but currently only Redis is supported.

In a nutshell, Uniqush keeps information about “push service providers” (PSPs). PSPs are the push notification endpoints (e.g., our Tagged mobile application GCM endpoint). Service names identify sets of one or more PSPs, each PSP having a unique push service type. We make these one to one so we can work on pushes for apps independently. Uniqush also keeps information in Redis about subscribers, which are associated with sets of one or more devices registered with a service endpoint. In order for this to be useful, one has to set up their Redis server for persistence. So long as all of one’s data can be kept in RAM, persistence is pretty easy. We have millions of mobile users, many of whom have more than one device, and our subscriber database is (relatively speaking) pretty small -- about 15GB. This also solved the shortcomings of our old subscriber database without us having to write a new way to store subscribers.
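
As a rough illustration of the relationships described above, here is a minimal in-memory sketch in Python. The class and field names (`Service`, `PushServiceProvider`, `Subscriber`, `devices`) are hypothetical and chosen for this example only; this is not Uniqush's actual Redis schema or API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PushServiceProvider:
    """One push endpoint, e.g. the GCM endpoint for a single app."""
    name: str
    service_type: str  # "gcm", "apns", "adm", ...


@dataclass
class Service:
    """A service name identifies a set of PSPs, at most one per service type."""
    name: str
    psps: dict = field(default_factory=dict)  # service_type -> PushServiceProvider

    def add_psp(self, psp: PushServiceProvider) -> None:
        if psp.service_type in self.psps:
            raise ValueError(f"{self.name} already has a {psp.service_type} PSP")
        self.psps[psp.service_type] = psp


@dataclass
class Subscriber:
    """A subscriber owns one or more devices registered with a service."""
    name: str
    # service name -> list of (service_type, device_token) pairs
    devices: dict = field(default_factory=dict)

    def subscribe(self, service: Service, service_type: str, token: str) -> None:
        if service_type not in service.psps:
            raise KeyError(f"{service.name} has no {service_type} PSP")
        self.devices.setdefault(service.name, []).append((service_type, token))


# Hypothetical example data: one service with a GCM and an APNs endpoint.
tagged = Service("tagged")
tagged.add_psp(PushServiceProvider("tagged-gcm", "gcm"))
tagged.add_psp(PushServiceProvider("tagged-apns", "apns"))

alice = Subscriber("user:1234")
alice.subscribe(tagged, "gcm", "android-token-1")
alice.subscribe(tagged, "apns", "ios-token-1")
```

A push to `user:1234` on the `tagged` service would then fan out to both registered devices, each through the PSP matching its service type.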

Uniqush supports the services we use (GCM and APNs) as well as ADM (Amazon Device Messaging). The one shortcoming the project had was that it did not support passing JSON payloads directly through, but instead constructed the payloads from passed-in parameters. This was an issue as we pass custom push notification payloads to our clients that contain data about alert counters and, for Android, a profile picture URL. Changing the way the client processes the notifications would break older versions of clients. We ended up changing the code that constructs the payloads and created a way to pass raw JSON payloads (intended for a specific device type) directly to Uniqush.
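
The difference between the two approaches can be sketched roughly as follows. This is an illustration in Python, not Uniqush's code: the function names and the parameter keys (`msg`, `badge`, `raw.apns`) are assumptions made for this example. Only the outer `{"aps": {...}}` shape is the standard APNs payload layout.

```python
import json


def build_apns_payload(params: dict) -> str:
    """Construct an APNs-style payload from flat parameters (simplified)."""
    aps = {"alert": params["msg"]}
    if "badge" in params:
        aps["badge"] = int(params["badge"])
    return json.dumps({"aps": aps}, sort_keys=True)


def resolve_payload(params: dict, raw_key: str = "raw.apns") -> str:
    """Prefer a caller-supplied raw JSON payload; otherwise build one."""
    raw = params.get(raw_key)
    if raw is not None:
        json.loads(raw)  # validate only; raises ValueError on malformed JSON
        return raw
    return build_apns_payload(params)


# Constructed from parameters: custom fields cannot be expressed.
built = resolve_payload({"msg": "You have a new message", "badge": "3"})

# Raw passthrough: the caller controls the exact bytes older clients expect.
raw = '{"aps":{"alert":"You have a new message"},"counters":{"msgs":3}}'
passed_through = resolve_payload({"raw.apns": raw})
```

Because the raw payload is returned unchanged, existing clients that parse custom fields keep working without any change on the device side.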

Giving Uniqush a Shot

About a year ago we first put Uniqush on a couple of VMs in production and changed our PHP push code to try sending through Uniqush when an experiment was enabled. If the service call to Uniqush failed, it would fall through to the old implementation, just in case. We first tried using Uniqush to send our GCM push notifications and it ended up working mostly without trouble, sending about 250 push notifications per second. There were a couple of small bugs that became evident once Uniqush was running at production load, but they were easily fixed.
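
The experiment-with-fallback rollout described above might look something like this sketch. It is written in Python for illustration (the production code was PHP), and `send_push` and its callable arguments are hypothetical names, not anything from the original codebase.

```python
from typing import Callable


def send_push(
    notification: dict,
    experiment_enabled: bool,
    send_via_uniqush: Callable[[dict], None],
    send_via_legacy: Callable[[dict], None],
) -> str:
    """Try the new path when the experiment is on; fall back on any failure."""
    if experiment_enabled:
        try:
            send_via_uniqush(notification)
            return "uniqush"
        except Exception:
            # Deliberately broad: a failure of the new service must not
            # cause the notification to be dropped.
            pass
    send_via_legacy(notification)
    return "legacy"
```

Returning the name of the path that was used makes it easy to log how often the fallback fires while the experiment is ramping up.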

APNs proved to be a bit trickier. There’s more complexity to the protocol: it requires asynchronous writes and reads on TCP sockets, we have to track a 32-bit identifier for each notification, and Apple will close the socket immediately instead of giving an error code when a push fails. Uniqush’s APNs module turned out to not have a very reliable implementation and unfortunately fell over at production load. However, due to the pros of Uniqush, success with GCM, and overall simplicity of the code, we kept investing in the project. We rewrote the APNs module to use a worker pool implementation that didn’t have the race conditions of the existing implementation.
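
To make the identifier tracking concrete, here is a small Python sketch. It assumes the legacy binary APNs protocol's enhanced format, in which a failure may be reported as a 6-byte packet (command byte 8, a status byte, and the 32-bit identifier) before the socket is closed, and in which notifications written after the failed one have to be sent again. The `SentBuffer` class is hypothetical and is not the worker pool implementation mentioned above.

```python
import struct
from collections import OrderedDict


class SentBuffer:
    """Remember recently written notifications by their 32-bit identifier."""

    def __init__(self, capacity: int = 1000) -> None:
        self.capacity = capacity
        self.next_id = 0
        self.sent = OrderedDict()  # identifier -> notification, in write order

    def record(self, notification: dict) -> int:
        """Assign the next identifier and remember the notification."""
        ident = self.next_id
        self.next_id = (self.next_id + 1) & 0xFFFFFFFF  # wrap at 32 bits
        self.sent[ident] = notification
        while len(self.sent) > self.capacity:
            self.sent.popitem(last=False)  # forget the oldest entry
        return ident

    def handle_error(self, packet: bytes):
        """Parse an error packet; return (status, failed, to_resend)."""
        command, status, ident = struct.unpack("!BBI", packet)
        if command != 8:
            raise ValueError("not an APNs error response")
        if ident not in self.sent:
            return status, None, []
        idents = list(self.sent)
        pos = idents.index(ident)
        failed = self.sent[ident]
        # Anything written after the failed notification must be re-sent
        # on a fresh connection.
        to_resend = [self.sent[i] for i in idents[pos + 1:]]
        self.sent.clear()
        return status, failed, to_resend
```

The buffer is bounded because most notifications never produce a response at all, so entries can only be dropped by age, not by acknowledgment.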

Scaling

Currently we use Uniqush to send all of the mobile application GCM and APNs push notifications for Tagged and hi5 at if(we). That’s about 400-500 notifications per second. Because it’s a standalone service that has no internal knowledge of the Tagged application or any other business logic, we can easily use it for other apps we develop.

To reach this scale, we have four 4-core 4GB hosts running the uniqush-push instances. We currently run three Uniqush processes per VM, though, in reality, the tier is a bit over-provisioned to handle growth and any surge of activity. The Uniqush instances actually end up taking a lot more queries than just the 400-500 notifications per second. We query the Uniqush subscriber database before sending a push notification so that we can make more intelligent decisions about whether to push to a subscriber. The mobile clients, in aggregate, send about another 500 subscriptions per second. Overall, the tier is handling something around 1500 queries per second.
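
The rough arithmetic behind that total can be written out as below. The one-lookup-per-push ratio is an assumption made for this back-of-the-envelope sketch (the text only says the subscriber database is queried before a push), as is treating the four hosts as the VMs that each run three processes.

```python
pushes_per_sec = 500         # upper end of the 400-500 range
lookups_per_push = 1         # assumption: one subscriber query per push
subscriptions_per_sec = 500  # subscription requests from mobile clients

total_qps = (
    pushes_per_sec
    + pushes_per_sec * lookups_per_push
    + subscriptions_per_sec
)

hosts = 4
processes_per_host = 3
processes = hosts * processes_per_host
qps_per_process = total_qps / processes

print(total_qps, processes, qps_per_process)  # 1500 12 125.0
```

Under these assumptions each Uniqush process handles on the order of 125 queries per second, which is consistent with the tier being described as over-provisioned.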

All of these queries end up hitting Redis to obtain, modify, and/or add subscriber information. Before embarking on this project, we had already built a large, general purpose persistent Redis “cluster.” It is not actually a Redis Cluster but, rather, it is a cluster of Redis shards with consistent hashing. Uniqush uses our fork (https://github.com/ifwe/twemproxy) of Twitter’s twemproxy (https://github.com/twitter/twemproxy) in order to be able to utilize the cluster. Our fork contains a yet-to-be-merged patch (https://github.com/twitter/twemproxy/pull/324) by @andyqzb to add Redis Sentinel support so that failovers can be handled properly. We have two 32-core 256GB hosts to run the Redis master and slave shards.
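
For readers unfamiliar with consistent hashing, the idea can be sketched with a generic hash ring in Python. This is a textbook-style illustration, not twemproxy's implementation or configuration: the `HashRing` class, the shard names, and the choice of MD5 with 100 virtual nodes per shard are all assumptions for the example.

```python
import bisect
import hashlib


class HashRing:
    """Map keys to shards so that adding a shard moves only some keys."""

    def __init__(self, shards, vnodes: int = 100) -> None:
        ring = []  # (point on the ring, shard name)
        for shard in shards:
            # Several virtual nodes per shard even out the key distribution.
            for i in range(vnodes):
                ring.append((self._hash(f"{shard}#{i}"), shard))
        ring.sort()
        self._ring = ring
        self._points = [point for point, _ in ring]

    @staticmethod
    def _hash(key: str) -> int:
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def shard_for(self, key: str) -> str:
        """Return the shard owning the first ring point after the key's hash."""
        idx = bisect.bisect_right(self._points, self._hash(key))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["redis-a", "redis-b", "redis-c", "redis-d"])
shard = ring.shard_for("subscriber:user:1234")
```

The property that matters for a persistent store is that growing the ring from four shards to five only relocates the keys that land on the new shard, rather than reshuffling everything as `hash(key) % n` would.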

What’s Next?

We’ve contributed fixes and improvements we’ve made to the Uniqush project back upstream and continue to make improvements and contributions to the project. The ability to store other data with individual subscriber devices, such as client versions and subscription dates, has been developed but hasn’t been pushed back upstream yet, as we haven’t even really started using these attributes ourselves. It will allow for much more intelligent application logic -- for instance, we could send some kind of new push notification only to the devices of subscribers with the latest application version on their device. Our fork (http://github.com/ifwe/uniqush-push) may have experimental features under development that have not been pushed upstream yet.
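
As a sketch of the kind of application logic per-device attributes would allow, the following Python example filters devices by client version. The record layout (`token`, `app_version`) and the function names are hypothetical, since the attribute storage described above is not part of upstream Uniqush.

```python
def parse_version(version: str) -> tuple:
    """Turn "4.10.2" into (4, 10, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def devices_on_latest(devices: list, latest: str) -> list:
    """Keep only devices whose stored client version is at least `latest`."""
    target = parse_version(latest)
    return [d for d in devices if parse_version(d["app_version"]) >= target]


# Hypothetical device records for one subscriber.
devices = [
    {"token": "ios-token-1", "app_version": "4.9.0"},
    {"token": "android-token-1", "app_version": "4.10.2"},
]
eligible = devices_on_latest(devices, "4.10.0")
```

Comparing parsed tuples rather than raw strings matters here, because `"4.9.0" > "4.10.0"` is true as a string comparison.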

Uniqush has been a resounding success at if(we). A few months ago we finally ripped out the old push notification code from our PHP (web) codebase. Uniqush was sending all of our APNs and GCM push notifications at full production load without issue. It made everything much simpler. The concern of implementing and maintaining the APNs and GCM implementations is gone. All our PHP code has to do now is deal with constructing push notifications (more specifically, the content of the notifications and any application-specific log) and relaying them to Uniqush as well as telling Uniqush to subscribe and unsubscribe devices of users. Uniqush takes care of maintaining the subscriber database, handling errors / exceptions, and actually sending the push notifications to Apple and Google’s servers. This ability to operate at a more abstract level has made it easy for us to then focus on things like creating an A/B experiment framework for push notification content and scheduling, smarter push notification scheduling, and more intelligent device routing for push notifications.

Acknowledgments

Thank you Nan Deng (@monnand) for creating Uniqush! It ended up working quite well at if(we). And a big shout-out to our colleague Tyson Andre (@TysonAndre) for making and driving many improvements to Uniqush.

mishan commented 4 years ago

I have the final draft here https://misha.nasledov.com/uniqush.html

From a quick glance, it looks the same

TysonAndre commented 4 years ago

The only change I could suggest is APNS -> APNs, per Apple's own documentation. I could probably link to that. The image link (singular) on https://misha.nasledov.com/uniqush.html is broken for me, though, as a result of the blog no longer existing:

<p class="block-img"><img src="https://d3gqbr1mr54afg.cloudfront.net/ifwe/0d1608dff6d5caf7dcd7bb4b44c45fc171a3d030_screen-shot-2016-04-27-at-5.49.52-pm.png" alt="" width="669" height="501" /></p>
