Closed elmigranto closed 9 years ago
Thanks, will review. Can you go on with actually sending the push notification? Eventually get the whole chain to work with a "fake" push notification service first.
Yeah, sure. So when notifications comes in to notifications-api, it can contain notification.push.message
String (push
is 1st level, not in data
?), which we should send with redis push tokens?
Yes push
is first level.
We an notification is received, if push
object exists, transmit the notification object to the push module. Push module will take care of sending the notification. (which indeed requires the users tokens)
Note, a notification may or may not have a message. The full notification object can be sent as payload in the push notification (both android and ios support custom payloads).
if push object exists, transmit the notification object to the push module
Currently, we add notifications to our queue, but I assume that push ones won't go there and only be passed to push module. Is that so?
Push ones will go in both the user's queue and be send to the push module.
What would be nice architecturally is to add push notifications to be sent out to a redis queue. Then have a worker take tasks from the queue. Losing a push notification from time to time is not a big deal, so it can be done with basic LPUSH / RPOP. At some point, we can externalize the worker.
Notifications:
.push
, add to "push queue".Push API:
Is that what you have in mind?
Yes.
Ideally, I'd totally separate the "redis-pulling-push-sending" module to a separate project (without API, just pure compute), so that notifications modules stays generic, without dependencies on Apple or Google technologies.
It can be doing a "one shot" pull-till-empty on the redis push-notifications queue, then exit. This way, sending push notifications can be scheduled on a cron job.
Well most of the stuff is easily movable to another service with sync via redis, so shouldn't be a problem to pull it out later.
While I'm on the frontend, reading more doc, I came across the localization issue. To make it short, here's the required fields to solve that on the server side.
The push
field found in notifications can have the following data:
"push": {
"type": "someone_loves_someone",
"title": [ "Love {1}", "bob" ],
"message": [ "Did you know? {1} loves {2}", "alice", "bob" ],
}
Should allow to fill APN:
title
with title[0]
with arguments insertedtitle-key
with #{type}_title
title-args
with title[1..n]
body
with message[0]
with arguments insertedloc-key
with #{type}_message
loc-args
with message[1..n]
For GCM, didn't check yet but there's surely something similar.
(edited the comment above)
Not sure I'm following what's going on :) Are you saying that we should send different stuff to APN and GCM, so the app on device can localize it? (We have common .push
server-side and modify its contents before each send depending on token.type
?)
Just one format on notifications
side (documented above). Backend worker will convert this format to either APN or GCM's, add the token and send to the provider's server.
What am I thinking about worker for sending push notifications.
We'll have 2 queues — pending
(the one notifications module dumps push messages) and processing
(messages to be sent immediately).
Worker strategy:
processing
on startup and process them.pending
to send within redis transaction.brpoplpush pending processing 0 // waits for at least 1 message
rpoplpush pending processing // repeated N-1 times
expire pending // not sure if required,
// but we surely don't want to send stale notifications
lrange pending 0 -1
del pending
Not sure how to handle failed sends, it is probably not okay to just ignore them, but is not great letting them hang indefinitely (can't expire particular list items). Maybe remove pending
anyway, and retry to send them on the next run. Will leave this as TODO for the first version.
Two queues will allow us to start multiple workers in the future (each will have own pending
queue).
Now that I wrote this I realized the quickest thing is to just do BRPOP
continuously on workers, I guess I'll start with that and we can see if that would be enough.
Ok, here's initial thing, now we need to actually send push notifications.
To actually test stuff with real services and not stubs, I figured you'll do it yourself because API keys; or do you have something else in mind?
The example files I sent are actually used in production for another project. Still working without glitches. So you can just keep using those. For GCM, this guy apparently did a little bit of cleanup a few month ago. Seems to be worst it.
For the final test without stub I suppose it'll be easier that I do it (as it requires the development client installed on a phone). But I'll provide a API key + a token for you to test sending requests to APN servers. You can't be sure that it did hit the target, but at least that Apple accepted the payload (you can even check with me by skype that I got it).
BTW, iOS is the most urgent. As you know Android version is planned for the end of the month, we've got a bit more time.
iOS is the most urgent
Ignoring Android part for now, so should be finished with iOS tomorrow.
Let me know if I understood formatting part right: task.coffee
Okay. This is somewhat messy for now, but should work. I added some config vars with pathes to .pem files and other APN options. To start a worker, run coffee src/push-api/sender-cli.coffee
.
If you set TEST_APN_TOKEN
env var when running mocha tests, it will actually send stuff to Apple; otherwise that test will be pending. If you set TEST_APN_TOKEN
when running worker, it will put some data in redis and send it to Apple as a simple test.
Would also be nice to document the push
field in README.md → POST notification.
This module should enter production next week, seems feasible to you? I deployed on the staging server today, at least it doesn't seems to break the onlinelist and notifications APIs.
Will update readme according to your notes.
This module should enter production next week, seems feasible to you? I deployed on the staging server today, at least it doesn't seems to break the onlinelist and notifications APIs.
Should not break anything, since pieces are quite independent and most stuff happens in the stand alone worker anyways. Only concern is performance; we can maybe do a test run, by dumping some notifications to redis and checking how it'll do.
The other thing I just thought of is that if something goes wrong (Apple server is down or any error in sending message actually), worker won't quit; wrote that off as TODO at first and it slipped my mind. So before we go live I'd like to fix it, will get on it first thing tomorrow, don't put it live just yet.
(Not sure if GH sends notifications on closed PRs, so I am tagging you @j3k0)
I usually look for TODOs left, but we should probably grep sources as part of make test
for TODO
to prevent such things; why trust people if you can get a machine to do it :)
Ok. For notifications refused by apple (or if apple server can't be contacted for some reasons), it's fine to just throw them away with a warning on the logs.
Great idea to have make test
fail for remaining TODO
. Eventually greping TODO
should just show a warning (like a pending test), greping FIXME
should fail the test. Easy to put in place in Makefile, need help to do it?
BTW, I deployed the worker on staging servers, you may have noticed the push-worker.sh
script that starts a worker every N seconds. It isn't doing anything, because there's no activity on the staging server, yet after a few hours this ends up eating all RAM and CPU.
Logs seems fine:
Jul 12 17:55:02 staging-compute push-worker-1-2-2-1.8f832ca1: ./node_modules/.bin/coffee src/push-api/sender-cli.coffee
Jul 12 17:55:03 staging-compute push-worker-1-2-2-1.8f832ca1: {"name":"notifications","hostname":"push-worker-1-2-2-1","pid":15805,"SenderCli":true,"level":30,"msg":"onTask() recieved null task, queue is empty","time":"2015-07-12T14:55:02.969Z","v":0}
Jul 12 17:55:03 staging-compute push-worker-1-2-2-1.8f832ca1: {"name":"notifications","hostname":"push-worker-1-2-2-1","pid":15805,"SenderCli":true,"level":30,"msg":"Done successfully","time":"2015-07-12T14:55:03.012Z","v":0}
Jul 12 17:55:03 staging-compute push-worker-1-2-2-1.8f832ca1: ./node_modules/.bin/coffee src/push-api/sender-cli.coffee
Jul 12 17:55:03 staging-compute push-worker-1-2-2-1.8f832ca1: {"name":"notifications","hostname":"push-worker-1-2-2-1","pid":15805,"SenderCli":true,"level":30,"msg":"Redis closed with { err: null, reply: 'OK' }","time":"2015-07-12T14:55:03.145Z","v":0}
Culprit seems to be initialization of the worker, it takes a few seconds even with all redis hosts on the same datacenter. If it gets too slow, multiple workers then try initializing in parallel, so this just get slower, thus more workers... spiral of death. Any idea what can be done? The easy fix for now will be to not start multiple workers in parallel.
(Note, I get notified with or without @ on closed PR)
I don't think spawning processes no matter what is a good idea. We should either put upper limit on number of workers running at the same time, or we can start 1 periodically and fork it internally.
Right, I'll just make sure it doesn't start more instances than the number of available CPUs (vcore), at least this prevents the spiral of death.
@j3k0 ready for review.