j3k0 / ganomede-notifications

Long-pull notification service for Ganomede
0 stars 0 forks source link

Push API Initial #15

Closed elmigranto closed 9 years ago

elmigranto commented 9 years ago

@j3k0 ready for review.

j3k0 commented 9 years ago

Thanks, will review. Can you go on with actually sending the push notification? Eventually get the whole chain to work with a "fake" push notification service first.

elmigranto commented 9 years ago

Yeah, sure. So when notifications comes in to notifications-api, it can contain notification.push.message String (push is 1st level, not in data?), which we should send with redis push tokens?

j3k0 commented 9 years ago

Yes push is first level.

We an notification is received, if push object exists, transmit the notification object to the push module. Push module will take care of sending the notification. (which indeed requires the users tokens)

Note, a notification may or may not have a message. The full notification object can be sent as payload in the push notification (both android and ios support custom payloads).

elmigranto commented 9 years ago

if push object exists, transmit the notification object to the push module

Currently, we add notifications to our queue, but I assume that push ones won't go there and only be passed to push module. Is that so?

j3k0 commented 9 years ago

Push ones will go in both the user's queue and be send to the push module.

What would be nice architecturally is to add push notifications to be sent out to a redis queue. Then have a worker take tasks from the queue. Losing a push notification from time to time is not a big deal, so it can be done with basic LPUSH / RPOP. At some point, we can externalize the worker.

elmigranto commented 9 years ago

Notifications:

  1. parse notification;
  2. add to internal queue;
  3. if has .push, add to "push queue".

Push API:

  1. check "push queue" periodically (or smth like that, we'll see);
  2. send out push notifications.

Is that what you have in mind?

j3k0 commented 9 years ago

Yes.

Ideally, I'd totally separate the "redis-pulling-push-sending" module to a separate project (without API, just pure compute), so that notifications modules stays generic, without dependencies on Apple or Google technologies.

It can be doing a "one shot" pull-till-empty on the redis push-notifications queue, then exit. This way, sending push notifications can be scheduled on a cron job.

elmigranto commented 9 years ago

Well most of the stuff is easily movable to another service with sync via redis, so shouldn't be a problem to pull it out later.

j3k0 commented 9 years ago

Sending push

While I'm on the frontend, reading more doc, I came across the localization issue. To make it short, here's the required fields to solve that on the server side.

The push field found in notifications can have the following data:

"push": {
  "type": "someone_loves_someone",
  "title": [ "Love {1}", "bob" ],
  "message": [ "Did you know? {1} loves {2}", "alice", "bob" ],
}

Should allow to fill APN:

For GCM, didn't check yet but there's surely something similar.

j3k0 commented 9 years ago

(edited the comment above)

elmigranto commented 9 years ago

Not sure I'm following what's going on :) Are you saying that we should send different stuff to APN and GCM, so the app on device can localize it? (We have common .push server-side and modify its contents before each send depending on token.type?)

j3k0 commented 9 years ago

Just one format on notifications side (documented above). Backend worker will convert this format to either APN or GCM's, add the token and send to the provider's server.

elmigranto commented 9 years ago

What am I thinking about worker for sending push notifications. We'll have 2 queues — pending (the one notifications module dumps push messages) and processing (messages to be sent immediately).

Worker strategy:

  1. Check for any messages left in processing on startup and process them.
  2. Get up to N push messages from pending to send within redis transaction.
brpoplpush pending processing 0  // waits for at least 1 message
rpoplpush pending processing     // repeated N-1 times
expire pending                   // not sure if required, 
                                 // but we surely don't want to send stale notifications
  1. Retrieve messages from processing and send them in parallel.
lrange pending 0 -1
  1. Clear queue on success.
del pending

Not sure how to handle failed sends, it is probably not okay to just ignore them, but is not great letting them hang indefinitely (can't expire particular list items). Maybe remove pending anyway, and retry to send them on the next run. Will leave this as TODO for the first version.

Two queues will allow us to start multiple workers in the future (each will have own pending queue).

elmigranto commented 9 years ago

Now that I wrote this I realized the quickest thing is to just do BRPOP continuously on workers, I guess I'll start with that and we can see if that would be enough.

elmigranto commented 9 years ago

Ok, here's initial thing, now we need to actually send push notifications.

  1. Formats; do you have anything on how to convert to GCM format?
  2. Sending; I figured I use those modules from the examples you sent over skype: apn, gcm. Except GCM is 2 years old, but it looks like it simply sends HTTP request, so shouldn't be a big deal.

To actually test stuff with real services and not stubs, I figured you'll do it yourself because API keys; or do you have something else in mind?

j3k0 commented 9 years ago

The example files I sent are actually used in production for another project. Still working without glitches. So you can just keep using those. For GCM, this guy apparently did a little bit of cleanup a few month ago. Seems to be worst it.

For the final test without stub I suppose it'll be easier that I do it (as it requires the development client installed on a phone). But I'll provide a API key + a token for you to test sending requests to APN servers. You can't be sure that it did hit the target, but at least that Apple accepted the payload (you can even check with me by skype that I got it).

BTW, iOS is the most urgent. As you know Android version is planned for the end of the month, we've got a bit more time.

elmigranto commented 9 years ago

iOS is the most urgent

Ignoring Android part for now, so should be finished with iOS tomorrow.

Let me know if I understood formatting part right: task.coffee

elmigranto commented 9 years ago

Okay. This is somewhat messy for now, but should work. I added some config vars with pathes to .pem files and other APN options. To start a worker, run coffee src/push-api/sender-cli.coffee.

If you set TEST_APN_TOKEN env var when running mocha tests, it will actually send stuff to Apple; otherwise that test will be pending. If you set TEST_APN_TOKEN when running worker, it will put some data in redis and send it to Apple as a simple test.

j3k0 commented 9 years ago

Would also be nice to document the push field in README.md → POST notification.

This module should enter production next week, seems feasible to you? I deployed on the staging server today, at least it doesn't seems to break the onlinelist and notifications APIs.

elmigranto commented 9 years ago

Will update readme according to your notes.

This module should enter production next week, seems feasible to you? I deployed on the staging server today, at least it doesn't seems to break the onlinelist and notifications APIs.

Should not break anything, since pieces are quite independent and most stuff happens in the stand alone worker anyways. Only concern is performance; we can maybe do a test run, by dumping some notifications to redis and checking how it'll do.

elmigranto commented 9 years ago

The other thing I just thought of is that if something goes wrong (Apple server is down or any error in sending message actually), worker won't quit; wrote that off as TODO at first and it slipped my mind. So before we go live I'd like to fix it, will get on it first thing tomorrow, don't put it live just yet.

(Not sure if GH sends notifications on closed PRs, so I am tagging you @j3k0)

elmigranto commented 9 years ago

I usually look for TODOs left, but we should probably grep sources as part of make test for TODO to prevent such things; why trust people if you can get a machine to do it :)

j3k0 commented 9 years ago

Ok. For notifications refused by apple (or if apple server can't be contacted for some reasons), it's fine to just throw them away with a warning on the logs.

Great idea to have make test fail for remaining TODO. Eventually greping TODO should just show a warning (like a pending test), greping FIXME should fail the test. Easy to put in place in Makefile, need help to do it?

BTW, I deployed the worker on staging servers, you may have noticed the push-worker.sh script that starts a worker every N seconds. It isn't doing anything, because there's no activity on the staging server, yet after a few hours this ends up eating all RAM and CPU.

screen shot 2015-07-12 at 5 48 24 pm

Logs seems fine:

Jul 12 17:55:02 staging-compute push-worker-1-2-2-1.8f832ca1:  ./node_modules/.bin/coffee src/push-api/sender-cli.coffee 
Jul 12 17:55:03 staging-compute push-worker-1-2-2-1.8f832ca1:  {"name":"notifications","hostname":"push-worker-1-2-2-1","pid":15805,"SenderCli":true,"level":30,"msg":"onTask() recieved null task, queue is empty","time":"2015-07-12T14:55:02.969Z","v":0} 
Jul 12 17:55:03 staging-compute push-worker-1-2-2-1.8f832ca1:  {"name":"notifications","hostname":"push-worker-1-2-2-1","pid":15805,"SenderCli":true,"level":30,"msg":"Done successfully","time":"2015-07-12T14:55:03.012Z","v":0} 
Jul 12 17:55:03 staging-compute push-worker-1-2-2-1.8f832ca1:  ./node_modules/.bin/coffee src/push-api/sender-cli.coffee 
Jul 12 17:55:03 staging-compute push-worker-1-2-2-1.8f832ca1:  {"name":"notifications","hostname":"push-worker-1-2-2-1","pid":15805,"SenderCli":true,"level":30,"msg":"Redis closed with { err: null, reply: 'OK' }","time":"2015-07-12T14:55:03.145Z","v":0} 

Culprit seems to be initialization of the worker, it takes a few seconds even with all redis hosts on the same datacenter. If it gets too slow, multiple workers then try initializing in parallel, so this just get slower, thus more workers... spiral of death. Any idea what can be done? The easy fix for now will be to not start multiple workers in parallel.

(Note, I get notified with or without @ on closed PR)

elmigranto commented 9 years ago

I don't think spawning processes no matter what is a good idea. We should either put upper limit on number of workers running at the same time, or we can start 1 periodically and fork it internally.

j3k0 commented 9 years ago

Right, I'll just make sure it doesn't start more instances than the number of available CPUs (vcore), at least this prevents the spiral of death.