ostinelli / apnotic

A Ruby APNs HTTP/2 gem able to provide instant feedback.
MIT License
479 stars 94 forks source link

Guidance on asynchronous pushes with sidekiq and connection pools. #97

Closed zrice57 closed 3 years ago

zrice57 commented 4 years ago

I've seen a variety of discussion around this but I'm still having trouble with exception handling

Basically if I do async pushes in a sidekiq job and an exception is raised by the connection thread i.e. in the

push.on(:response) do |response|

block it will crash the whole sidekiq processes and not just the job.

I've seen this issue kind of hand waved by adding an error handler like:

connection.on(:error) do |exception|
  Rails.logger.error "Apnotic exception has been raised: #{exception}"
end

However when using that, the sidekiq job will succeed as it has no way of knowing about the error. If there is an error in the push.on(:response) do |response| I want the job to fail so it can be retried.

I'm not sure how to accomplish this. Maybe the connection class could store errors that sidekiq could check for? Then at the end of a job you could do something like:

# wait for batch to finish
connection.join(timeout: 5)

# check for errors
raise 'batch had errors' if connection.errors.any?

Also, thank you for responding so quickly to the other issue i filed!

benubois commented 3 years ago

I'm not really sure. @krasnoukhov, what did you have in mind when adding #70?

krasnoukhov commented 3 years ago

Hmm I'm not really sure either because we don't use async pushes. Since we use Sidekiq and the jobs are already async, we just use the pool to send pushes in a sync way, so we can verify that response is successful (and raise, so Sidekiq will retry) if not

zrice57 commented 3 years ago

We ended up extending the Apnotic::Connection class and registering an exception callback that stores the exceptions in an array. This way after some async pushes we can check for errors and raise an exception at an appropriate place in the sidekiq job.

Although sidekiq is asynchronous you still end up underutilizing the connections. Only one sidekiq worker can use a connection at a time to send a single notification where most of the time is spent waiting for the response.

If you're sending notifications at any kind of scale this overhead really adds up. Especially in our case where the sidekiq workers were in Heroku US-East and the apple push API is in US-West. In this scenario sending a single push would take about 300ms. We would scale our Sidekiq workers up significantly but sending 100,000 notifications would still take about 20 minutes.

When using the HTTP2 asynchronous pushes we can send a batch of 1000 notifications in the same 300ms. Sending 100,000 notifications in batches of 1000 now takes about 30 seconds. We're sending pushes and receiving responses asynchronously making better use of the capability of the network socket.

We've also saved some time by batching the database queries. Instead of 1 SELECT to retrieve the push record and 1 UPDATE to save the response; for each push we're batching those as well. Even if each query only takes 1ms that adds up with 100,000 notifications.

krasnoukhov commented 3 years ago

Nice, thanks for sharing @zrice57, I guess this would be a good material for the "Async pushes" section of the documentation

zrice57 commented 3 years ago

You're welcome! Let me know if you have any questions.

benubois commented 3 years ago

Glad you were able to find a solution @zrice57! But yes, if you have time and inclination, a code sample of your strategy would make a great addition to the async_push documentation.