Closed amitsaxena closed 7 years ago
As the error states, you are sending too many streams and you have exceeded the maximum amount of streams specified by the peer. Think of it like the HTTP/2 equivalent of a HTTP/1.1 429 Too Many Requests
response.
Also, I highly recommend to upgrade to the stable 1.0.1
version.
@ostinelli we have a single connection, and were sending push via it using push_async
(as described in the docs). Should we add some kind of timeouts in between (say every 1000 pushes)? What's a good approach to send notifications to a high number of devices?
Thanks for the help, and the gem!
Try to understand when you hit a limit. This will give you better clarity on your strategy. I've seen a limit hit around 40k concurrent streams on a single connection, but I haven't looked for official documentation from Apple regarding it.
You can either send pushes in smaller batches and pause between them (as you suggest), or increase the connection pool that you use so that all of your pushes are diluted into more connections, hence every connection has less chance to hit the limit.
We saw this error again today on production while sending it to just 1000 devices. We chunk our devices into sets on 1K each, and sleep for 2 seconds after each chunk, and we still got this error for the first 1K devices.
May be having some kind of fail-safe (reconnect on this exception) in apnotic for this might be a good idea, else async push is almost unusable if it fails for such a small number of device tokens. As a temporary workaround we now send to 100 devices at a time (and 2 s sleep), but this has resulted in very long running sidekiq jobs.
Does anyone else use async push on production? Is doing sync pushes the recommended way?
May be having some kind of fail-safe (reconnect on this exception) in apnotic for this might be a good idea
This raises other questions though.
Would retries be at:
If the latter, what should we do with requests that might (or not) have sent data but have not received a response back? Should we consider them as incomplete and retry? Wouldn't this though perform, in certain conditions, twice the same request, thus even for non-idempotent requests?
This is somewhat related to #28. I don't understand HTTP/2 well enough to know if this makes sense.
As I understand it a stream is supposed to be "complete" at a certain point. Maybe this is when a request and response are sent along with an END_STREAM. at this point the stream should be closed and reduce the count of open streams. Not sure if this happens automatically or if the client needs to send RST_STREAM.
If net-http2 kept track of MAX_CONCURRENT_STREAMS and then once this number was hit would block until a stream closed or the server sent a new MAX_CONCURRENT_STREAMS it would never exceed the number of streams.
It sounds like this is what golang is considering or what they currently do which is to open a new connection when MAX_CONCURRENT_STREAMS is exceeded.
I am not sure that HTTP2::Error::StreamLimitExceeded
is a bad thing. The http-2 gem keeps track of the MAX_CONCURRENT_STREAMS and it is throwing that exception to ensure Apnotic always stays within the max.
We deliver 30-40K pushes every hour (most on the hour) and we just allow the exception to trigger the Sidekiq retry logic. I don't think I have seen more than a dozen retries per batch. Using Sidekiq in that way keeps the notification "queue" or "buffer" outside of Apnotic. I think this is a good thing because you'd need complicated logic to track notification state in the any internal buffer so that you'd know if a notification was actually sent in the event of a crash or error.
Opening a new connection when you hit "MAX_CONCURRENT_STREAMS" will result in an issue for the JWT style push because MAX_CONCURRENT_STREAMS starts out a 1 and then increases after the first push. I know this isn't an immediate problem but it needs to be in the back of our minds.
We deliver 30-40K pushes every hour (most on the hour) and we just allow the exception to trigger the Sidekiq retry logic
@idyll the problem with that is multiple notifications being sent to same device. We use async push, open a connection, send push to all registered devices and then close the connection. So at the point when exception is encountered, push has already been sent to quite a few devices.
@ostinelli yes the delta thing (the pushes which have been sent and response isn't received yet, before the exception) is tricky. I believe it will be the same at NetHttp2 level, unless there a way using which we can wait for all the responses before raising the HTTP2::Error::StreamLimitExceeded exception, and no new request is sent during this duration.
@amitsaxena your setup sounds different then mine. We track things in a way that we'd know the messages were already delivered with the async push. And the resumption would start where the error occurred.
That's how we do it too.
@idyll would you mind sharing your setup? We cannot use a pool of persistent connections (as described in docs) as we need to send notifications to different apps (100s of them), that's why we open a connection, send notifications, and then close it.
If nothing else works, we might use sync pushes. They might be slow, but I believe will be reliable given that it sends one notification at a time, and therefore we will not hit this error.
We cannot use a pool of persistent connections (as described in docs) as we need to send notifications to different apps (100s of them)
I do this with pools. I create a hash of pools where the key is the "topic". (where by topic I mean topic of your push certificate.) When a notification is created I store the topic, the push token and the content of the notification in Redis. Sidekiq workers grab the notifications and deliver them.
It's super important to avoid your DB with these workers if you want to achieve any sort of volume. So try to push all the data into redis that you need to send the notification without hitting the DB again.
If a notification isn't deliverable there's a separate queue to handle archiving the device.
Interesting approach! Just that it's not practical for us to have persistent connections given that the total number of apps is high, and majority of them rarely use push notifications service.
Closing due to no activity. If additional input is needed, can re-open.
We were using the async method to send notifications and while sending it to around 40K devices on a single connection we got the HTTP2::Error::StreamLimitExceeded exception.
Can anyone explain what this error means? Is there a limit to the amount of data I can send on a HTTP/2 connection? A part of the notifications did go out, and the remaining failed.
We were on version 0.10.0.