scottburton11 closed this issue 3 years ago.
:+1: to this. I ran into an issue where device tokens became stale, at which point if I am sending multiple notifications simultaneously all of the subsequent notifications are dropped silently due to the stale token(s).
Same problem here! How should this situation be managed? For now I send notifications individually as a temporary fix.
@micred I did the same thing, which isn't desirable due to the additional overhead. I'd like a stale-token error returned so I can pass that error back to the client to force a token refresh, or remove the recipient from the list.
I also don't like that APNs itself simply stops processing subsequent notifications; I wish it would finish the job and return the errors.
Hi! I'm the maintainer of Pushy, a Java APNs library. I realize this is something of an old issue and the specifics of our solutions may not apply directly, but I wanted to share what we've done in this area.
There really isn't any combination of waiting or blocking or reading you can do to guarantee that a notification has been sent and not rejected. Even if you attempt a blocking read after a write, there's absolutely no guarantee as to when the APNs gateway will get around to sending a "bad push notification" message down the line. Getting no data when you attempt to read doesn't necessarily mean the notification wasn't rejected; it may just mean that it hasn't been rejected yet.
The only time we know anything about the state of any notification is when we get a definitive "bad push notification" response from the APNs gateway. At that point, we know that everything sent before the rejected notification was accepted (TCP guarantees that everything happens in order), one specific notification was rejected and should not be resent, and everything sent after the rejected notification was dropped by the gateway and needs to be re-sent.
In the end, the design of APNs means that there are some problems we simply cannot avoid. Notifications winding up in an unknown state after a connection dies is one of them.
But! We can definitely make a best effort, and that brings us back to the issue at hand. I think the best thing you can do here is to just continue writing notifications until you want to close your connection (how and when is up to you), but keep a local copy of everything you send. Occasionally, you'll find out that something you sent a while ago got rejected, but you'll be able to re-send all of the notifications that followed.
We don't have a perfect solution to this problem, but in Pushy, we take a number of small steps to keep things as robust as possible.
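The bookkeeping described above — keep a local copy of everything you write, and when a rejection arrives, split the buffer at the rejected notification — can be sketched as follows. This is an illustrative Ruby sketch, not Pushy's actual API; the class and method names are hypothetical.

```ruby
# Illustrative sketch (names are hypothetical, not Pushy's API): keep every
# notification you write in a local buffer. When the gateway reports a
# rejection, everything before the rejected notification is known to have
# been accepted, the rejected one must not be re-sent, and everything after
# it was dropped and must be re-sent.
class SentBuffer
  def initialize
    @sent = [] # notifications in the order they were written
  end

  def record(notification)
    @sent << notification
  end

  # identifier is the id the gateway reported in its error response.
  # Returns [rejected, to_resend].
  def split_on_rejection(identifier)
    index = @sent.index { |n| n[:id] == identifier }
    return [nil, []] unless index

    rejected  = @sent[index]
    to_resend = @sent[(index + 1)..] || []
    [rejected, to_resend]
  end
end
```

On reconnect, you would replay `to_resend` on the fresh connection and drop the buffer entries up to and including the rejected notification.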
Does that make sense?
I also had an issue with invalid tokens and found a solution for that.
As Apple's documentation says:
If you send a notification that is accepted by APNs, nothing is returned. If you send a notification that is malformed or otherwise unintelligible, APNs returns an error-response packet and closes the connection. Any notifications that you sent after the malformed notification using the same connection are discarded, and must be resent.
So after digging in the code, I found that houston already checks for an error block after writing to the socket (see https://github.com/nomad/houston/blob/master/lib/houston/client.rb#L53-L60)
As you can see, the notification is marked as unsent when an error occurs. That's the point where you have to reopen your connection to be able to proceed with sending the next notifications of your batch.
Does that make sense?
UPDATE:
I noticed that the check of the read_socket is broken. If you skip this check and continue with connection.read(6), the error handling works.
https://github.com/nomad/houston/blob/master/lib/houston/client.rb#L53-L54
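For context, the 6 bytes read with connection.read(6) are the error-response packet of Apple's binary APNs protocol: a command byte (always 8), a status code, and the 4-byte big-endian identifier of the rejected notification. A minimal sketch of decoding it (the helper name is mine, not houston's):

```ruby
# Decode the 6-byte error-response packet of the APNs binary protocol:
# 1 byte command (always 8), 1 byte status code, 4-byte big-endian
# identifier of the rejected notification.
APNS_STATUS_CODES = {
  0   => "No errors encountered",
  1   => "Processing error",
  2   => "Missing device token",
  3   => "Missing topic",
  4   => "Missing payload",
  5   => "Invalid token size",
  6   => "Invalid topic size",
  7   => "Invalid payload size",
  8   => "Invalid token",
  10  => "Shutdown",
  255 => "Unknown"
}.freeze

def decode_error_packet(bytes)
  return nil if bytes.nil? || bytes.bytesize < 6

  command, status, identifier = bytes.unpack("CCN")
  return nil unless command == 8 # not an error-response packet

  { status: status,
    description: APNS_STATUS_CODES.fetch(status, "Unknown"),
    identifier: identifier }
end
```

Usage would be something like `error = decode_error_packet(connection.read(6))`; a status of 8 ("Invalid token") is the stale-token case discussed above.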
@svewag did you try it with a fork? Is the error returned immediately or eventually?
No, I didn't fork it.
In my project we have had good experience so far with the workaround of checking the read socket for an error after every 20th sent push notification. So after the 20th notification, we wait 1 second and then read with connection.read(6) to obtain a possible error message from the service. You also get the id of the failed notification.
Because we are iterating through an array of notifications to be sent, we can determine from the index of the failed notification which notifications have been sent successfully and which have not. Your next step is then to reconnect to APNs and resend the notifications in the array after the failed one, and so on.
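The workaround above can be sketched roughly like this. This is a simplified illustration, not houston's API: `connection` is anything responding to write and read, and the read is assumed to return nil when no error is pending (in real code you would use a non-blocking read or a select with timeout).

```ruby
# Sketch of the "check every 20th notification" workaround. Assumptions
# (not houston's actual API): each notification responds to #id and
# #message, and connection.read(6) returns nil when no error is pending.
# Returns the array index of the failed notification, or nil if none
# was observed.
def send_with_periodic_check(notifications, connection, batch_size: 20, pause: 1)
  notifications.each_with_index do |notification, index|
    connection.write(notification.message)
    next unless (index + 1) % batch_size == 0

    sleep(pause)               # give APNs a moment to report an error
    error = connection.read(6) # 6-byte error-response packet, or nil
    next unless error

    _command, _status, failed_id = error.unpack("CCN")
    # Everything after the failed notification was dropped by the
    # gateway; the caller must reconnect and resend from index + 1.
    return notifications.index { |n| n.id == failed_id }
  end
  nil
end
```

The caller then reconnects and calls this again with the slice of notifications after the returned index.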
Please be aware that select doesn't always detect an issue. You really need to write to the socket to reliably detect one.
Or pause and resend. Pausing a simple second (or half a second) seemed to catch it most of the time; it just caused too much of a lag for us. If I remember correctly, adding 1 second every 20 messages makes the messages go through in double or triple the time. Some other people would send a hundred messages (or send to the end of the message list) and then intentionally send a bad message and read the response. If the error code was for the bad message they sent, then they knew definitively that the batch was a success.
Best of luck.
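The "deliberate bad message" trick mentioned above can be sketched like this. Everything here is illustrative: the canary id, the assumption that the caller supplies a builder producing a notification APNs is guaranteed to reject, and the blocking read.

```ruby
# Sketch of the canary check described above. After writing a batch,
# write one notification that APNs is guaranteed to reject, then block
# reading the 6-byte error response. If the reported identifier is the
# canary's, every notification written before it was accepted. The
# canary id and the builder lambda are illustrative, not a library API.
CANARY_ID = 0xFFFFFF

def batch_confirmed?(connection, build_bad_notification)
  connection.write(build_bad_notification.call(CANARY_ID))
  response = connection.read(6) # blocks until APNs answers
  return false unless response

  _command, _status, failed_id = response.unpack("CCN")
  failed_id == CANARY_ID # true => the earlier batch was all accepted
end
```

If the identifier is not the canary's, an earlier notification failed and the notifications after it must be resent before checking again.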
We're evaluating houston after inheriting a project that heavily uses grocer. One of the persistent problems the project had with grocer was that failed notifications caused persistent connections to drop subsequent notifications, silently.
Pardon me if I'm retreading old ground here, but for reference, see https://github.com/grocer/grocer/issues/14 for discussion around grocer and this topic. In a nutshell, the APNS architecture notifies of failures on the push socket, but not of successes, so you're required to block the connection until either the failure notification appears or a timeout has been exceeded. The APNS feedback service can be used to determine, after the fact, which device tokens were bad, but it will not indicate which messages have failed, so re-sending subsequent failures is complex.
It looks like the changes around e71b87dadc6c90eef92b463020e2e991d27f0700 address this issue by implementing the IO.select solution mentioned in the above issue, and implemented in a few forks (https://github.com/ping4/grocer/blob/ping4/lib/grocer/ssl_connection.rb has an example). I'm new to IO.select, but as far as I can tell, this will block until either read_socket is ready for reading or write_socket is ready for writing. So it's possible that IO.select will return before the APNS has pushed an error code, indicating a false positive. In informal testing, this seems to be exactly what happens: the error never appears, because IO.select returned when write_socket became ready for writing, not because read_socket became ready for reading.
Without the second argument, IO.select will block (forever) until the read_socket is ready with an error to read. It does, however, reliably contain error messages for failed device_tokens.
Without the second argument, and with some small-ish value for timeout (the fourth arg), IO.select accomplishes both goals, at the expense of waiting for timeout seconds to pass before the next message can send. The timeout tax might work in certain situations, so I'm not willing to discard it.
That said, I'm already out of my depth. My limited research indicates that this is the accepted solution area, but grocer has mostly abandoned this idea, and nobody is happy about the blocking. Is there a solution to this that doesn't involve opening and closing connections repeatedly?
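The IO.select behavior described above can be demonstrated without APNs at all, using a plain local pipe in place of the SSL socket: with only a read array and a timeout, select returns nil when nothing is readable within the timeout, and returns the ready IO objects once data arrives.

```ruby
# IO.select's semantics demonstrated with a local pipe standing in for
# the APNs socket. With only a read array and a timeout, select returns
# nil on timeout, and [[readable], [writable], [errored]] once data is
# available -- the "block on read with a small timeout" pattern
# discussed above.
reader, writer = IO.pipe

# Nothing written yet: select times out and returns nil (no error to read).
idle = IO.select([reader], nil, nil, 0.1)

# Once data is available, select returns immediately with the ready IOs.
writer.write("error!")
ready = IO.select([reader], nil, nil, 0.1)
readable = ready && ready[0]
```

Passing the write socket as the second argument is what produces the false positive described above: a healthy connection is almost always ready for writing, so select returns long before any error-response bytes arrive on the read side.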