Open thisisjeffwong opened 3 years ago
Hi @thisisjeffwong,
This does sound like a clock issue. I've made some changes to insulate apnotic from system time. Would you be up for trying out the monotonic branch to see if that helps?
Out of curiosity, what do you see in production when running Time.now?
Thanks!
I could try it on my dev APN server and see if it still works. Have you had a chance to test this code against any sends to APNS?
For testing in production, would you be looking for evidence that this fixes the problem, or evidence that this fix does no harm?
We only get this bug every few weeks but we've gone as long as 4 months without a problem. If it's a relatively safe fix and others approve, we could incorporate it and just reopen this issue if we see it again.
@benubois For Time.now on Rails Console, I see 2021-11-03 10:55:28.748582635 -0700, which looks normal.
Time drift is a pretty good explanation for why this is only happening to one server. However, the failures are interspersed with successes. Could the time used for the token only be off with respect to certain servers on Apple's cluster? Or is apnotic regenerating the token once a failure is detected?
I've done some more testing and now I'm not so sure this is time related.
An expired token actually results in a 403 ExpiredProviderToken error.
However, an invalid team_id or key_id results in 403 InvalidProviderToken.
Try logging the token when you get a 403 error and make sure it includes all the required parts.
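To make the "required parts" check concrete, here is a minimal sketch of inspecting a logged provider token. It assumes the token is a standard APNs provider JWT with alg and kid (key ID) in the header and iss (team ID) and iat in the claims, and that Apple rejects tokens whose iat is more than one hour old. The helper names decode_segment and provider_token_problems are hypothetical and not part of apnotic. The sketch does not verify the signature.

```ruby
require "base64"
require "json"

# Decode one base64url JWT segment into a Hash (hypothetical helper).
def decode_segment(segment)
  padded = segment + "=" * ((4 - segment.length % 4) % 4)
  parsed = JSON.parse(Base64.urlsafe_decode64(padded))
  raise ArgumentError, "segment is not a JSON object" unless parsed.is_a?(Hash)
  parsed
end

# Return a list of problems found in a provider token (empty if none found).
# Only the structure and claims are checked, not the signature.
def provider_token_problems(token, now: Time.now.to_i)
  parts = token.to_s.split(".")
  return ["expected 3 dot-separated segments, got #{parts.size}"] unless parts.size == 3

  header = decode_segment(parts[0])
  claims = decode_segment(parts[1])

  problems = []
  problems << "alg should be ES256" unless header["alg"] == "ES256"
  problems << "kid (key ID) is missing" if header["kid"].to_s.empty?
  problems << "iss (team ID) is missing" if claims["iss"].to_s.empty?

  if claims["iat"].is_a?(Integer)
    age = now - claims["iat"]
    problems << "iat is #{age}s old (over one hour)" if age > 3600
    problems << "iat is in the future" if age < 0
  else
    problems << "iat is missing or not an integer"
  end
  problems
rescue ArgumentError, JSON::ParserError => e
  ["token segments could not be decoded: #{e.message}"]
end
```

Logging the result of provider_token_problems(token) next to each 403 response would show whether the failing requests carried a malformed or stale token, or whether the token looked fine and the rejection came from somewhere else.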
Have you had a chance to test this code against any sends to APNS?
Yes, it works. I'm running the branch in production.
For testing in production, would you be looking for evidence that this fixes the problem, or evidence that this fix does no harm?
Evidence that it fixes the problem.
A teammate was surprised that monotonic time would fix an issue of time discrepancy between servers since servers don't share a monotonic clock.
I referred to this explanation: https://blog.dnsimple.com/2018/03/elapsed-time-with-ruby-the-right-way/
Is this close to the reasoning that motivated your change?
Yes, the monotonic change is just about preventing time related issues only on your server. I don't think this is about a time discrepancy between servers.
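As an illustration of that reasoning, here is a minimal sketch of deciding when to rebuild a cached value by measuring its age with the monotonic clock instead of Time.now. This is not the code on the monotonic branch. The MonotonicCache class and its ttl and clock parameters are hypothetical names used only for this example.

```ruby
# Hypothetical cache: rebuilds its value once it is at least ttl seconds old.
# Age is measured with the monotonic clock, so a wall-clock jump on this
# server (NTP step, manual change) cannot make the value look older or
# younger than it really is.
class MonotonicCache
  def initialize(ttl:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }, &builder)
    @ttl = ttl
    @clock = clock       # injectable so the behavior can be tested
    @builder = builder   # block that produces a fresh value
    @value = nil
    @built_at = nil
  end

  def fetch
    now = @clock.call
    if @built_at.nil? || now - @built_at >= @ttl
      @value = @builder.call
      @built_at = now
    end
    @value
  end
end
```

A caller could wrap token generation as MonotonicCache.new(ttl: 30 * 60) { build_token }, where build_token is a placeholder for whatever produces the JWT. The iat claim inside the token still has to come from the wall clock, because Apple compares it against its own time. The monotonic clock only protects the local decision about when to refresh.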
What's the exact length of time that the error persists?
It lasted only 25 minutes before disappearing. Only a tiny fraction of that server's sends to APNS errored out during that time.
@thisisjeffwong my app is also getting sporadic outbreaks of InvalidProviderToken, and the monotonic change doesn't fix it. Have you found the root cause or a mitigation yet?
No, I no longer work at the place where I had the issue. Unfortunately, I missed the chance to test this theory.
We have multiple APN servers sending APNs to Apple. We occasionally get a sporadic outbreak of InvalidProviderToken errors from Apple that lasts for less than an hour and recovers itself. We aren't doing anything at the application level in response to the 403 errors other than reporting the errors. Has anyone else experienced this?
I thought that maybe one of the servers on Apple's APN cluster might have clock skew, but that would theoretically affect all of our servers equally.