We're trying out your gem to send VoIP notifications from Sidekiq, and we're running into some issues with broken connections.
At first we were raising an error in the connection.on(:error) {} callback, like this:
Apnotic::ConnectionPool.new(connection_config, size: 5) do |connection|
  connection.on(:error) do |exception|
    raise(PushNotification::Error, "Production APNs connection error: #{exception}")
  end
end
That was a really bad idea, since it crashed the whole Sidekiq process and forced it to restart. We fixed this, and now we just report the exception to our error service instead:
Apnotic::ConnectionPool.new(connection_config, size: 5) do |connection|
  connection.on(:error) do |exception|
    Sentry.capture_exception(exception)
  end
end
Now, occasionally we get this error reported:
Errno::ECONNRESET: Connection reset by peer
from openssl (3.2.0) lib/openssl/buffering.rb:211:in `sysread_nonblock'
from openssl (3.2.0) lib/openssl/buffering.rb:211:in `read_nonblock'
from net-http2 (0.18.5) lib/net-http2/client.rb:145:in `block in socket_loop'
from net-http2 (0.18.5) lib/net-http2/client.rb:142:in `loop'
from net-http2 (0.18.5) lib/net-http2/client.rb:142:in `socket_loop'
from net-http2 (0.18.5) lib/net-http2/client.rb:114:in `block (2 levels) in ensure_open'
The error is reported in the callback, and then 60 seconds later we get a timeout here:
connection_pool(ios_voip_push_token).with do |connection|
  response = connection.push(apnotic_notification(notification, ios_voip_push_token))
  raise(TimeoutError) if response.nil?
  [...]
end
I guess we could pass a shorter timeout to the push method, since 60 seconds seems fairly high.
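For reference, here is a rough sketch of what we have in mind for the shorter timeout. It assumes that push accepts a timeout: option and still returns nil when the timeout is hit, which is our reading of the README rather than something we have verified in the source. FakeConnection is a hypothetical stand-in so the snippet runs without a live APNs connection, and push_with_timeout, PushTimeoutError and the 5-second value are our own illustrative choices.

```ruby
# Hypothetical stand-in for an Apnotic connection, so this sketch runs without APNs.
# It mimics what we observe: push returns nil when no response arrives in time.
class FakeConnection
  attr_reader :last_timeout

  def initialize(response)
    @response = response
  end

  # Assumption: the real push takes a timeout: option in seconds.
  def push(_notification, timeout: 60)
    @last_timeout = timeout
    @response
  end
end

# Our own error class (hypothetical), used instead of a generic TimeoutError.
class PushTimeoutError < StandardError; end

# Assumed value for illustration, not a recommendation from the gem.
PUSH_TIMEOUT_SECONDS = 5

# Pushes with an explicit timeout and raises if no response came back in time.
def push_with_timeout(connection, notification, timeout: PUSH_TIMEOUT_SECONDS)
  response = connection.push(notification, timeout: timeout)
  raise PushTimeoutError, "no APNs response within #{timeout}s" if response.nil?

  response
end
```

In our job this would replace the bare connection.push call inside the pool's with block. It only shortens how long a job waits on a dead connection; it does nothing to repair the connection itself.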
Anyway, once this started, it kept happening: almost all of our pushes hit this connection reset error. Our push jobs are not retried, but I don't think retrying would help either, since the connections don't seem to get "healed".
Could there be an issue where connections are stuck in a broken state? Or are we supposed to handle these errors differently?