Closed brlo closed 3 years ago
Please see if the latest 1.6.0 solves your issue.
We appear to be having this exact issue as well, with this combination (yes, 1.6.0):
apnotic (1.6.0)
connection_pool (~> 2)
net-http2 (>= 0.18, < 2)
Correction: on further investigation, it seems our batch sometimes gets stuck waiting forever on .join of the items that have been pushed by push_async:
.../vendor/ruby/2.3.0/gems/net-http2-0.18.2/lib/net-http2/client.rb:59:in `sleep'
.../vendor/ruby/2.3.0/gems/net-http2-0.18.2/lib/net-http2/client.rb:59:in `join'
.../vendor/ruby/2.3.0/gems/apnotic-1.6.0/lib/apnotic/connection.rb:76:in `join'
are where the processes are when I send TERM. I'll move this to a separate topic... sorry for the noise.
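For anyone hitting the same hang: the backtrace shows an unbounded wait inside join. A minimal sketch of bounding that wait with Ruby's standard Timeout module follows. StuckConnection and join_with_deadline are hypothetical names made up for illustration; this is not the apnotic or net-http2 API, only a stand-in whose join never returns.

```ruby
require "timeout"

# Hypothetical stand-in for a connection whose #join blocks until all
# in-flight pushes finish. This is NOT the apnotic class.
class StuckConnection
  def join
    sleep # no argument: sleeps forever, like the hang in the backtrace
  end
end

# Wait for the connection, but give up after `seconds`.
# Returns true if join finished, false if the deadline passed first.
def join_with_deadline(connection, seconds)
  Timeout.timeout(seconds) { connection.join }
  true
rescue Timeout::Error
  false
end

finished = join_with_deadline(StuckConnection.new, 0.2)
puts "join finished: #{finished}"
```

A false result tells the batch that the connection should probably be closed and recreated rather than reused, which is a design choice for the caller and not something the thread confirms.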
Please see if the latest 1.6.0 solves your issue.
Thank you. My issue was resolved in 1.6.0. Tested in production for about 2 weeks.
Please open a new issue if you're still seeing this on 1.6.1.
Hello. I use Apnotic push_async in Sidekiq. The fix from this issue #68 resolves the problem where Sidekiq crashed because of an exception in the main thread. But from time to time one of my 10 Sidekiq workers gets stuck forever on a job that needs to send an async push.
I've explored this problem and found that if a SocketError happens, for example, here in socket_loop https://github.com/ostinelli/net-http2/blob/master/lib/net-http2/client.rb#L142 (to reproduce, just raise it there), the next push_async call gets stuck here: https://github.com/ostinelli/apnotic/blob/master/lib/apnotic/connection.rb#L83
streams_available? will always be false, because the SocketError resets @client.remote_settings to the default values from the http-2 gem https://github.com/igrigorik/http-2/blob/master/lib/http/2/connection.rb#L11-L19 and remote_max_concurrent_streams then always returns zero because of this condition: https://github.com/ostinelli/apnotic/blob/master/lib/apnotic/connection.rb#L94-L98
To find this bug I used a monkey patch:
And, as a workaround, this monkey patch:
So, I just changed 0 to 1. But actually I don't understand why we use 0 if we have a default value from the http-2 gem. If it signals some connection trouble, and because of that we say that we can't use any concurrent connections, then maybe it would be better to raise some kind of exception.
Also, it's important to know that the other Sidekiq workers continue to use push_async successfully until the next SocketError happens in another worker, which then gets stuck too. If I apply a monkey patch that breaks the loop after 5 seconds, the next push_async in the same worker also gets trapped in this loop. So only the worker that raised the SocketError goes into a degraded state.
Useful comment: https://github.com/ostinelli/apnotic/issues/64#issuecomment-406113863
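The mechanism described above can be reproduced in isolation. What follows is a minimal simulation, not the apnotic source and not the monkey patches from this comment: FakeConnection, reset_settings!, wait_for_stream and the zero_fallback option are hypothetical names, and DEFAULT_MAX_STREAMS is an assumed placeholder for whatever default the http-2 gem applies before the remote sends its settings.

```ruby
# Assumed placeholder for the http-2 default; the exact value does not
# matter for the simulation, only that settings get reset to it.
DEFAULT_MAX_STREAMS = 0x7fffffff

# Hypothetical stand-in that mimics the behaviour described in the comment.
class FakeConnection
  attr_accessor :stream_count

  def initialize(zero_fallback: 0)
    @remote_max_setting = 100 # pretend the remote already sent its settings
    @stream_count = 0
    @zero_fallback = zero_fallback
  end

  # Simulates the SocketError path: remote settings go back to defaults.
  def reset_settings!
    @remote_max_setting = DEFAULT_MAX_STREAMS
  end

  # Mirrors the reported condition: the default value is treated as
  # "settings not received yet", so the fallback is returned instead.
  def remote_max_concurrent_streams
    return @zero_fallback if @remote_max_setting == DEFAULT_MAX_STREAMS
    @remote_max_setting
  end

  def streams_available?
    remote_max_concurrent_streams - @stream_count > 0
  end

  # Polls for a free stream, but gives up after `timeout` seconds
  # instead of looping forever. Returns true if a stream is available.
  def wait_for_stream(timeout: 0.2)
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
    until streams_available?
      return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline
      sleep 0.001
    end
    true
  end
end

conn = FakeConnection.new
puts "before reset: #{conn.wait_for_stream}"
conn.reset_settings!
puts "after reset:  #{conn.wait_for_stream}"
puts "next attempt: #{conn.wait_for_stream}"

patched = FakeConnection.new(zero_fallback: 1)
patched.reset_settings!
puts "with fallback 1: #{patched.wait_for_stream}"
```

In this sketch, breaking out of the loop does not heal the connection: every later attempt on the same object times out again, which matches the observation that only the affected worker stays degraded. Changing the fallback from 0 to 1 lets the loop exit, but it is a guess about available streams, so raising an error and reconnecting may be the cleaner option.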