Open Soulou opened 6 years ago
@Soulou Nothing is jumping out at me. I'm grasping at straws a bit here, but are you using `max_attempts` (see: https://github.com/wistia/nsq-ruby/commit/5461c9844dc5daeb3d5551767ee21a8d27ce03f9)? Maybe that's causing it to try to `fin` the message over and over?
In terms of getting the `Died from: No data from socket` over and over, is your thought that it's trying to repeatedly `fin` the message and that's raising an exception in the `read_loop` (https://github.com/wistia/nsq-ruby/blob/8774b0a8a06389f3664e86d1d5263807a51aca9d/lib/nsq/connection.rb#L312), which then triggers a reconnect?
Sorry I'm not more helpful here. I suppose that forking nsq-ruby and adding some more logging is the way to go. If you want to add more debug-level logging, I'm happy to have that in the main repo too in case it's helpful to others.
Hi @bschwartz, forking and adding logging is what we've started to do. We're going to check `max_attempts`, but I don't think it's infinite.
Concerning the `Died from` error, I actually have no real idea of its source. All I know is that the `nsqd` instance is alive and healthy, and that some Golang producers/consumers don't have these issues. That's why I think something is wrong in here, I just don't know what yet :-)
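To make the `max_attempts` question above concrete, here is a minimal sketch of a consumer-side cap on delivery attempts. It does not use nsq-ruby itself: `FakeMessage`, `handle`, and `MAX_ATTEMPTS` are hypothetical names for illustration, and the `attempts`, `finish`, and `requeue` fields only stand in for whatever the real message object exposes.

```ruby
# Stand-in for a consumed message. The field and method names are
# assumptions for this sketch, not a statement about the nsq-ruby API.
FakeMessage = Struct.new(:id, :attempts, :body) do
  attr_reader :state

  def finish
    @state = :finished
  end

  def requeue
    @state = :requeued
  end
end

MAX_ATTEMPTS = 5 # hypothetical cap, chosen arbitrarily

# Runs the block on the message body unless the message has already been
# delivered more than max_attempts times, in which case it is finished
# without running the block so it stops being redelivered.
def handle(message, max_attempts: MAX_ATTEMPTS)
  if message.attempts > max_attempts
    message.finish
    return :dropped
  end

  yield message.body
  message.finish
  :processed
rescue StandardError
  # Any failure (including in the block) hands the message back for a retry.
  message.requeue
  :requeued
end
```

A cap like this only bounds the loop if the attempt counter actually increases on each redelivery, so it would not by itself explain or fix a failing `FIN` on a broken connection.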
From time to time, we experience some weird behavior with our NSQ consumers. It might be related to load spikes. We haven't found a way to reproduce it reliably, but the issue appears at least once a week in our infrastructure.
So we can see here that between the execution and the `FIN` message, an error occurs concerning the NSQd connection, then the `FIN` fails.
This happens in a loop: after this, the message (with the same ID) is executed again by the process, and fails to `FIN` again, etc.
When I look at the logging history for `<host1>`, I get this:
So as you can see, it looks as if an old broken connection was kept in the pool.
On the `nsqd` side I have:
Is the timeout coming from the consumer not responding? Whatever the issue is, I'm not sure that executing a message in a loop is good behavior.
Thanks a lot for your insights
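As a possible stopgap for the repeated execution described in the report (the same message ID being run again after each failed `FIN`), a consumer could remember the IDs it has already executed and skip the work on redelivery. The sketch below is an assumption-laden illustration, not part of nsq-ruby: `ProcessedIds` and its `limit` parameter are hypothetical, and the memory is per-process only.

```ruby
require 'set'

# Bounded, in-memory record of message IDs whose work already completed.
# Hypothetical helper for illustration; it is not provided by nsq-ruby.
class ProcessedIds
  def initialize(limit: 1000)
    @limit = limit
    @ids = Set.new
    @order = [] # insertion order, used to evict the oldest ID
  end

  # True if the work for this ID has already been recorded as done.
  def include?(id)
    @ids.include?(id)
  end

  # Record an ID after its work succeeded, evicting the oldest entry
  # once the limit is exceeded.
  def record(id)
    return if @ids.include?(id)

    @ids << id
    @order << id
    @ids.delete(@order.shift) if @order.size > @limit
  end
end

seen = ProcessedIds.new(limit: 2)
seen.record("m1")
seen.include?("m1") # the work for "m1" would be skipped on redelivery
```

Recording the ID only after the work succeeds means a crash mid-execution still allows a retry. This hides the symptom rather than fixing the stale connection, so it would complement the extra logging, not replace it.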