nsqio / go-nsq

The official Go package for NSQ
MIT License

Disconnects seem to be quite ungraceful #342

Open karalabe opened 2 years ago

karalabe commented 2 years ago

I've been playing around with NSQ a lot lately and I keep hitting walls when trying to write test suites for assembling various network topologies. Most of the issues seem to stem from NSQD not properly handling consumer disconnects (I'm using go-nsq). I don't even know where to start describing the strange things:

It seems to me that the entire shutdown pathway is very wrong, and that various timeouts merely hack around the root cause. E.g. the client heartbeats (or the lack thereof after a disconnect) are what trigger the cleanup of leftover client counts; the in-flight timeout is what reschedules messages not processed by a disconnected client.

I'm unsure if I'm doing something weird here, but it seems that NSQ is very very prone to weird behavior when I have very short lived connections.

ploxiln commented 2 years ago

Honestly, we haven't historically worried much about clean client disconnects, and our tests don't pay attention to that in particular (just that messages go where they should go). We have a few existing issues about noisy logs related to as-clean-as-currently-possible disconnects ...

https://github.com/nsqio/nsq/issues/521 https://github.com/nsqio/nsq/pull/582 https://github.com/nsqio/go-nsq/issues/103

it seems that NSQ is very very prone to weird behavior when I have very short lived connections

That is plausible; NSQ was not designed for short-lived TCP-protocol connections. But if you can offer some good fixes/cleanups for these cases, that would be great :)

mreiferson commented 2 years ago

I'm inclined to move this issue over to go-nsq as I think it is likely the major contributing factor here.