slact / nchan

Fast, horizontally scalable, multiprocess pub/sub queuing server and proxy for HTTP, long-polling, Websockets and EventSource (SSE), powered by Nginx.
https://nchan.io/

nchan stops delivering messages #640

Closed: daviddeutsch closed this issue 1 year ago

daviddeutsch commented 2 years ago

I've been chasing this bug for months now and it's quite elusive, but honestly, I'm not even sure whether it's an nchan bug or whether I'm "holding it wrong".

I have a daemon running on my server that reads from and writes to a websocket, and those messages arrive at the client (JS in the browser) just fine. It seems, though, that after some high-traffic events, where the daemon writes a lot into the websocket, nchan no longer accepts messages from the daemon.

The strange thing is: the daemon can still receive messages just fine. The same goes for the browser client: every message it sends in still gets echoed, and so forth.

I can see in the daemon that the connection breaks at some point and is then re-established, but afterwards it seems to be one-way only. Of course, I establish the connection the exact same way every time, so it strangely feels like some kind of penalty mode where a badly behaving client is no longer allowed to send but can still receive.

Again, it might just be my daemon doing something strange when writing to the connection, so ANY pointers on how to debug this would be incredibly helpful.

I'm already tracking how often I write to the socket, how much data I write, and so forth, but no reliable pattern has emerged.
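One pointer worth trying: record not only how much data you ask to write, but how much the socket actually accepted on each call. The sketch below is a hypothetical Python wrapper (`WriteTracker` and `WriteRecord` are illustrative names, not part of nchan or any library) around any `send(data) -> int` callable, such as a raw socket's `send`. It assumes the daemon writes to the socket itself rather than through a client library that already retries partial writes.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class WriteRecord:
    timestamp: float  # monotonic clock, seconds
    requested: int    # bytes we asked send() to write
    accepted: int     # bytes send() reported as written


@dataclass
class WriteTracker:
    """Hypothetical wrapper: forwards to send() and logs every call."""
    send: Callable[[bytes], int]
    records: List[WriteRecord] = field(default_factory=list)

    def __call__(self, data: bytes) -> int:
        accepted = self.send(data)
        self.records.append(WriteRecord(time.monotonic(), len(data), accepted))
        return accepted

    def short_writes(self) -> List[WriteRecord]:
        # Calls where the socket took fewer bytes than requested; if the
        # daemon never sends the remainder, the stream is now misaligned.
        return [r for r in self.records if r.accepted < r.requested]
```

If `short_writes()` is non-empty right before the connection goes one-way, that would point at truncated writes rather than at nchan.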


This might be a helpful addition: for a while, I DID also have an issue where my daemon could no longer properly READ from the websocket. It turned out that a previous read had stopped mid-message, which garbled all subsequent reads: each one would read the broken tail end of the previous message as the start of the new message.
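What is described here looks like a framing problem: a WebSocket message travels as one or more frames (RFC 6455), and a reader that trusts whatever a single `recv()` returns can stop partway through a frame. A minimal Python sketch of a reader that never does that, assuming the daemon parses frames itself from a `recv(n)` callable; `read_exact` and `read_frame` are hypothetical helpers, and fragmented messages, control-frame handling and extensions are left out.

```python
import struct
from typing import Callable, Tuple


def read_exact(recv: Callable[[int], bytes], n: int) -> bytes:
    """Keep calling recv() until exactly n bytes have been collected."""
    buf = bytearray()
    while len(buf) < n:
        chunk = recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed mid-frame")
        buf.extend(chunk)
    return bytes(buf)


def read_frame(recv: Callable[[int], bytes]) -> Tuple[bool, int, bytes]:
    """Read one RFC 6455 frame; return (fin, opcode, unmasked payload)."""
    b0, b1 = read_exact(recv, 2)
    fin = bool(b0 & 0x80)
    opcode = b0 & 0x0F
    masked = bool(b1 & 0x80)
    length = b1 & 0x7F
    # 126 and 127 mean the real length follows as a 16- or 64-bit integer.
    if length == 126:
        (length,) = struct.unpack("!H", read_exact(recv, 2))
    elif length == 127:
        (length,) = struct.unpack("!Q", read_exact(recv, 8))
    mask = read_exact(recv, 4) if masked else b""
    payload = read_exact(recv, length)
    if masked:
        payload = bytes(b ^ mask[i % 4] for i, b in enumerate(payload))
    return fin, opcode, payload
```

Because every read asks for an exact byte count, a frame split across several `recv()` calls is reassembled instead of leaving its tail to be misread as the next header.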

Perhaps this is the same issue the other way around: a message my daemon sends into the websocket gets cut off (for whatever reason), and from then on every subsequent message arriving at nchan is corrupted in the same way?
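If that theory holds, one common cause is a partial write: `send()` on a busy or non-blocking socket may accept fewer bytes than requested, and if the rest is never sent, the receiver parses the next frame's header from the middle of the previous payload. A hedged Python sketch, assuming the daemon builds frames itself: `encode_frame` and `write_all` are hypothetical helpers. RFC 6455 requires client-to-server frames to be masked, so the sketch masks every frame it builds.

```python
import os
import struct
from typing import Callable, Optional


def encode_frame(payload: bytes, opcode: int = 0x1,
                 mask_key: Optional[bytes] = None) -> bytes:
    """Build one final (FIN=1), masked client-to-server frame."""
    if mask_key is None:
        mask_key = os.urandom(4)  # fresh random key per frame
    header = bytearray([0x80 | opcode])
    n = len(payload)
    if n < 126:
        header.append(0x80 | n)  # mask bit + 7-bit length
    elif n < (1 << 16):
        header.append(0x80 | 126)
        header += struct.pack("!H", n)
    else:
        header.append(0x80 | 127)
        header += struct.pack("!Q", n)
    masked = bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))
    return bytes(header) + mask_key + masked


def write_all(send: Callable[[memoryview], int], data: bytes) -> None:
    """Call send() repeatedly until every byte has been accepted."""
    view = memoryview(data)
    while len(view) > 0:
        sent = send(view)
        if sent <= 0:
            raise ConnectionError("send made no progress")
        view = view[sent:]  # retry with only the unsent remainder
```

For a plain blocking Python socket, `socket.sendall` already performs the loop in `write_all`; the explicit version matters mainly when the daemon uses raw `send` or a non-blocking socket.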

daviddeutsch commented 1 year ago

@slact Three weeks and not a single incident - I'm going to close this. Thanks for the fix!