Closed ArmorDarks closed 11 months ago
If your code is exactly as you have it there, then I know why it is happening - you are returning from the status iterator, which means that it won't report anything else - simply continue
the loop - you cannot return or break from it. Please feel free to reopen if that is not the case.
Nice catch, Alberto! That also explains why it stopped working after a while - the ping was sent after some time and that ended iterator. Thank you!
you were haunted by a callback model :)
Observed behavior
Statuses iterator is not updating even if the server is going to LDM or just being shut down.
Expected behavior
The connection status should change
Server and client version
Tested on servers v2.8.3, v2.8.4, v2.9.6 Tested on client v2.4.0, v2.18.0
Host environment
No response
Steps to reproduce
So far, I haven't figured out a reliable reproduction.
What I noticed is that the issue is not happening if the client was started recently.
For instance, if a service client was just started and I kill the NATS server locally, it does trigger status updates, but if the service with a client were up for a few days, it would stop logging any status updates. Killing the NATS server will cause zero logs in that case. I'm not sure, though, if it's indeed related to the NATS client itself or maybe it's something due to a server that has been up locally in my case for days.
That issue happened for me locally both on server v2.8.3 and latest v2.9.6 (alpine image).
It's a similar situation in our production. If the service is running for quite some time, it will stop logging any status updates even if NATS server pads were restarted. We can observe that Rust-based services are properly detecting connection status changes, but Node-based services do not react in any way.
Example of NATS client initialization:
There are multiple logs here. None of them were logged or added to metrics between the day it was deployed on prod and until now, even when NATS servers were restarted.
We know, though, that during service updates,
NATS connection closed
was logged, so that meansconnection.closed()
is still working.The issue started to happen somewhere from the moment we upgraded the NATS client to v2.4.0 many months ago, but thi also co-incedences with an update where we introduced settings like
waitOnFirstConnect: true
andignoreAuthErrorAbort: true
, and increased reconnection jitter.Could it be that something is locked in the NATS client queue that handles statuses iterator?