Closed r-delgadillo closed 10 months ago
Thanks for filling the issue!
Ordered Pull Consumer should send new fetch requests to the server when it sees reconnect event, and Consumer is not deleted because server restarted. It probably is deleted because until server is up and running again, the InactiveThreshold
is reached.
Increasing InactiveThreshold
to ~30 seconds would be a good test to check if that's the culprit.
I'm not sure about recreating ordered consumer if it's not there - forcing recreation would make InactiveThreshold
config obsolete and would also not allow operators of the cluster to permanently remove specific consuer, as Client would immediately recreate it.
We will take a look at the code.
@Jarema I attempted to increase the InactiveThreshold
to 1 Hour and still no luck. It seems as Consumers are being deleted immediately and not honoring the InactiveThreshold
on a hard restart. Below is a screenshot of my Consumers from the NATs box:
The reset
method in ./jetstream/ordered.go already does recreation of Consumers for certain errors. So it seems like the config is already obsolete to some extent?
Thank you for looking into this.
Forgot to mention, OrderedConsumers
allow the caller to set the MaxResetAttempts
-which recreates the Consumer. It seems like recreating the Consumer is by design. To your point, it feels like InactiveThreshold
kind of contridicts this setting.
@piotrpio rightfully pointed out, that the OrderedConsuemer
cannot work after server
restart, as it uses memory storage
on the server. Because of that, server restart wipes Consumer's state.
We will address that by recreating Consumer
on Missing Heartbeat
event (as not every reconnection means server restart).
Can you elaborate why you think InactiveThreshold
contradicts MaxResetAttempts
?
Nice, good find.
Regarding InactiveThreshold
& MaxResetAttempts
, I was carrying on thinking from a comment you made, "forcing recreation would make InactiveThreshold
config obsolete." Thinking about it a bit more, I think recreating the Consumer in certain scenarios - such as missing Heartbeat makes sense. Recreating the Consumer if InactiveThreshold
is reached does not make sense - which I believe is what you were saying earlier.
I'm curious how release trains look for nats.go. Do they occur on a certain schedule or are they more on-demand? I'm wondering what the timeframe for the fix looks like.
@Jarema friendly ping to see if you have any insights on the fix release train. Thanks
@r-delgadillo sorry for the delayed response. Releases are done on-demand, but we usually try to do them monthly at least, so we are a bit overdue! I aim to publish the next release next week, I'll make sure to include resetting consumers on missing heartbeats and not resetting them after inactive threshold is reached.
I was also having this problem with both the new and legacy (push and pull) jetsream API . In the OrderedConsumer
case I sometimes didn't get any error arriving at my ConsumeErrHandlerFunc
which corresponded with consumer not resetting and therefore silently failing to receive messages. Most of the time I did get an error and the Consumer would reset and pick up receiving messages.
@piotrpio is the fix for this in the 2.9..22 release?
@bdejong this is a fix in the client, not server, so not yet available. Sorry for the delay, release will hopefully be out soon.
@piotrpio any updates on this bug, thanks
@r-delgadillo Again, sorry for the radio silence. I created a PR which should solve this issue: https://github.com/nats-io/nats.go/pull/1425. Sorry it did not make it to the previous release.
@piotrpio all good, thank you for the update. Looking forward to have this during the next client release.
@piotrpio have any updates been made on the PR? Thanks
Sorry for not updating this issue... Fix has been released in v1.31.0
Defect
JetStream OrderedConsumer fails to reconnect upon hard reset on NATs server. Looking at the GoLang package here, it feels like we should add a conditional check if the err received from the server is "Consumer not found" err so we can perform a "reset" on the client which then recreates the Consumer for processing.
Have a Slack channel going here.
Make sure that these boxes are checked before submitting your issue -- thank you!
Versions of
nats.go
and thenats-server
if one was involved:nats.go: 1.28.0 nats-server: 2.9.21
OS/Container environment:
Kubernetes Minikube Cluster. GoLang Linux
Steps or code to reproduce the issue:
Expected result:
Expect OrderedConsumer to recreate the Consumer and continue processing messages where it had left off before the NATs server restarted.
Actual result:
Consumer fails to "reset" and fails to continue processing messages.