nats-io / stan.go

NATS Streaming System
https://nats.io
Apache License 2.0

A durable subscription that times out in the client may subsequently fail with "stan: duplicate durable registration" #331

Closed ghost closed 3 years ago

ghost commented 3 years ago

Using v0.11.0, if I call Subscribe with a durable name and the call times out, subsequent calls may all return the error "stan: duplicate durable registration". Presumably this means the subscription exists on the server but not in the client. I can work around this by raising the value of "ConnectWait", but it seems like the cleanup that happens in the client upon timeout should remove the subscription from the server, so that further calls succeed.

kozlovic commented 3 years ago

Unfortunately the library can't do anything. If you get a timeout, it means that the library did not receive a response from the server. So it cannot close the subscription since it did not receive an "ack inbox" from the server that is used to identify that subscription. We would need to change both server and clients so that the client provides the ack inbox (instead of the server assigning it) so that the library could try to close the subscription on the request timeout.

ghost commented 3 years ago

Makes sense. I see that in the code for a subscription error, an inbox unsubscribe is called. My first concern is that for a durable subscription, we want a close, not an unsubscribe. In any case, is it possible for that request to tell the server to remove the subscription, even if we're not certain it was created in the first place? It looks like that was the intent, but in this case it's not happening.

kozlovic commented 3 years ago

No, the unsubscribe is for the low level subscription on an inbox to which the library will receive data for the streaming subscription.

The library, as it stands, has no way to tell the server to close the durable subscription, because the unsubscribe/close request is based on an ackInbox that is created by the server when processing the subscription request. So again, on request timeout, the library did not get that ackInbox and has no way to tell the server to close the durable subscription identified by that ackInbox.

I am not saying that this can't be solved in general; I am saying it can't be solved with the current protocol between the client and server. It would require a protocol change along with code changes in the client(s) and the server.
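The constraint described above can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not the real protocol or server code: `toyServer`, `clientSubscribe`, `closeSub` and the inbox naming are all hypothetical, and the durable is keyed by name only for brevity. The point it shows is that the close request needs an ack inbox that only arrives in the subscribe response.

```go
package main

import (
	"errors"
	"fmt"
)

// toyServer is a hypothetical in-memory stand-in for the streaming server.
type toyServer struct {
	durables map[string]string // durable name -> server-assigned ack inbox
	next     int
}

func newToyServer() *toyServer {
	return &toyServer{durables: map[string]string{}}
}

// subscribe registers the durable and returns the ack inbox the server created.
func (s *toyServer) subscribe(durable string) (string, error) {
	if _, ok := s.durables[durable]; ok {
		return "", errors.New("stan: duplicate durable registration")
	}
	s.next++
	inbox := fmt.Sprintf("_ACK.%d", s.next)
	s.durables[durable] = inbox
	return inbox, nil
}

// closeSub removes the subscription identified by ackInbox, if there is one.
func (s *toyServer) closeSub(ackInbox string) bool {
	for name, inbox := range s.durables {
		if inbox == ackInbox {
			delete(s.durables, name)
			return true
		}
	}
	return false
}

// clientSubscribe models the client side. When replyLost is true the server
// has processed the request but the response never arrives, so the client
// sees a timeout and never learns the ack inbox.
func clientSubscribe(s *toyServer, durable string, replyLost bool) (string, error) {
	inbox, err := s.subscribe(durable)
	if err != nil {
		return "", err
	}
	if replyLost {
		return "", errors.New("subscribe request timeout")
	}
	return inbox, nil
}

func main() {
	s := newToyServer()

	inbox, err := clientSubscribe(s, "dur", true)
	fmt.Printf("first attempt: inbox=%q err=%v\n", inbox, err)

	// Without an ack inbox, a close request cannot name the subscription.
	fmt.Println("close succeeded:", s.closeSub(inbox))

	_, err = clientSubscribe(s, "dur", false)
	fmt.Println("second attempt:", err)
}
```

In this model, letting the client choose the ack inbox before sending the request would give it a handle to close the subscription after a timeout, which corresponds to the protocol change mentioned earlier in the thread.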

ghost commented 3 years ago

Thanks for the quick feedback and useful insights.

kozlovic commented 3 years ago

Not sure if this is related to https://github.com/nats-io/nats-streaming-server/issues/1135, but if it is, I am proposing a new option in the server that would "close" (in the server only) the current durable subscription and accept the new one. See https://github.com/nats-io/nats-streaming-server/pull/1136

ghost commented 3 years ago

Looks like the same issue, and a viable solution.

kozlovic commented 3 years ago

@adklager If you can build from the PR's branch, test for yourself, and comment in the PR, that would be great. Otherwise, no worries.

kozlovic commented 3 years ago

Closing since the PR on the server repo has been merged.