nats-io / stan.go

NATS Streaming System
https://nats.io
Apache License 2.0

Didn't find a way to reconnect. Please help me! #333

Closed artem-webdev closed 3 years ago

artem-webdev commented 3 years ago

There is code like this in a real project, and the clientId cannot be changed. How can I reconnect if the response is: stan: clientID already registered?

package main

import (
    "github.com/nats-io/stan.go"
    "log"
)

const (
    StanServers   = "host-1:4222,host-2:4222,host-3:4222"
    StanClusterId = "queue-messages"
)

func GetConnectStan(clientId string) (stan.Conn, error) {

    conn, err := stan.Connect(
        StanClusterId,
        clientId,
        stan.NatsURL(StanServers),
        stan.Pings(5, 10),
        stan.SetConnectionLostHandler(func(_ stan.Conn, reason error) {
            log.Fatal(reason.Error())

        }),
    )

    return conn, err

}

type ServiceProxy struct {
    NatsConn stan.Conn
}

func main() {

    proxy := ServiceProxy{}
    var err error
    proxy.NatsConn, err = GetConnectStan("test-client-1")
    if err != nil {
        log.Fatal(err.Error(), proxy.NatsConn)
    }

    // For some unknown reason, the pointer became nil.
    proxy.NatsConn = nil

    // The second connect fails with: stan: clientID already registered
    if proxy.NatsConn == nil {
        proxy.NatsConn, err = GetConnectStan("test-client-1")
        if err != nil {
            log.Fatal(err.Error(), proxy.NatsConn)
        }
    }

}

You closed this issue without waiting for my comments: https://github.com/nats-io/stan.go/issues/332

I read the documentation!

I thought the code would make the problem clear.

In this code I presented a simplified client proxy in which, for some reason, the pointer to the connection became nil.

In that case there is no way to reconnect at all! I showed you a real-world situation where there is no explicit call to Close.

For example, can I ask the server to kill the connection with this identifier?

kozlovic commented 3 years ago

> You closed this issue without waiting for my comments: #332

I did, because there was no doubt about what the issue is, and there is nothing that we can do in the library or the streaming server as it stands.

> I thought the code would make the problem clear.

I perfectly understood what the problem is: it is an issue with your proxy that loses a reference to a valid connection. There is nothing the streaming library can do about that.

> In this code I presented a simplified client proxy in which, for some reason, the pointer to the connection became nil.

Understood, but again, the connection is still valid and therefore the server will correctly reject a duplicate client ID.

> I showed you a real-world situation where there is no explicit call to Close.

The problem is not that you don't close the connection, the problem is that the connection is still valid. Like I explained in the other issue you opened, things would work ok if the process had exited, even without calling close. When the server receives a request from a client to connect and detects that it is a duplicate client ID, it sends a request to the known "old" client's inbox, and since in this case, it receives a response, it considers that the "old" connection is very much alive (which is the case) and therefore rejects the new one.
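The duplicate check described above can be modeled in a few lines. This is only a toy in-memory model, not the streaming server's code: `registry`, `register`, and the `alive` probe are hypothetical names, and the probe stands in for the request the server sends to the old client's inbox.

```go
package main

import (
	"errors"
	"fmt"
)

// errDuplicate mirrors the text of the error reported in this issue.
var errDuplicate = errors.New("stan: clientID already registered")

// registry is a toy stand-in for the server's table of known clients.
// Each client ID maps to a probe that reports whether the existing
// client still answers (hypothetical; the real server sends a request
// to the old client's inbox and waits for a response).
type registry struct {
	clients map[string]func() bool
}

func newRegistry() *registry {
	return &registry{clients: make(map[string]func() bool)}
}

// register accepts a client unless another client with the same ID
// is already known and still responds to the probe.
func (r *registry) register(id string, alive func() bool) error {
	if old, ok := r.clients[id]; ok && old() {
		return errDuplicate
	}
	r.clients[id] = alive
	return nil
}

func main() {
	r := newRegistry()
	responding := func() bool { return true }

	fmt.Println(r.register("test-client-1", responding)) // accepted: <nil>
	fmt.Println(r.register("test-client-1", responding)) // rejected: old client still answers
}
```

In this model the second registration only succeeds once the old client stops answering, which matches the case where the old process has exited.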

> For example, can I ask the server to kill the connection with this identifier?

No. The behavior is as I described above, meaning that the new one is rejected if the old one is responding. There is no option/configuration as of now to ask the server to do what you want.

I can leave the issue open, but the point was that there is no immediate action that we could take to solve the issue you are having with your proxy losing the handle to the streaming connection.
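One way to avoid losing the handle is to keep the connection behind a small guarded holder that always closes the old connection before opening a new one. The following is a minimal sketch under stated assumptions: `connHolder`, `reconnect`, and `fakeConn` are hypothetical names, and the `closer` interface only relies on the `Close() error` method that `stan.Conn` provides. In a real proxy the `connect` function would wrap `stan.Connect`.

```go
package main

import (
	"fmt"
	"sync"
)

// closer is the only part of a connection this sketch needs.
// stan.Conn has a Close() error method, so it satisfies it.
type closer interface {
	Close() error
}

// connHolder owns one connection and never drops it without closing it.
type connHolder struct {
	mu   sync.Mutex
	conn closer
}

// reconnect closes the current connection (if any) and only then
// calls connect, so the client ID is released before it is reused.
func (h *connHolder) reconnect(connect func() (closer, error)) error {
	h.mu.Lock()
	defer h.mu.Unlock()
	if h.conn != nil {
		if err := h.conn.Close(); err != nil {
			return err // keep the old handle rather than lose it
		}
		h.conn = nil
	}
	c, err := connect()
	if err != nil {
		return err
	}
	h.conn = c
	return nil
}

// fakeConn is a test double that records whether Close was called.
type fakeConn struct{ closed bool }

func (f *fakeConn) Close() error { f.closed = true; return nil }

func main() {
	h := &connHolder{}
	var opened []*fakeConn
	// Hypothetical connect function; a real one would call stan.Connect.
	connect := func() (closer, error) {
		c := &fakeConn{}
		opened = append(opened, c)
		return c, nil
	}
	_ = h.reconnect(connect)
	_ = h.reconnect(connect)
	fmt.Println(opened[0].closed, opened[1].closed) // true false
}
```

Because the field is only ever changed inside `reconnect`, the handle cannot silently become nil while the underlying connection is still alive.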

artem-webdev commented 3 years ago

Thanks for the answer! The thing is, the code just demonstrates the problem. In fact there are more than 1000 proxy clients in the real project, and I cannot stop all the applications because of multiple nil pointers.

I understand many of the architectural decisions in the NATS Streaming server perfectly well. But you must agree the network is full of surprises, and I don't understand why you didn't implement manual reconnection by the client.

kozlovic commented 3 years ago

I still believe that you should try to fix the project so that you are not losing references to the streaming connection objects. That being said, depending on what you do with the connection, if say you simply publish, then you could be using a different client ID (randomly generated). This would not then be an issue.

But I stress that the connection that you have lost the handle to is still valid, which means that it will keep sending pings to the server and the server will still send heartbeats to it. If it has subscriptions, the server will still deliver (or redeliver) messages to it.
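For the publish-only case mentioned above, a randomly generated client ID could be built like this. It is a minimal sketch using only the standard library; `uniqueClientID` is a hypothetical helper, and the result uses only letters, digits, and `-`, which to my knowledge NATS Streaming accepts in client IDs.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// uniqueClientID appends 8 random bytes (16 hex characters) to prefix,
// so each new connection gets its own client ID.
func uniqueClientID(prefix string) (string, error) {
	buf := make([]byte, 8)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return prefix + "-" + hex.EncodeToString(buf), nil
}

func main() {
	id, err := uniqueClientID("publisher")
	if err != nil {
		panic(err)
	}
	// The ID would be passed to stan.Connect in place of a fixed client ID.
	fmt.Println(id)
}
```

This does not help connections that own durable subscriptions, since a durable subscription is tied to the client ID.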

> But you must agree the network is full of surprises, and I don't understand why you didn't implement manual reconnection by the client.

Not sure what you mean here? The fact that there is no option to close the old and accept the new one?

artem-webdev commented 3 years ago

> Not sure what you mean here? The fact that there is no option to close the old and accept the new one?

Yes, but without losing stream state.

kozlovic commented 3 years ago

> Yes, but without losing stream state.

Again, I have to disagree with you. You are creating a new connection (with the same client ID) while the old connection is still valid. You did not say whether the connections create subscriptions or not, but again, the connection is valid and messages may be flowing. The current behavior of the server, rejecting the new client, is safer. Even if you had an option to "close" the old connection and accept the "new" client, it would not be possible to keep the stream state. (Closing the old one is not possible anyway, because there is no TCP connection from the client to the "server"; see https://docs.nats.io/nats-streaming-concepts/relation-to-nats for more details.)

Note that the library does not auto-reconnect, essentially because of subscriptions: this link will give you more details on why the library can't do that automatically (agreed, this part is a shortcoming of the original streaming server/client design).
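Since the library does not reconnect on its own, an application that wants to recover after the connection is really lost has to create a new connection (and its subscriptions) itself. A retry loop for that could look like the sketch below. It is an assumption-laden illustration, not library code: `connectWithRetry` is a hypothetical helper, and `connect` stands for application code that would wrap `stan.Connect` and re-create subscriptions. It expects `attempts` to be at least 1.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errUnavailable is a placeholder error used by the demo below.
var errUnavailable = errors.New("server unavailable")

// connectWithRetry calls connect until it succeeds or the attempts
// run out, doubling the wait between consecutive tries.
func connectWithRetry(connect func() error, attempts int, wait time.Duration) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = connect(); err == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(wait)
			wait *= 2
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	// Hypothetical connect function: fails twice, then succeeds.
	connect := func() error {
		calls++
		if calls < 3 {
			return errUnavailable
		}
		return nil
	}
	err := connectWithRetry(connect, 5, 10*time.Millisecond)
	fmt.Println(calls, err) // 3 <nil>
}
```

Such a loop only makes sense once the old connection is gone, for example when it is started from the connection lost handler; while the old connection is still alive, every attempt with the same client ID would be rejected as described earlier in this thread.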

artem-webdev commented 3 years ago

I am using this strategy: https://github.com/nats-io/stan.go#durable-subscriptions. As my tests showed, it is the most convenient one, and thank you for implementing it!

kozlovic commented 3 years ago

Yes, but again, the issue here is not whether the library could auto-reconnect. As I have emphasized many times, it is your proxy that decides to create a streaming connection with the same client ID (because it lost the reference to the original one). Suppose that the original connection had a durable subscription, that 100 messages were delivered to the library, and that messages are dispatched one at a time. There are now 99 messages in the library waiting to be dispatched.

Your code then decides to create a streaming connection with the same client ID. Assuming the server had an option to accept the new one (and ignore the old one), and your application creates the same durable subscription, the server would deliver 100 messages to your durable subscription, which means you would now have an app with two callbacks dispatching the same messages.

When the library sent a PING from the old client to the server, the server would respond with an error saying that the connection has been replaced, but until then there would be a window in which the same messages could be processed concurrently.

artem-webdev commented 3 years ago

Can a small connection pool smooth the corners?

kozlovic commented 3 years ago

> Can a small connection pool smooth the corners?

Sorry, not sure what you mean. Could you elaborate?

artem-webdev commented 3 years ago

I'm talking about a spare connection at the TCP/IP level in the NATS Streaming server.

kozlovic commented 3 years ago

> I'm talking about a spare connection at the TCP/IP level in the NATS Streaming server.

What do you mean? Could you please elaborate more? I already mentioned that the NATS Streaming "server" is not a server per se. Clients are not creating TCP connections to the "server". Check the link I have previously sent.

artem-webdev commented 3 years ago

Sorry, I haven't read it yet.

artem-webdev commented 3 years ago

By the idea of manual ACKs, shouldn't message delivery to the subscriber survive a reconnect?

artem-webdev commented 3 years ago

Sorry, I use Google Translate; as I understand it, it translates terribly :(

artem-webdev commented 3 years ago

Anyway, I realized that this behavior cannot be fixed. Thanks for the quick reply!

kozlovic commented 3 years ago

You are welcome and I am sorry that the lib/server cannot address the issue you are facing.

Let me know if it is ok to close the issue now, or if you'd rather keep it open a bit longer. Thanks!

artem-webdev commented 3 years ago

Thanks again! Yes, close it. A little later I will have the opportunity to study the source code; maybe I can offer an alternative solution to this problem.