shamblett / mqtt_client

A server and browser based MQTT client for dart
Other
552 stars 179 forks source link

Issue when connection is lost - status stays as connected #299

Closed wagner-rebello closed 3 years ago

wagner-rebello commented 3 years ago

I'm using a list of connections to connect to many brokers in my local network (Each Broker is a device). When i unplug power from one device (to simulate a connection loss) none of my callbacks run and the connection status stays connected for that broker/device. Using wireshark i can see that ping never got a response. Screenshot from 2021-07-15 17-23-03 When i turn on the device again ALL the callbacks run at the same time that the connection is reestablished. (I have a timer that checks connection status, the status turn to disconnected at that moment)

As the numbers in the image below: log

  1. Was a ping that have response. Between 1. and 2. the selected ping in wireshark picture happens.
  2. is 5 minutes later when i powered on the device again. And ALL the callbacks run at the same time.

Codes for creating the connections:

        clients[clients.length - 1].mqtt.autoReconnect = false;
        clients[clients.length - 1].mqtt.keepAlivePeriod = KEEPALIVE; //30 segundos de Ping.
        clients[clients.length - 1].mqtt.onDisconnected = () => _onDisconnected(qlEh); //Ao desconectar chama esse metodo.
        clients[clients.length - 1].mqtt.onConnected = () => _onConnected(qlEh); //Ao conectar chama esse metodo.
        clients[clients.length - 1].mqtt.onAutoReconnect = () => _onAutoReconnect(qlEh);
        clients[clients.length - 1].mqtt.onSubscribed = (String cb) => _onSubscribed(cb, qlEh); //Ao inscrever-se com sucesso chama esse metodo.
        clients[clients.length - 1].mqtt.onSubscribeFail = (String cb) => _onSubFail(cb, qlEh);
        clients[clients.length - 1].mqtt.onUnsubscribed = (String cb) => _onUnSubscribed(cb, qlEh);
        clients[clients.length - 1].mqtt.pongCallback = () => _pong(qlEh); //Ao receber ping corretamente chama esse metodo.
        clients[clients.length - 1].mqtt.connectionMessage = connMess;
void _onDisconnected(int qlClient) {
    if (clients[qlClient].mqtt.connectionStatus.disconnectionOrigin == MqttDisconnectionOrigin.solicited) {
      print('($qlClient - ${clients[qlClient].id}) Solicitada desconexão do Broker');
    } else {
      _limpaTopicos(qlClient);
      print('($qlClient - ${clients[qlClient].id}) Conexão caiu');
      //TODO: Rodar broadcast novamente???
    }
  }

void _pong(int qlClient) {
    print('($qlClient - ${clients[qlClient].id}) MQTT Ping/Pong [${DateTime.now()}]');
}

void _onConnected(int qlClient) {
    print('($qlClient - ${clients[qlClient].id}) Conectado ao Broker');
    //TODO: Roda subs?
    for (var e in subbedTopics) {
      // if (clients[qlClient].mqtt.getSubscriptionsStatus(e) == MqttSubscriptionStatus.doesNotExist) {
      //FIXME: Vai rodar varios subscribes mesmo sem necessidade pois não vai estar inscrito nos topicos que não deveria.
      subscribe(e, 1);
      // }
      // subscribe(e, 1);
    }
  }

Turning autoReconnect to true doesn't help.

shamblett commented 3 years ago

OK, there are 2 issues here, the client has never taken any action if the broker doesn't respond to its ping request if the connection status is connected, it doesn't disconnect for example. However, just because no one has asked for this thus far it doesn't mean we can't do this, I can add a setting to to keep alive processing that will disconnect if pings are not responded to, that way users can select the behaviour they want.

The second issue, as in why your callbacks are called when you turn your devices on etc. is nothing to do with the client. The client uses the Dart runtime to interface to the network stack, if the runtime says a socket is connected then the client assumes it is, when the runtime says a socket is disconnected then again it takes appropriate action dependant on user settings. If you want to know why/how the Dart runtime decides these things then I suggest you bring this up with the Dart Google guys.

wagner-rebello commented 3 years ago

Using wireshark and filtering for TCP i can see that TCP retransmission keep happening until the device is turned on when the connection is reseted. A setting to disconnect if keep alive don't respond would be cool to solve that problem.

shamblett commented 3 years ago

Yes, I'll update the client to allow the user to select whether a broker failing to respond to a ping will cause a disconnect or not and chose the time period they want to wait.

shamblett commented 3 years ago

Client updated to add the ping request response fail functionality, package re published at version 9.4.0.

See the API docs and the example mqtt_server_client_keep_alive_disconnect.dart for usage.

shamblett commented 3 years ago

Now at 9.4.2

wagner-rebello commented 3 years ago

When the connection is lost with the new keepalive disconnection option the MQTT status correctly enter into disconnected but the socket stay opened waiting for a RST packet from server. Sometimes this leaves to a state that a new connection replaces the variable "client" in file \connectionhandling\mqtt_client_mqtt_connection_base.dart leaving the old socket in a unknown state and closing the new created connection with RST packet from the old sock. This sometimes put the connection in a loop where it creates a new connection overwrite the old client reference with a new reference, receive a RST or FIN from the old connection closes the new and go on.

I think that it's better to close the TCP socket when the keep alive fails with the option "enabled".

wagner-rebello commented 3 years ago

After some diggin' I found out that this chunk of code connectionhandling\server\mqtt_client_synchronous_mqtt_server_connection_handler.dart image

Is leaving lost references to the old TCP socks, when the new connection is created the "MqttServerConnection" inheritances creates new references, if old socks listeners capture data (Like RST or FIN) the data is then redirected to the new object references.

The old socks stays de-referenced and got FIN (or RST because of TCP Retransmission) by the server by the lack of keep alive (or TCP Retransmission), the old listener put the state of connection in disconnected, my code call the connection method again that de-reference the actual connection and go on...

shamblett commented 3 years ago

I hear what you are saying, the old socket events are somehow redirected to the closures(onDone, onError etc.) of the new socket. But what can the client actually do here? Dart is a GC based language, only when GC runs are unused objects destroyed, even doing client == null probably won't force a GC at that point. All the client can do is call methods on the old socket to destroy it somehow. What do you mean by ' with the option "enabled".

wagner-rebello commented 3 years ago

"What do you mean by ' with the option "enabled". Set the property ("Option") disconnectOnNoResponsePeriod >= 1

shamblett commented 3 years ago

Ok, where is this in the Dart API? The SocketOption class only has tcpNoDelay, apologies if I'm missing something here.

wagner-rebello commented 3 years ago

Examples of the problem using wireshark. PingFail As you can see in the line Nº 287 there was a "ping req" (port 48420) from the smartphone without response (I intentionally removed my device (broker) from power). onDisconnect callback runs without problem informing my code that MQTT connection dropped. After 3 seconds my code try another connection (Nº 323)(device still offline and flutter sent retransmissions for the SYN packet).

When my device turned on again it ACKed the SYN packet and the TCP connection opened (48423 port).

After some time the old socket tryed another retransmission from the lost ping request (Nº 1461 from below image) (port 48420, first connection) RstFirst

As the method _disconnect() from file \connectionhandling\mqtt_client_mqtt_connection_base.dart never got called when there is a ping response fail, the old sock still exists (port 48420), including the old listeners. The device RSTed the connection (Nº 1464) because it was powered off 'cause the connection don't exist, that RST call the onDisconnect callback and set the connection as disconnected also Ping Req from the new connection do not happen anymore (Set to 15s) when my device stays more than 15s without ping req it ends the connection as expected and put the smartphone in this eternal loop.

wagner-rebello commented 3 years ago

"What do you mean by ' with the option "enabled". Set the property ("Option") disconnectOnNoResponsePeriod >= 1

This is the new option that you created to "disconnect" when there is no ping respones after some time.

shamblett commented 3 years ago

Ah OK, its not a socket option, so back to my original point, how do you suggest the client 'disposes' of old references to its sockets? I can't think of a way to do this.

shamblett commented 3 years ago

BTW issue #302 shows the disconnectOnNoResponsePeriod opytion working for the use case the OP had, so it does work in some, hopefully most cases.

wagner-rebello commented 3 years ago

Ah OK, its not a socket option, so back to my original point, how do you suggest the client 'disposes' of old references to its sockets? I can't think of a way to do this.

Maybe some chunk of code that can be called on internalConnect method that verify existance of connection variable before re-instantiation and some public method that destroy the socks.

I'm creating a clean project to test how the TCP listener behave after destroying sockets.

shamblett commented 3 years ago

Yes, destroy may be an option here, depending on how it works, we dont want destroy to call one of the closures like onDone etc. Your work here is interesting, please post back your results, thanks.

wagner-rebello commented 3 years ago

After the tests i can confirm. Calling destroy on the old sock (client variable in your lib) stop all callbacks from listen. onData, onError listeners from the old sock are destroyed in conjunction. But the onDone is called when destroying the old client.

wagner-rebello commented 3 years ago

You can cancel the listener before destroying the socket, this way none of the callbacks run. image

shamblett commented 3 years ago

Yes I remember now, I actually use this mechanism in the browser ws comnnection -

...
 closeEvents = client.onClose.listen((e) {
        MqttLogger.log(
            'MqttBrowserWsConnection::connect - websocket is closed');
        closeEvents.cancel();
        errorEvents.cancel();
        return completer.complete(MqttClientConnectionStatus());
      });
      errorEvents = client.onError.listen((e) {
        MqttLogger.log(
            'MqttBrowserWsConnection::connect - websocket has erred');
        closeEvents.cancel();
        errorEvents.cancel();
        return completer.complete(MqttClientConnectionStatus());
      });
...

Why it never occurred to me to apply this to the rest of the connection handlers I don't know.

I'll update the package to add this functionality.

shamblett commented 3 years ago

OK, I've updated the client with the fix above, I've not re published this yet as we may find more problems here.

If you would point your pubspec.yaml to the git repo and set the branch to 'issue299-1' and carry on with your testing that would be great.

shamblett commented 3 years ago

I've found a small bug when using disconnect on no ping response with auto reconnect and fixed it, if you are currently testing with this could you please do a pub upgrade, thanks.

wagner-rebello commented 3 years ago

Sure, i'll test.

wagner-rebello commented 3 years ago

It keeps entering in loop when the connection is lost by keep alive.

I attached some wireshark packet capture.

PacketCapture.zip

I created a fork in my personal account and did some changes on 9.4.2, in my fork after receiving the RST packets from the OLD connection the new stays opened.

https://github.com/ZantsuRocks/mqtt_client/commit/8db8e3bc0866d2d0b3de4ee2f20cf5fb4e7cc589

I didn't really understand the structure of your code, so I didn't create a pull request. My code is not 100% implemented either, but it serves as a basis perhaps for a fix.

shamblett commented 3 years ago

Ok, could you just confirm when you do a pub upgrade that the branch it prints is 'issue-299-1' with a commit of 'a42065814a8adf08e8b49b9443c1b06649b1df63'.

wagner-rebello commented 3 years ago

Ok, could you just confirm when you do a pub upgrade that the branch it prints is 'issue-299-1' with a commit of 'a42065814a8adf08e8b49b9443c1b06649b1df63'.

Yes image

wagner-rebello commented 3 years ago

I'm trying to replicate the issue again using your branch to tell you the exact steps that create the problem. But now the problem isn't happening again.

wagner-rebello commented 3 years ago

Gotcha. image

The problem is with the timing between the ping req and the ping resp

Ping response delayed something about 120ms to arrive and the connection died for your handlers. This created a duplicate connection and the old still open.

shamblett commented 3 years ago

OK, I'm not really following you, what do you mean by ' the connection died for your handlers'? What is causing this, do you have a log?

When you say 'This created a duplicate connection and the old still open.', we now cancel the listener and destroy the socket so how does this happen?

I'll put some more logging around the destroy method to see exactly when it is called.

wagner-rebello commented 3 years ago

Put some log inside _disconnect() in the MqttConnectionBase class. You'll see that when keep alive fails (There is no Ping Response to a Ping Request) that method never runs. At least for me this is happening.

I'm put some log in MqttConnectionBase class in the method _disconnect() (line 80) https://gist.github.com/wagner-rebello/e906a6af8b9aded8ac3346907173ff9e

This is what happens to my log (the red arrow is the onDisconnect callback) image

As you can see the log on MqttConnectionBase never runs leaving the sock listener and connection "open" but turning your status (MqttClientConnectionStatus) disconnected.

As the connection may still be open TCP Retransmission keep ocurring the old listeners still exists too, wich kills the new connection.

wagner-rebello commented 3 years ago

Maybe this log helps more image

That Green Arrow put the MqttClientConnectionStatus again in MqttConnectionState.disconnected even that the new socket still exists, and this is where the reconnection loop go on forever.

wagner-rebello commented 3 years ago

Pull Request #308 fixes the connection loop for me. The old socket stay alive when there is a ping fail but the listeners references at least will be destroyed.

shamblett commented 3 years ago

Thanks for this, merged the change, please pub upgrade and re test, thanks.

shamblett commented 3 years ago

I've added a small update, hopefully it should now close the socket before we re connect, please pub upgrade and see what you get now.

wagner-rebello commented 3 years ago

Sure, i'll test.

wagner-rebello commented 3 years ago

Changes were tested by 3 people together with me and we didn't find any more problems.

shamblett commented 3 years ago

Thanks, I've merged the branch, finished my own testing and re published the package at version 9.5.0.

Thanks for the testing you have done here, I know how time consuming this can be, this helps you of course in your use case but the fixes identified in this issue help all the users which is great.

I'll close this issue here, its got rather lengthy, if you find anything else please raise a separate issue, we can always refer to this one if needed.