shamblett / mqtt_client

A server and browser based MQTT client for dart

Failure to reconnect to broker with autoreconnect enabled #305

Closed noah-depriest closed 3 years ago

noah-depriest commented 3 years ago

Hello

First off, thank you very much for supporting this library. Overall things have been working well and we've been able to accomplish most of our messaging needs thanks to this library.

I'm not 100% sure if this is an issue with your library, but one thing that we are struggling with is being able to automatically reconnect to our broker consistently. We noticed there is support for auto-reconnect in the library and think we have everything set up correctly to be notified when a reconnect is happening and when it's complete. In some cases, it seems that a reconnect will be attempted but never completes or is never successful, and the client just stays disconnected. We've tried increasing the maxConnectAttempts option to a large number (e.g. 100+) to give it more time, but it doesn't help. Below are some log messages we are seeing over and over again when stuck in this reconnect sequence. In particular, the log about the exception being thrown and ignored seems concerning.

    I/flutter ( 2715): 1-2021-07-31 23:11:20.942487 -- SynchronousMqttServerConnectionHandler::internalConnect - initiating connection try 35, auto reconnect in progress true
    I/flutter ( 2715): 1-2021-07-31 23:11:20.942649 -- SynchronousMqttServerConnectionHandler::internalConnect - calling connectAuto
    I/flutter ( 2715): 1-2021-07-31 23:11:20.942793 -- MqttWsConnection::connectAuto - entered
    I/flutter ( 2715): 1-2021-07-31 23:11:20.943383 -- MqttWsConnection::connectAuto - WS URL is {url}
    I/flutter ( 2715): 1-2021-07-31 23:11:21.465470 -- SynchronousMqttServerConnectionHandler::internalConnect exception thrown during auto reconnect - ignoring
    I/flutter ( 2715): 1-2021-07-31 23:11:21.466230 -- SynchronousMqttServerConnectionHandler::internalConnect - connection complete
    I/flutter ( 2715): 1-2021-07-31 23:11:21.466368 -- SynchronousMqttServerConnectionHandler::internalConnect sending connect message
    I/flutter ( 2715): 1-2021-07-31 23:11:21.466464 -- MqttConnectionHandlerBase::sendMessage - MQTTMessage of type MqttMessageType.connect
    I/flutter ( 2715): Header: MessageType = MqttMessageType.connect, Duplicate = false, Retain = false, Qos = MqttQos.atMostOnce, Size = 73
    I/flutter ( 2715): Connect Variable Header: ProtocolName=MQIsdp, ProtocolVersion=3, ConnectFlags=Connect Flags: Reserved1=false, CleanStart=false, WillFlag=false, WillQos=MqttQos.atMostOnce, WillRetain=false, PasswordFlag=false, UserNameFlag=false, KeepAlive=30
    I/flutter ( 2715): MqttConnectPayload - client identifier is : {client-id}
    I/flutter ( 2715): 1-2021-07-31 23:11:21.467063 -- SynchronousMqttServerConnectionHandler::internalConnect - pre sleep, state = Connection status is connecting with return code of noneSpecified and a disconnection origin of none

Any idea what may be going on? Or any suggestions on how we can improve the reconnect stability? We've also tried not using the autoreconnect feature and instead manually calling disconnect and connect on the client, but we are also having some issues with this (still investigating).

For reference, these tests have been run on an Android device, using AWS IoT Core's broker. To simulate disconnects I've just been toggling wifi on and off on the phone (with cellular off the whole time). I'll also note that we are having a similar issue to #302 as well.

Thanks in advance!

shamblett commented 3 years ago

The log above says you are connecting and sending a connect message -

    I/flutter ( 2715): 1-2021-07-31 23:11:21.466230 -- SynchronousMqttServerConnectionHandler::internalConnect - connection complete
    I/flutter ( 2715): 1-2021-07-31 23:11:21.466368 -- SynchronousMqttServerConnectionHandler::internalConnect sending connect message

The exception is annoying and only seems to occur on websocket connections, but it doesn't affect auto reconnect functionality.

In general the client will react to events from the runtime's network layer (in your case flutter). Auto reconnect will keep going until the runtime says the socket is connected; why this seems to not happen on some occasions I don't know. The root cause of issue #302 is that the flutter runtime does not pass the platform-level WiFi disconnect event to the client until a number of minutes after it occurred. There's not much the client can do about this, no matter what workaround is put in place.

One thing I can say is that the flutter runtime does seem to behave differently to the Dart VM runtime in these areas. Only the Google guys can give you a deeper explanation of what's really going on.

shamblett commented 3 years ago

The fix incorporated for issue #299 has now been tested on flutter as part of issue #302 and the package has been republished at version 9.4.2. This may help you with your stability issues.

noah-depriest commented 3 years ago

Thank you for your feedback. I will take a look at the new version and see if that helps.

Something I also thought of is that for our use case with AWS IoT we're using SigV4 for authentication. I'm honestly not too familiar with all the details of SigV4, but maybe some of the items included with it expire after a certain period of time, making the URL invalid. In that case, a reconnect attempt might fail, and maybe we need to regenerate the URL over time to prevent that from happening? Just a thought.

shamblett commented 3 years ago

I don't use AWS, I use GCP, but it has a similar mechanism in that the JWT token it uses expires after 24 hours, so you have to disconnect and reconnect with a new token.

The client doesn't support any of this. Specifically, auto reconnect will try to connect with your initial connect parameters. Even if you don't use auto reconnect, you can't change the connection URL unless you destroy the client and create a new one with your new URL.
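The destroy-and-recreate approach described in this comment might look roughly like the sketch below. It is only an illustration under stated assumptions: `UrlProvider` and `reconnectWithFreshUrl` are hypothetical names, the function that produces a freshly signed URL is supplied by the caller, and the port and keep-alive values are taken from settings posted elsewhere in this thread rather than being recommendations.

```dart
import 'package:mqtt_client/mqtt_server_client.dart';

/// Hypothetical callback that returns a freshly signed WebSocket URL.
/// How the URL is signed (SigV4, JWT, ...) is outside this sketch.
typedef UrlProvider = Future<String> Function();

/// Discards [oldClient] and connects a brand-new client built from a
/// freshly generated URL, as suggested in the comment above.
Future<MqttServerClient> reconnectWithFreshUrl(
  MqttServerClient? oldClient,
  UrlProvider fetchSignedUrl,
  String clientId,
) async {
  // Drop the old client; per the comment above, its URL cannot be changed.
  oldClient?.disconnect();

  final client = MqttServerClient(await fetchSignedUrl(), clientId)
    ..useWebSocket = true
    ..port = 443
    ..keepAlivePeriod = 60;

  // autoReconnect is left at its default here, since auto reconnect would
  // retry with the same URL even after the credentials expire.
  await client.connect();
  return client;
}
```

Because this creates a new client object, any subscriptions and listeners attached to the old client would presumably have to be set up again on the returned one, and `connect()` may throw if the broker rejects the connection.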

MurtuzaSrashtaSoft commented 3 years ago

Reconnect is not working properly. I'm using AWS IoT.

Test case steps:
1. Connect MQTT and reach the connected state.
2. Turn off the wifi for 6-10 minutes.
3. Turn on the wifi.

Still, my MQTT status is connecting.

shamblett commented 3 years ago

You may not have got a disconnect when you turned the wifi off, or it may be the bug that's just been fixed and is being tested on issue #299. Are you using keep alive, and if so are you setting a disconnectOnNoResponsePeriod time? Either way, if you want me to look more at this I'll need a log.

MurtuzaSrashtaSoft commented 3 years ago

@shamblett

Thank you for a quick reply.

Yes, we use keep-alive:

    _client = MqttServerClient(_url, _clientId);
    _client.logging(on: _logging);
    _client.autoReconnect = true;
    _client.useWebSocket = true;
    _client.port = 443;
    _client.keepAlivePeriod = 43200;

Whenever the internet is off, the MQTT status is always in the connecting state. After the internet is turned back on, the MQTT status does not change. It is still in the connecting state.

shamblett commented 3 years ago

You're not setting the disconnectOnNoResponsePeriod, so you will not disconnect if you fail to get a ping response. Mind you, your keep alive period of 12 hours is rather large, is this an AWS thing? With a keep alive that long, the disconnectOnNoResponsePeriod won't be of much help to you here.

MurtuzaSrashtaSoft commented 3 years ago

@shamblett

Yes, it's an AWS thing. So what values do I need to set instead of 12 hours?

    _client.keepAlivePeriod = ?;
    _client.disconnectOnNoResponsePeriod = ?;

Please suggest values for the question marks, ideally the ones normally used as standard values.

shamblett commented 3 years ago

Well, to use the disconnect on ping response functionality to detect changes in Wifi status, it depends on how quickly you want to detect the status change. If you need to know after, say, 1 minute, then set _client.keepAlivePeriod to 60 and set _client.disconnectOnNoResponsePeriod to whatever time you think your broker should respond in. In my experience brokers are usually responsive here, say within 1 second, so set it to 1 second (your broker may need longer). You will then trigger auto reconnect after at most 61 seconds of losing the Wifi.
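The arithmetic behind the 61 second figure can be sketched as follows. This is a simplified model, not the client's internals: it assumes a ping goes out once per keep-alive period, that the network drops just after a successful ping, and that the client disconnects once the no-response period elapses without a reply. `worstCaseDetectionSeconds` is a hypothetical helper name.

```dart
/// Worst-case number of seconds between losing the network and the client
/// noticing, under the simplified model described above: wait up to one
/// full keep-alive period for the next ping, then wait out the
/// no-response period before disconnecting.
int worstCaseDetectionSeconds({
  required int keepAlivePeriod,
  required int disconnectOnNoResponsePeriod,
}) =>
    keepAlivePeriod + disconnectOnNoResponsePeriod;
```

With `keepAlivePeriod: 60` and `disconnectOnNoResponsePeriod: 1` this gives 61, matching the figure above. Plugging in a 12 hour keep alive (43200 seconds) shows why such a setting cannot detect a Wifi drop quickly.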

For a more complete explanation of what seems to be happening on flutter when Wifi is disabled read issue #302.

Another way to do this is to handle the platform level notification yourself, disconnect and trigger auto reconnect manually.

shamblett commented 3 years ago

I can't see your last comment on the issue itself. What you are saying here is that the broker will be pinged every 5 minutes and has 5 minutes to respond, so you have a maximum wait of 10 minutes before auto reconnect is triggered. Note that setting these to the same value may trigger a race condition. Also, 5 minutes is a long time to wait for a broker response, I'd lower this if I were you.

On Mon, 9 Aug 2021 at 13:50, MurtuzaSrashtaSoft @.***> wrote:

I set both values to 300 (5 minutes).

Is that the right value?

MurtuzaSrashtaSoft commented 3 years ago

@shamblett

I modified my code as shown below, and it reduces reconnecting.

Please verify:

    _client = MqttServerClient(_url, _clientId);
    _client.logging(on: _logging);
    _client.autoReconnect = true;
    _client.useWebSocket = true;
    _client.port = 443;
    _client.keepAlivePeriod = 60;
    _client.disconnectOnNoResponsePeriod = 1;

shamblett commented 3 years ago

I don't know what you mean by 'reduces reconnecting'?

You're giving your broker 1 second to respond to your ping request. As I say, in my experience this should be OK, but it may not be OK for you, so you may have to monitor this and change it accordingly. Other than that, you should now go a maximum of 61 seconds after you turn Wifi off before you start auto reconnecting. This should be OK.

MurtuzaSrashtaSoft commented 3 years ago

OK. Thank you so much for the clear-cut clarification.

MurtuzaSrashtaSoft commented 3 years ago

When the app is put in the background and opened again after 7 to 10 minutes, MQTT is not able to reconnect and reach the connected status.

The status stays at reconnecting. I will send you a log tomorrow.

shamblett commented 3 years ago

It's OK, you don't need to. There have been many problems with flutter putting apps in the background; look through the closed issues to see how some users have handled this. Basically you need to pick up when the app goes into the background and disconnect, then reconnect when it comes back into the foreground. The client can't do this for you.

This is a different situation from that of staying in the foreground and just turning WiFi off.

MurtuzaSrashtaSoft commented 3 years ago

I tried to find past closed issues related to that, but none of them have proper solutions.

It would help me if you could share the ticket numbers.

shamblett commented 3 years ago

What ticket numbers? If there is nothing in the closed issues about this I can't help you any further. I don't use flutter, so you are better off asking these questions on a flutter list.

shamblett commented 3 years ago

@noah-depriest even more fixes for the autoreconnect sequence have gone into the client from issue #299 and the client has been republished at version 9.5.0. Please update to this for any further testing you may do.

noah-depriest commented 3 years ago

@shamblett I appreciate the heads up and also the help from @wagner-rebello and team. We are currently using a manual reconnect approach that has been fairly stable, but will definitely take a look at the updates.

As of now I think the only problem we have is that after several hours we lose connection with our broker and are unable to reconnect unless we reboot the app. I'm fairly certain this is the problem you mentioned before, where the JWT token (which is used to generate our URL) expires and needs to be refreshed. We'll be working on a fix for that soon that regenerates the URL when the token expires, and hopefully that solves the problem.
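One way to decide when the URL should be regenerated is a small expiry check like the sketch below. Everything here is an assumption for illustration: `urlNeedsRefresh` is a hypothetical helper, the token lifetime must come from whatever the credential provider actually issues, and the five minute safety margin is an arbitrary default.

```dart
/// Returns true when a signed URL issued at [issuedAt] with the given
/// [lifetime] should be regenerated before the next connect attempt.
/// The [safetyMargin] triggers the refresh slightly before actual expiry.
bool urlNeedsRefresh(
  DateTime issuedAt,
  Duration lifetime,
  DateTime now, {
  Duration safetyMargin = const Duration(minutes: 5),
}) {
  final refreshAt = issuedAt.add(lifetime).subtract(safetyMargin);
  // Refresh at or after the refresh point, not only strictly after it.
  return !now.isBefore(refreshAt);
}
```

A manual reconnect routine could call this before each connect attempt and build a new client with a new URL only when it returns true.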

We've also had our fair share of problems trying to maintain the connection in the background, but I believe this is more on the operating system and less about the mqtt_client.

shamblett commented 3 years ago

Yes, there have been problems reported before with flutter going into the background, if you look through the closed issues. I can't help here as I don't use flutter; maybe the Google guys can help more.

wagner-rebello commented 3 years ago

We've also had our fair share of problems trying to maintain the connection in the background, but I believe this is more on the operating system and less about the mqtt_client.

When you put an app in the background on iOS, after some time it enters deep sleep. Timers and futures of all kinds stop running at their scheduled times, and the OS lets the app run again for a short time roughly every 15 minutes.

Android also has something similar, but only if RAM usage is high or the app has been put in "battery saving mode".

MurtuzaSrashtaSoft commented 3 years ago

I manage background scenarios by extending WidgetsBindingObserver:

    @override
    void didChangeAppLifecycleState(AppLifecycleState state) {
      super.didChangeAppLifecycleState(state);
      // Ignore the inactive and detached states.
      if (state == AppLifecycleState.inactive ||
          state == AppLifecycleState.detached) return;
      _isBackGround = state == AppLifecycleState.paused;
      if (_isBackGround) {
        // App moved to the background: drop the connection.
        AwsConnection.instance.device?.disconnect();
      } else {
        // App is no longer paused: connect again.
        AwsConnection.instance.device?.connect();
      }
    }

The above code does not maintain the connection in the background.

noah-depriest commented 3 years ago

After more modifications and testing on our end, it seems that our issues were caused by our specific case with IoT Core and session tokens expiring, which the auto reconnect feature has no control over. As of now, we are still using a manual reconnect approach because we have to dynamically change our connection URL over time.

I'm closing this issue since we don't have any problems with the library as of now, and it also sounds like other improvements have been made to the auto reconnect feature outside this thread.

@shamblett, thanks again for supporting this library!