peterhinch / micropython-mqtt

A 'resilient' asynchronous MQTT driver. Recovers from WiFi and broker outages.
MIT License
575 stars 126 forks source link

Option in client.wifi_connect() to force/keep connections on initial call #29

Closed goatchurchprime closed 2 years ago

goatchurchprime commented 4 years ago

If I start the library (by calling client.connect()) when the wifi hotspot is not turned on, then client.wifi_connect() line 489 raises an OSError.

However, if the wifi hotspot goes down after a connection has been made, then the client._keep_connected() task will trap this exception and retry forever without throwing an OSError.

It would be useful to have devices where the behavior doesn't change just because there's been an interruption in power -- ie the feature where they reconnect as and when the hotspot comes available doesn't depend on the accidental fact that the hotspot was there when the power came on.

Maybe this could be done with a client.connect(timeout=15) parameter that hangs on the connection forever if timeout=0.

(I am working on an ESP8266 with the mqtt_as library compiled in as frozen bytecode)

peterhinch commented 4 years ago

That behaviour is by design and is documented. My assumption was that a user would initially set up and test a device, needing a clear indication if it was unable to connect to the network. With your timeout=0 option there would be no means of knowing if the device was broken, the network was down or the credentials were incorrect. You could walk away from a setup which stood no chance of ever running.

This sounds drastic! Could you expand a bit on the use case?

goatchurchprime commented 4 years ago

Easy: I have a fixed set of ESP devices in various inaccessible locations around a site (where there is incomplete wifi coverage) keeping watch on things, and the MQTT-broker is mobile (specifically, it can be built in a spare android phone that, unlike a raspberryPi, already has a built in battery, working wifi hotspot, screen, etc, and can be provisioned according to https://github.com/MatthewCroughan/userland-mqtt-stack/ )

The side-effect of the perpetual retrying of the _keep_connected() task is that I can take the phone/mqtt broker somewhere else, come back the next day, and the ESP devices will automatically reconnect with it.

Once reconnected, I have various node-red flows on the same device that converse with the device and extract the information about what has been going on since I was away. I have successfully queued up data in the ESP device and backfilled the values into a time-series database.

This is useful. However, if the power to any of these devices is interrupted and it comes back on while the roving broker happens not to be in range, then the client.connect() will crash out without entering the _keep_connected() task loop where it had been before the power was lost.

So, instead of the normal case where the mqtt broker is like a fixed base station and the mqtt clients are roving, we have the clients being fixed into the fabric of a building and the mqtt broker(s) are roving.

This use case would also support issue #12 for multiple ssids, so that there can be multiple roving phones/mqtt brokers so it doesn't need to be the same one every time.

kevinkk525 commented 4 years ago

You can actually easily work around your issue:

I use this method for the initial connect:

    async def _connectCaller(self):
        while True:
            try:
                await self.connect()
                return
            except OSError as e:
                _log.error("Error connecting to wifi or mqtt: {!s}".format(e), local_only=True)
                # not connected after trying.. not much we can do without a connection except
                # trying again.
                await asyncio.sleep(10)
                continue

Then you can handle multiple wifi ssids in this method and only try to connect mqtt once you get a wifi connection. Once you are done with sending your data, you disconnect the mqtt connection and wait until the wifi is gone before trying to connect to a wifi again. Or you just watch the connection status and disconnect once the wifi connection is lost. However looking at the disconnect I realized that it currently doesn't work as expected: #31 But once that is fixed, this could be the way you support multiple ssids and have no problem with your current issue. The disconnect should stop all mqtt coroutines and reconnect attempts once it is fixed.

goatchurchprime commented 4 years ago

Yep, this is the work-around I had put in for now.

I do not think my suggestion should be taken on with any haste. It needs to sit around on the record until it looks as if it is generally useful.

The amazing power of this library is that it deals with a huge amount of boiler-plate code to do with connecting and maintaining connections, while all the other tasks can do their own things. It means that the scripts end up being short and precisely to the point. As a consequence we can move forward with more complex behaviors without getting bogged down with clutter.

(I just spotted another issue of a potential race condition where: if client.isconnected(): await client.publish(t, p) might hit a disconnect event while scheduling the publish() task. I guess some of the old nightmares multithreaded coding are reproduced in the async system. )

kevinkk525 commented 4 years ago

(I just spotted another issue of a potential race condition where: if client.isconnected(): await client.publish(t, p) might hit a disconnect event while scheduling the publish() task. I guess some of the old nightmares multithreaded coding are reproduced in the async system. )

That is not a real race condition, that is just how it works because you can never be sure that the connection is still there a few ms later. Therefore there can always be a disconnect event during the scheduling of publish. If you need a timeout for publish methods you could use the code provided in mqtt_as_cancel.py