peterhinch / micropython-mqtt

A 'resilient' asynchronous MQTT driver. Recovers from WiFi and broker outages.
MIT License
549 stars 116 forks source link

Don't Init Wifi if already connected on ESP32 #61

Closed jdtsmith closed 2 years ago

jdtsmith commented 2 years ago

I noticed this line skips initializing WiFi if it's already active, but only for ESP8266. But I think this would be very helpful on ESP32 as well, if you first initialize WiFi for NTP, FTP, etc. setup, then setup MQTT in your final code. I experienced MQTT connect issues when it tries to immediately reconnect on an existing connection. Any reason this test should not be performed for all devices? I tried that and it works fine on ESP32. Thanks for mqtt_as!

peterhinch commented 2 years ago

The reason for that line is that ESP8266 remembers its WiFi credentials, so after a power cycle it automatically reconnects. It is therefore likely that, on initial connection, an ESP8266 is already connected.

The assumption behind the design is that mqtt_as will control the connection and disconnection from WiFi. In the event of any outage it disconnects, and later initiates re-connection. I'm not at all sure that mqtt_as will play nicely in an application where other tasks are connecting to WiFi.

I'll think some more about this, but my initial gut feeling is that this is unwise.

kevinkk525 commented 2 years ago

The PR #57 should solve the problem if all other tasks use that hardware interface to connect. (However, I may not be able to work on that PR within the next months. But mainly testing and docs are missing).

jdtsmith commented 2 years ago

Thanks. In the meantime would something quite simple such as an optional parameter like connect(wifi_init = True, ...) to let mqtt_as know (passing False) that you've already brought up and tested the WiFi connection be sensible? On ESP32 initializing then re-initializing the wifi in short order leads to (likely router-dependent) connection issues. Simply skipping this has resolved them.

I very often use mqtt_as with NTP, uftpd and utelnetserver, and haven't had any issues with that (other than this double-connecting). I need to bring WiFi up right away to sync time, for example, before configuring the MQTT endpoints.

peterhinch commented 2 years ago

I'm sorry but on further thought I really don't like this. A key aim of mqtt_as is resilience in the face of outages of WiFi, broker or internet access (if the broker is remote). Anything that removes control of the WiFi from mqtt_as would require extensive testing to ensure that the system as a whole could recover correctly from WiFi outages. Such testing would be system-specific.

To provide this option would be to encourage users to build systems which were not resilient, resulting in a deluge of support queries.

kevinkk525 commented 2 years ago

I'm using NTP, uftp and webrepl and except for NTP you can start everything without active wifi. The NTP in my case just waits until the wifi is connected and lets mqtt_as do the connection. There's no need to build such a potentially problem creating workaround into mqtt_as. (You can check out my PR, it should work right away on your esp32 and even work as a drop in replacement. Just need to adapt calls to network.WLAN to go through the hardware interface.)

jdtsmith commented 2 years ago

I can certainly imagine configurations where you simply must connect to the network first before establishing an MQTT connection, for example to learn which MQTT server to connect to. I also don't see how it could possibly impact robustness to let mqtt_as know "I've got wifi covered on first connection, but you can maintain it, thanks". But you've spent a lot more time debugging various setups so maybe there's something I'm overlooking! Thanks. Feel free to close.

kevinkk525 commented 2 years ago

Sure I can imagine such a setup too. That's what the PR allows you to do. You create the WIFI interface object, connect when you want, then when ready you tell mqtt_as to connect and it will take care of the connection from that point.

jdtsmith commented 2 years ago

Sounds perfect, will follow that PR.

kevinkk525 commented 2 years ago

Only the latest commit fixing wifi deinit is missing from the wlan interface in the PR: https://github.com/peterhinch/micropython-mqtt/commit/dd277fc3e0a20208982dc339b605242e07fab74b

peterhinch commented 2 years ago

I'm happy with that too, the key point being that it is only the initial connection that is managed elsewhere.

jdtsmith commented 2 years ago

Btw in case it wasn’t clear my proposal was also just to (optionally) skip the initial connection.

peterhinch commented 2 years ago

I hadn't appreciated that, but I don't think the problems end there. Consider the case where the application performs other internet-facing tasks concurrently with MQTT.

How should the system cope with WiFi outages? It could poll station_if.isconnected() and wait for mqtt_as to reconnect, but this isn't adequate. If the WiFi goes down in the middle of an operation there may be errors to trap. The correct action to recover from an outage is application dependent. Further, the application must be designed so that an outage, whenever it occurs, cannot provoke blocking.

The aim of mqtt_as is to be resilient in the face of outages. It seems to me that extending this bomb-proof aspect to external internet-facing code would require a new API. @kevinkk525 do you have a view on this?

I also wonder how common is this use case. This is the first time it's been raised in four years of development, so maybe developing a new API is not a good use of effort. Maybe all we need is to cope with initial connection and to write a large health warning.

kevinkk525 commented 2 years ago

The approach in the PR would replace the usage of network.WLAN and therefore introduce a new API. The current implementation is basic so for good multi-application approach a lock and some more checks are needed to prevent concurrent reconnection attempts and similar. The simple approach would be to have other tasks only attempt to connect to the wifi by using thi snew interface/API and then let only mqtt_as control the connection (so e.g. if NTP fails, don't reconnect, just try again later).

jdtsmith commented 2 years ago

Why not a general async reconnecting API, separate from MQTT, that could be used with any network services? There would need to be a lock and of course the services would have to be altered to use it, but “for free” reliable WiFi monitoring and reconnection for NTP, FTP, Werrepl, etc. sounds like a deal. Most people implement a hacky version of this I suspect.

kevinkk525 commented 2 years ago

The problem is that it's not as easy as that to keep a stable and working wifi connection. @peterhinch found that it is sometimes neccessary to reconnect the wifi to get a connection to the mqtt broker. That is something that a separate wifi monitoring API wouldn't be able to detect. Personally I don't know how often this is the case but for a really reliable and resilient connection, this might be the way and therefore it's more difficult to keep a reliable wifi connection than one would think.

spacemanspiff2007 commented 2 years ago

That is something that a separate wifi monitoring API wouldn't be able to detect.

But wouldn't that be easy to implement in the connection interface? Just add a reconnect() option which could be called by the mqtt client.

kevinkk525 commented 2 years ago

That's what my PR does. The problem is, it gets very complex quickly if multiple tasks/projects all try to reconnect when their connection fails. You could end up reconnecting all the time, depending on how those tasks are written. It's possible to write an interface for all that, but it's going to be rather complex and not needed for most applications.

It's simpler and more reliable if only one such task/project actively ensures a stable connection, which is mqtt_as since you're most likely using that as a primary means of communication with no other tasks that keep an open socket all the time. With the PR however, mqtt_as doesnt care about the type of connection anymore, it could be wifi, BT, ethernet or whatever else you can write that offers the same API.

jdtsmith commented 2 years ago

Given PR #57, I'll close this; feel free to reopen if needed.