peterhinch / micropython-mqtt

A 'resilient' asynchronous MQTT driver. Recovers from WiFi and broker outages.
MIT License
549 stars 116 forks source link

RP2 Pico W wifi_connect() Issue #82

Closed Beormund closed 1 year ago

Beormund commented 1 year ago

This elif clause is causing my Pico W to exit the for loop before wifi has had a chance to connect: https://github.com/peterhinch/micropython-mqtt/blob/39ac3e31094ff3ca479aaafc8148de14b3b7cf02/mqtt_as/mqtt_as.py#L522-L524 s.status() is evaluating to 2 which I think means No IP. Might be better to break if s.status() >= 3 (at least for RP2 )? To get things working I commented out the final elif.

peterhinch commented 1 year ago

Thanks for the report. Needless to say this doesn't happen here. However the Pico W uses the same WiFi chip as the Pyboard D and also uses the same driver, so it seems reasonable to use the same code. The behaviour may be AP-dependent. Perhaps some AP's cause status 2 (no IP) to appear before status 1 (connecting), in which case the code should read:

            for _ in range(60):  # Break out on fail or success. Check once per sec.
                await asyncio.sleep(1)
                if s.isconnected():
                    break
                # Break out quickly on failure
                if (s.status() not in (1, 2)):
                    break
peterhinch commented 1 year ago

I've pushed an update, perhaps you'd like to give it a run.

Hamsanger commented 1 year ago

Doesn't work reliably for me on an ESP32 on MicroPython 19.1. For this port, the STAT_CONNECTING returned by WLAN.status() evaluates to 1001, not 1 or 2, so the logic of if s.isconnected() or not 1 <= s.status() <= 2: doesn't function as intended. I can only connect to an MQTT broker if my WLAN is already active, otherwise the connection attempt fails immediately.

peterhinch commented 1 year ago

I can only connect to an MQTT broker if my WLAN is already active, otherwise the connection attempt fails immediately.

In terms of initial connection, this is intended behaviour. The code assumes that, on power up, broker and WLAN are live; it throws an exception if not. The assumption is that the user needs to physically correct a problem before trying again. For example, if the config data doesn't match the environment, there is no point in endlessly re-trying.

Subsequent re-connections after outages should, of course, be transparent.

In my testing with ESP32 it works as designed, however there can be differences in AP's/DHCP servers. Under what circumstances are you seeing WLAN.status() == 1001? I'd like to replicate this to figure out what's going on.

Beormund commented 1 year ago

I've pushed an update, perhaps you'd like to give it a run.

Thank you. This is now working well for my Pico W.

Hamsanger commented 1 year ago

Hi Peter,

I think you have misunderstood my reference to WLAN active. If the WLAN client on the ESP32 device is already connected, the MQTT client connection may succeed, simply because the test for isconnected() works and it does not have to check the client STA_IF status. Of course my MQTT broker and WiFi are active. The problem is your most recent commit broke the MQTT client for ESP32. The network.WLAN class has STAT_CONNECTING defined as 1001 for the ESP32 port, so it returns this value, not 1 or 2 as you are expecting in the if s.isconnected() or not 1 <= s.status() <= 2: code. If I change this back to something like: if s.isconnected() or s.status() != STAT_CONNECTING it works.

Beormund commented 1 year ago

Hi Peter,

I think you have misunderstood my reference to WLAN active. If the WLAN client on the ESP32 device is already connected, the MQTT client connection may succeed, simply because the test for isconnected() works and it does not have to check the client STA_IF status. Of course my MQTT broker and WiFi are active. The problem is your most recent commit broke the MQTT client for ESP32. The network.WLAN class has STAT_CONNECTING defined as 1001 for the ESP32 port, so it returns this value, not 1 or 2 as you are expecting in the if s.isconnected() or not 1 <= s.status() <= 2: code. If I change this back to something like: if s.isconnected() or s.status() != STAT_CONNECTING it works.

Maybe separate out by hardware: if ESP32 .... if RP2... is needed? Or just simplify to if s.isconnected(): break and do away with s.status() ?

for _ in range(60):  # Break out on fail or success. Check once per sec.
    await asyncio.sleep(1)
    if s.isconnected():
        break
    # Break out quickly on failure
    if ESP32 and hasattr(network, "STAT_CONNECTING") and s.status() != network.STAT_CONNECTING:
        break
    elif (s.status() not in (1, 2)):
        break
Hamsanger commented 1 year ago

Some form of the s.status() check should be retained for all ports to allow the loop to break out early in the event of other reasons for connection failure such as bad password, otherwise it will continue to loop waiting for an isconnected() == True for the full 60 seconds.
The test for the return value of the s.status() should use symbolic constants in all cases, not an assumed range of values like 1 or 2 which is non-portable across ports and bad practice. That is why the connection status is returned as a symbolic value like STAT_xxx. It would be easier if all device ports returned the same set of STAT_xxx values, but it appears that they don't. The current ESP32 implementation does return STAT_CONNECTING and other helpful status values such as STAT_WRONG_PASSWORD, but does not return something like STAT_NO_IP or other intermediate connection status, so testing for what type of microcontroller is running looks to be necessary.

peterhinch commented 1 year ago

@Hamsanger Sorry, I made a pig's ear of this and didn't grasp your point.

I'm all for symbolic constants but they don't always exist, notably on the Pyboard which has none. Also the Pico value 2 reported above.

I've pushed a fix that reverts the ESP32 error.

Hamsanger commented 1 year ago

I re-tested on an ESP32. The fix works OK to get a connection all being well, but the newer behaviour of the ESP32 WLAN driver on version 1.19+ of MicroPython means that by default the STA_IF retries indefinitely when the access point is unavailable or the credentials are wrong. The interface keeps returning status() == STAT_CONNECTING, so the wait loop starting line 515 still waits for the full 60 seconds if authentication is not working. If I add something like s.config(reconnects=3) just before the s.connect() call (line 514) it will give up after about 12 seconds.

So your intended behaviour of giving up early does not work as expected with the current ESP32 WLAN driver. It is tempting to think you could leverage this persistent background retry behaviour to make the MQTT driver more robust, but this would then only be applicable to the ESP32, and maybe not useful for low power battery applications using sleep.

peterhinch commented 1 year ago

the wait loop starting line 515 still waits for the full 60 seconds if authentication is not working

That behaviour seems surprising. On ESP32 the symbolic constant STAT_WRONG_PASSWORD exists. If an AP is detected but credentials are wrong surely it should bail with the above error rather than keep trying. On the other hand keeping trying if the AP is not detected does make sense (detecting an AP can take time) and this is the reason for the 60s delay in the loop.

If I add something like s.config(reconnects=3)

I'm not sure how this differs from reducing the time in the loop. I assume the reconnects default is -1 (unlimited). The aim of the loop timeout is to be platform-independent and I don't know if all platforms support the reconnects option.

In terms of the loop it's not obvious what can be done about this. Perhaps I should put a note in the docs about ESP32 credentials as this behaviour is unexpected.

peterhinch commented 1 year ago

In view of this behaviour I've added code to force a disconnect in the event that the loop times out. We already do this in the ESP8266 case and I think it avoids any risk of the wlan being left in an unexpected state after a (re)connect event.

Hamsanger commented 1 year ago

Yes, surprising behaviour for the ESP32 WLAN driver. As far as I can tell from the source code the driver will only return STAT_IDLE, STAT_CONNECTING and STAT_GOT_IP (on success), and no other status codes. While in the retry loop, the driver only seems to return STAT_CONNECTING until it succeeds, so the other potentially useful status codes are not available for use.

Returning the WLAN interface to a known state on failed connection seems like a good idea. I have tested your latest update and it works as expected. As there is no way to shorten the time spent looping, I will try moving the WLAN initialisation in my application code to a task on startup, rather than hanging waiting for success, so as not to test my users patience with an unresponsive device when it boots and the network is down.

Thanks for the great MQTT implementation and your quick responses.