peterhinch / micropython-mqtt

A 'resilient' asynchronous MQTT driver. Recovers from WiFi and broker outages.
MIT License

ESP32 no reconnect on WiFi outage #132

Closed GarikFirst closed 4 months ago

GarikFirst commented 4 months ago

Running the latest git version of micropython-mqtt on an esp32-wroom with MicroPython 1.22, there are absolutely no attempts to reconnect after a WiFi outage (router wireless toggled off and on, waited for a long time).

Here is a code sample to reproduce:

import asyncio

import network

from lib.mqtt_as import MQTTClient, config

async def up(client):
    while True:
        await client.up.wait()
        print("Client is up")
        client.up.clear()

async def down(client):
    while True:
        await client.down.wait()
        print("Client is down")
        client.down.clear()

async def wifi_state():
    while True:
        wlan = network.WLAN(network.STA_IF)
        print("WiFi connected:", wlan.isconnected())
        await asyncio.sleep(3)

config["server"] = SERVER
config["port"] = 8883
config["user"] = USER
config["password"] = PASS
config["ssl"] = True
config["ssl_params"] = {
    "server_hostname": SERVER
}
config["queue_len"] = 1
config["ssid"] = SSID
config["wifi_pw"] = PASS

MQTTClient.DEBUG = True  # Debugging

mqtt_client = MQTTClient(config)

async def main() -> None:
    client_up_task = asyncio.create_task(up(mqtt_client))
    client_down_task = asyncio.create_task(down(mqtt_client))
    wifi_task = asyncio.create_task(wifi_state())

    tasks = [client_up_task] + [client_down_task] + [wifi_task]

    await mqtt_client.connect()

    await asyncio.gather(*tasks)

if __name__ == "__main__":
    asyncio.run(main())

And here is the output:

MPY: soft reboot
WiFi connected: True
Checking WiFi integrity.
WiFi connected: True
WiFi connected: True
Got reliable connection
Connecting to broker.
Connected to broker.
Client is up
WiFi connected: True
WiFi connected: True
WiFi connected: True
WiFi connected: True
WiFi connected: True
WiFi connected: True
WiFi connected: True
RAM free 92960 alloc 26144
WiFi connected: True
WiFi connected: True
WiFi connected: True
Client is down <- Wireless OFF on router
WiFi connected: False
WiFi connected: False
...
Wireless ON on router
WiFi connected: False
WiFi connected: False
...

Am I doing something wrong?

ebolisa commented 4 months ago

For testing purposes, can you put your device closer to the router? If not, can you get the wifi db when connected?

GarikFirst commented 4 months ago

For testing purposes, can you put your device closer to the router? If not, can you get the wifi db when connected?

Yes of course

>>> wlan = network.WLAN(network.STA_IF)
>>> wlan.status("rssi")
-76

peterhinch commented 4 months ago

There seems to be a problem with the latest firmware - even the demos fail to reconnect. Firmware V1.20 is OK. I will investigate.

ebolisa commented 4 months ago

FYI, I also had to go back to V1.20 as v1.22 caused connection issues, but I haven't had time to look into it.

peterhinch commented 4 months ago

V1.21.0 is the latest release build which works. As a workaround I suggest using this until I can figure out what's going on. I'll report back.

The release notes for V1.22.0 include the ominous phrase:

The esp32 port has been updated to use IDF version 5.0.4

bobveringa commented 4 months ago

@peterhinch Thanks for the notification. I literally just started investigating this issue for our internal sensors. We are unable to use v1.21 as it has some changes with heap allocation that prevent SSL connections (with our software).

There are indeed many issues with v1.22, especially when compiling with IDF v5.0.4. I just made a build that uses IDF v5.1.2, which seems to resolve some issues but just replaces them with other problems.

I found this issue on the esp-idf GitHub issue tracker, which seems like it could be related: https://github.com/espressif/esp-idf/issues/11615

peterhinch commented 4 months ago

I have pushed an update. The change in IDF 5.0.4 causes the ESP32 to issue network.STAT_IDLE while connecting.
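To illustrate the kind of change this implies, here is a minimal sketch of a connection wait loop that treats STAT_IDLE as "still connecting" rather than as a failure. It is not the code in mqtt_as: the helper name wait_for_wifi and its parameters are hypothetical, and the numeric status values are hard-coded as assumptions so the sketch also runs off-device (on hardware, use the network.STAT_* constants instead).

```python
import asyncio

# Assumed ESP32 status values; on the board use network.STAT_IDLE etc.
STAT_IDLE = 1000
STAT_CONNECTING = 1001
STAT_GOT_IP = 1010

# Statuses that mean "no verdict yet, keep waiting".
PENDING = (STAT_IDLE, STAT_CONNECTING, STAT_GOT_IP)


async def wait_for_wifi(wlan, timeout_s=30, poll_s=1):
    """Poll wlan until it reports a connection.

    Returns True once wlan.isconnected() is true, and False on an
    error status or when timeout_s elapses. poll_s must be > 0.
    """
    waited = 0
    while waited < timeout_s:
        if wlan.isconnected():
            return True
        if wlan.status() not in PENDING:
            return False  # an error status such as 201 or 202
        await asyncio.sleep(poll_s)
        waited += poll_s
    return False
```

On a board this would be called after wlan.connect(ssid, password), with wlan = network.WLAN(network.STA_IF). A caller that gets False back would normally disconnect and retry.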

GarikFirst commented 4 months ago

Unfortunately, no success yet. Tested with a slightly modified script from the first post (added print("WiFi status:", wlan.status()) to wifi_state) on MicroPython v1.22.1 on 2024-01-05; Generic ESP32 module with ESP32.
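For reference, the modified monitoring task could look roughly like the sketch below. Splitting the printing into a separate report() helper and passing wlan in as a parameter are assumptions made here so the snippet can be exercised off-device; the original script creates the WLAN object inside wifi_state.

```python
import asyncio


def report(wlan):
    # Print the connection flag and the raw numeric status code.
    print("WiFi connected:", wlan.isconnected())
    print("WiFi status:", wlan.status())


async def wifi_state(wlan, period_s=3):
    # On the board: wlan = network.WLAN(network.STA_IF)
    while True:
        report(wlan)
        await asyncio.sleep(period_s)
```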

Results so far:

rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0030,len:4728
load:0x40078000,len:14888
load:0x40080400,len:3368
entry 0x400805cc
WiFi connected: False
WiFi status: 1001
WiFi connected: True
WiFi status: 1010
Checking WiFi integrity.
WiFi connected: True
WiFi status: 1010
Got reliable connection
Connecting to broker.
WiFi connected: True
WiFi status: 1010
Connected to broker.
Client is up
WiFi connected: True
WiFi status: 1010
...
Client is down - wifi off on router
WiFi connected: False
WiFi status: 1001
...
WiFi connected: False
WiFi status: 201
...
E (104328) wifi:Set status to INIT
WiFi connected: False
WiFi status: 202
...

And it connects immediately after a soft reset.

peterhinch commented 4 months ago

This morning I took your original script and made minimal changes to adapt to my WiFi and broker. The fault was entirely reproducible. Likewise, the demo range.py failed to recover from a WiFi outage.

With the updated mqtt_as the demo now recovers. With your adapted script this is the outcome:

Checking WiFi integrity.
Got reliable connection
Connecting to broker.
Connected to broker.
Client is up
WiFi connected: True
WiFi connected: True
WiFi connected: True
WiFi connected: True
WiFi connected: True
WiFi connected: True
WiFi connected: True
RAM free 129360 alloc 21552
WiFi connected: True
Client is down  <--------------- WiFi AP disabled
WiFi connected: False
WiFi connected: False
WiFi connected: False
WiFi connected: False
WiFi connected: False
WiFi connected: False  <------------ AP re-enabled
Checking WiFi integrity.
WiFi connected: True
WiFi connected: True
Got reliable connection
Connecting to broker.
Connected to broker.
Reconnect OK!
Client is up
WiFi connected: True
WiFi connected: True
WiFi connected: True

peterhinch commented 4 months ago

I have studied the status values in the ESP32 network module: values in the range 200..204 are error conditions and those >=1000 are valid. I have pushed an update to better reflect this - the STAT_GOT_IP status might have caused issues.

Please try the latest mqtt_as.py.

For anyone interested the relevant code is here.
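The rule described above (200..204 are errors, values of 1000 and above are valid) can be sketched as a small helper. The function name classify_status and its return strings are hypothetical and only illustrate the ranges; this is not the code in mqtt_as.py.

```python
def classify_status(status):
    """Classify an ESP32 WLAN status code by the ranges described above."""
    if 200 <= status <= 204:
        return "error"    # e.g. 201 or 202, as seen in the log
    if status >= 1000:
        return "valid"    # e.g. 1001 or 1010, as seen in the log
    return "unknown"      # anything else is outside the described ranges


# Usage on a board (assumption): classify_status(wlan.status())
```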

GarikFirst commented 4 months ago

Tried the latest from GitHub, but I keep getting WiFi status: 201/202… The demo range.py was nevertheless able to reconnect… Strange, I'll keep digging.

peterhinch commented 4 months ago

Just in case you didn't know

>>> import network
>>> a = [d for d in dir(network) if d.startswith('STAT')]
>>> a
['STAT_ASSOC_FAIL', 'STAT_BEACON_TIMEOUT', 'STAT_CONNECTING', 'STAT_GOT_IP', 'STAT_HANDSHAKE_TIMEOUT', 'STAT_IDLE', 'STAT_NO_AP_FOUND', 'STAT_WRONG_PASSWORD']
>>> network.STAT_WRONG_PASSWORD
202
>>> network.STAT_NO_AP_FOUND
201
>>> 

GarikFirst commented 4 months ago

I think there is something wrong with my board: after a hard reset there can be long periods of 20X errors until the WiFi hardware finally gets it together and starts connecting. I think I should close this, because the original issue is definitely fixed.