peterhinch / micropython-mqtt

A 'resilient' asynchronous MQTT driver. Recovers from WiFi and broker outages.
MIT License

Error breaking outage resilience on ESP32: "wifi:AP has neither DSSS parameter nor HT Information, drop it" #63

Closed by Molaire 2 years ago

Molaire commented 2 years ago

I encountered this error while testing my code for resilience against internet/WiFi outages. It starts with an ECONNABORTED and retries, failing with ENOTCONN as expected.

At a certain point, though, it throws the wifi:AP error once or twice and then freezes. Since I have other coroutines running and they are under watchdogs, the fact that it froze means it simply exited the _keep_connected coroutine.

My guess is that the wifi_connect call inside it now throws a non-OSError in MicroPython 1.17.

Error in reconnect. [Errno 128] ENOTCONN
Connecting to broker.
Error in reconnect. [Errno 128] ENOTCONN
Connecting to broker.
Error in reconnect. [Errno 128] ENOTCONN
Connecting to broker.
Error in reconnect. [Errno 128] ENOTCONN
E (319425) wifi:AP has neither DSSS parameter nor HT Information, drop it
E (319615) wifi:AP has neither DSSS parameter nor HT Information, drop it
Traceback (most recent call last):
...
...
KeyboardInterrupt: 
MicroPython v1.17 on 2021-09-02; ESP32 module (spiram) with ESP32
Type "help()" for more information.

Every time I kill internet connectivity, it errors out once. When I kill WiFi, it errors twice before freezing.

Edit: It's not exiting wifi_connect(); it's stuck here:

else:
    while s.status() == network.STAT_CONNECTING:  # Break out on fail or success. Check once per sec.
        await asyncio.sleep(1)
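If the status never leaves STAT_CONNECTING, a loop like the one above has no way to exit. One possible workaround, not part of the driver, is to bound the number of polls. The sketch below is an illustration only: the helper name wait_while_connecting and its parameters are hypothetical, and it takes the status function as an argument so it can be tried off the board. On MicroPython 1.17 the import would likely need to be `import uasyncio as asyncio`.

```python
import asyncio  # on MicroPython 1.17, probably: import uasyncio as asyncio


async def wait_while_connecting(get_status, connecting, max_polls=20, poll_s=1):
    """Poll get_status() until it stops returning `connecting`.

    Hypothetical helper, not part of mqtt_as.
    Returns the first status that differs from `connecting`,
    or None if the status is still `connecting` after max_polls checks.
    """
    for _ in range(max_polls):
        status = get_status()
        if status != connecting:
            return status  # connection attempt finished (success or failure)
        await asyncio.sleep(poll_s)
    return None  # gave up: still "connecting" after max_polls checks
```

On the board this might be called as `await wait_while_connecting(s.status, network.STAT_CONNECTING)`, with a None result treated as a failed attempt so the caller can retry or reset the interface.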
peterhinch commented 2 years ago

E (319425) wifi:AP has neither DSSS parameter nor HT Information, drop it

That has me stumped. I've never seen that message and have no idea what it means. It must be coming from the Espressif firmware. You could ask in the forum to see if anyone else knows what it means. Google was no help :(

Molaire commented 2 years ago

I did some testing with minimal code: this repo's tls32.py, set up with my personal config and run as main.py. Here are my results:

MicroPython 1.16 on ESP32 (absolutely no problem):

rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0030,len:5640
load:0x40078000,len:12696
load:0x40080400,len:4292
entry 0x400806b0
Checking WiFi integrity.
Got reliable connection
Connecting to broker.
Connected to broker.
Wifi is  up
publish 0
Topic = result Count = 0 Retransmissions = 0 Retained = False
RAM free 88768 alloc 22400
publish 1
Topic = result Count = 1 Retransmissions = 0 Retained = False
RAM free 88768 alloc 22400
publish 2
Topic = result Count = 2 Retransmissions = 0 Retained = False
RAM free 88768 alloc 22400
publish 3
Wifi is  down
Checking WiFi integrity.
Got reliable connection
Connecting to broker.
Connected to broker.
Reconnect OK!
Wifi is  up
Topic = result Count = 3 Retransmissions = 0 Retained = False

MicroPython 1.17 on ESP32 (it freezes and needs a hard reset):

rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0030,len:5656
load:0x40078000,len:12696
load:0x40080400,len:4292
entry 0x400806b0
Checking WiFi integrity.
Got reliable connection
Connecting to broker.
Connected to broker.
Wifi is  up
publish 0
Topic = result Count = 0 Retransmissions = 0 Retained = False
Wifi is  down

The error produced by my script is probably due to me forcefully calling something that you don't, but the fact remains that outage resilience is broken on MicroPython 1.17, as shown in this example.

For my part, I'll be adding a watchdog for WiFi outages and a non-async WiFi connection at the start of my code. This is beyond my comprehension, sorry...
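A blocking connection at startup combined with a watchdog could look roughly like the sketch below. This is an assumption about the approach, not the code actually used: connect_blocking, max_polls, poll_s and feed are hypothetical names. The wlan argument is expected to behave like network.WLAN(network.STA_IF), and feed would typically be the feed method of a machine.WDT instance.

```python
import time


def connect_blocking(wlan, ssid, password, max_polls=30, poll_s=0.5, feed=None):
    """Blocking WiFi connect with a bounded wait (hypothetical helper).

    wlan: object with active(), connect() and isconnected(),
          e.g. network.WLAN(network.STA_IF) on the board.
    feed: optional callable run on each poll, e.g. machine.WDT(...).feed,
          so the watchdog does not fire while we are still waiting.
    Returns True once connected, False if max_polls checks all fail.
    """
    wlan.active(True)
    wlan.connect(ssid, password)
    for _ in range(max_polls):
        if wlan.isconnected():
            return True
        if feed is not None:
            feed()
        time.sleep(poll_s)
    return wlan.isconnected()  # one last check after the final sleep
```

If the function returns False, the caller can decide what to do, for example stop feeding the watchdog so that the board resets.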

peterhinch commented 2 years ago

I have now tested the V1.17 release build on the reference board, running the range.py test script. I cannot replicate this issue: the script copes with WiFi outages as designed.

I can only suggest that there is an issue with either your code, your AP or (a long shot) your ESP32. One issue that does crop up regularly with ESP32 is that of voltage drop on USB leads as WiFi changes modes - you might want to check this. I use very short leads with a known-good power source. On the other hand the error message did appear to implicate the AP. Other than that, I'm out of ideas.

You may find some help in the forum: maybe someone else has seen something similar.