letscontrolit / ESPEasy

Easy MultiSensor device based on ESP8266/ESP32
http://www.espeasy.com

mega-20211224 losing wifi connection #3918

Open microkid42 opened 2 years ago

microkid42 commented 2 years ago

Summary of the problem/feature request

I have a NodeMCU, this has been running fine for 4 years with the dev 0012 firmware from 2018. Last week, I noticed there was new firmware, so I upgraded to mega-20211224. Since then, the device loses wifi connection after >12 hours. I cannot access the device, it requires a hard reset (power down/up) to make it responsive again. I already tried to disable wifi sleep and restart wifi on lost connection, but the problem persists. Any ideas what might cause this?

Expected behavior

Wifi stays up forever.

Actual behavior

Wifi goes down after >12 hours or so.

Steps to reproduce

Start the NodeMCU and wait...

Yes

Hardware:

ESP Easy version: 20211224 (firmware normal 8266 4M1M)

ESP Easy settings/screenshots:
Unit Number: 0
Local Time: 2022-01-17 07:22:56
Time Source: NTP
Time Wander: 0.000 [msec/sec]
Uptime: 0 days 0 hours 24 minutes
Load: 16.08% (LC=1356)
CPU Eco Mode: false
Boot: Cold Boot (0)
Reset Reason: External System
Last Action before Reboot: Background Task
SW WD count: 0

Memory
Free RAM: 11632 (9192 - sendContentBlocking)
Heap Max Free Block: 9440
Heap Fragmentation: 19%
Free Stack: 3648 (2032 - sendContentBlocking)

Network
IP Config: Static
IP / Subnet: 192.168.1.30 / 255.255.255.0
Gateway: 192.168.1.1
Client IP: 192.168.1.44
DNS: 192.168.1.1 / (IP unset)
Allowed IP Range: All Allowed
Connected: 23 m 44 s
Number Reconnects: 0

WiFi
WiFi Connection: 802.11n (RSSI -71 dBm)
SSID: (FC:34:97:23:B5:38)
Channel: 13
Encryption Type: WPA2/PSK
Last Disconnect Reason: (202) Auth fail
Configured SSID1:
Configured SSID2:
STA MAC: 68:C6:3A:89:FB:BA
AP MAC: 6A:C6:3A:89:FB:BA

WiFi Settings
Force WiFi B/G: false
Restart WiFi Lost Conn: true
Force WiFi No Sleep: true
Periodical send Gratuitous ARP: true
Connection Failure Threshold: 0
Max WiFi TX Power: 17.50
Current WiFi TX Power: 14.00
WiFi Sensitivity Margin: 3
Send With Max TX Power: false
Extra WiFi scan loops: 0
Use Last Connected AP from RTC: false

Firmware
Build: 20116 - Mega
System Libraries: ESP82xx Core 2843a5ac, NONOS SDK 2.2.2-dev(38a443e), LWIP: 2.1.2 PUYA support
Git Build: mega-20211224_f162ebf
Plugin Count: 47 [Normal]
Build Origin: GitHub Actions
Build Time: Dec 24 2021 14:39:35
Binary Filename: ESP_Easy_mega_20211224_normal_ESP8266_4M1M
Build Platform: Linux-5.4.0-54-generic-x86_64-with-glibc2.29
Git HEAD: mega-20211224_f162ebf

System Status
Syslog Log Level: None
Serial Log Level: Info
Web Log Level: None

Network Services
Network Connected: ✔
NTP Initialized: ✔

ESP Board
ESP Chip ID: 9042874 (0x89FBBA)
ESP Chip Frequency: 80 MHz
ESP Chip Model: ESP8266
ESP Chip Cores: 1
ESP Board Name: PLATFORMIO_ESP12E

Storage
Flash Chip ID: Vendor: 0xEF Device: 0x4016
Flash Chip Real Size: 4096 kB
Flash IDE Size: 4096 kB
Flash IDE Speed: 40 MHz
Flash IDE Mode: DOUT
Flash Writes: 0 daily / 0 boot
Sketch Size: 921 kB (2148 kB free)
Max. OTA Sketch Size: 1019 kB (1044464 bytes)
OTA possible: true
OTA 2-step Needed: false
SPIFFS Size: 934 kB (844 kB free)
Page size: 256
Block size: 8192
Number of blocks: 116
Maximum open files: 5
Maximum path length: 32

tonhuisman commented 2 years ago

Are you by any chance using an ASUS Mesh WiFi system? There are known issues with those systems and (many) IoT WiFi devices, that can be resolved by changing some settings (Link to ASUS support)

microkid42 commented 2 years ago

I do have an Asus AX58U wifi router, but Mesh is not enabled afaik. I will look into the link, thanks.

Update: changed the settings as advised, will monitor today.

Update: after 4 hours, lost connection again. I have now reverted the AX58U to firmware 43588 to check if that helps.

microkid42 commented 2 years ago

I reverted to dev0012, and this solved the problem. Ping times to the ESP8266 went down from approx. 50 ms to 3 ms. So I must conclude there is something in later firmware versions of ESPEasy that causes the issues mentioned.

ghtester commented 2 years ago

Yeah, my oldest ESP Easy node also has the longest uptime:

GIT version: mega-20180104
Uptime: 207 days 16 hours 48 minutes
Load: 28% (LC=15733)
Free Mem: 19928 (12160 - sendWebPageChunkedData)

The latest firmware releases support many more features/sensors, but I am still not able to achieve uptimes as long as were possible with much older ESP Easy versions.

So sometimes it makes sense to stay on even a very old ESP Easy release if you aren't missing the latest features.

TD-er commented 2 years ago

I'm currently working on implementing ESP32 IDF 4.4 and have run into several issues regarding WiFi and Ethernet which makes it (almost) clear to me where to look for "the bug" in the WiFi part of my code. The issue with the Asus AX58U is very likely (>90% sure) a bug in the SDK code, as all other projects using ESP's with a recent SDK have issues with those.

microkid42 commented 2 years ago

Thanks TD-er. Let's hope it can be solved in the next release :) I'm currently happy with the old dev0012 version, but would like to have a current firmware if possible.

StKob commented 2 years ago

I will add to that a bit. I usually have an unstable connection from my ESP8266. There was one firmware in the last year that was rock solid in that regard, but I don't remember which one.

mrtnbr commented 2 years ago

I think I have the same issue with an ESP32 (https://github.com/Xinyuan-LilyGO/TTGO-T-Display) and ESP_Easy_mega_20211128_display_ESP32_4M316k.

I use MikroTik cAP AC access points for Wifi connection.

microkid42 commented 2 years ago

> I'm currently working on implementing ESP32 IDF 4.4 and have run into several issues regarding WiFi and Ethernet which makes it (almost) clear to me where to look for "the bug" in the WiFi part of my code. The issue with the Asus AX58U is very likely (>90% sure) a bug in the SDK code, as all other projects using ESP's with a recent SDK have issues with those.

Hi TD-er. I noticed a new firmware last week. Can you tell if this issue has been solved in this release?

TD-er commented 2 years ago

Not in the latest build, as I found another issue regarding WiFi, which may make connecting to a mesh setup even more unstable. However, I did fix that one, and the latest GH Actions builds should have this fix. Please check using the test build made here: https://github.com/letscontrolit/ESPEasy/actions/runs/2076809936

microkid42 commented 2 years ago

Hi TD-er,

Any update on the fix for this issue (see your post of 19 Jan)?

TD-er commented 2 years ago

Absolutely! A few days ago I merged a pull request I made regarding WiFi stability.

Can you test any of these latest GitHub Actions builds? https://github.com/letscontrolit/ESPEasy/actions
For example this one: https://github.com/letscontrolit/ESPEasy/actions/runs/3356688563

microkid42 commented 2 years ago

Cool! :) Will do in the next few days.

microkid42 commented 2 years ago

Sorry, can't find the bin file. I'm no github expert ;)

TD-er commented 2 years ago

Made a few more commits, so this is the link to the latest GH Actions build: https://github.com/letscontrolit/ESPEasy/actions/runs/3361777988

There you can see the "Binaries" link.

This will download a large (755 MB) file, with the bin files inside this zip.

N.B. you need to be logged in to GitHub, but since you're posting here, that should not be an issue.

microkid42 commented 1 year ago

Well, it took some time, but yesterday I upgraded the NodeMCU to version mega-20231013. Sadly, the connection to my Asus RT-AX58U router is still not stable. The NodeMCU sometimes stays connected for hours, sometimes only for minutes. I have no idea what to do next to make this reliable again.

TD-er commented 1 year ago

Have you set the WiFi to use "B/G" mode only? (Tools->Advanced page at the bottom)

microkid42 commented 1 year ago

Will do right now. I also enabled "restart wifi lost connection" and "force wifi no sleep".

update: no luck. After more than an hour, the connection is lost again. It seems to come back, but it never really does.

Request timeout for icmp_seq 14666
Request timeout for icmp_seq 14667
Request timeout for icmp_seq 14668
Request timeout for icmp_seq 14669
Request timeout for icmp_seq 14670
Request timeout for icmp_seq 14671
Request timeout for icmp_seq 14672
Request timeout for icmp_seq 14673
64 bytes from 192.168.1.30: icmp_seq=14621 ttl=255 time=53994.501 ms
64 bytes from 192.168.1.30: icmp_seq=14622 ttl=255 time=53000.411 ms
64 bytes from 192.168.1.30: icmp_seq=14623 ttl=255 time=51996.803 ms
64 bytes from 192.168.1.30: icmp_seq=14624 ttl=255 time=51001.258 ms
64 bytes from 192.168.1.30: icmp_seq=14625 ttl=255 time=50000.062 ms
64 bytes from 192.168.1.30: icmp_seq=14626 ttl=255 time=48996.205 ms
64 bytes from 192.168.1.30: icmp_seq=14627 ttl=255 time=47990.902 ms
64 bytes from 192.168.1.30: icmp_seq=14629 ttl=255 time=46022.107 ms
64 bytes from 192.168.1.30: icmp_seq=14632 ttl=255 time=43022.124 ms
64 bytes from 192.168.1.30: icmp_seq=14635 ttl=255 time=40017.078 ms
64 bytes from 192.168.1.30: icmp_seq=14636 ttl=255 time=39014.645 ms
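The ping trace above suggests the replies were held back and then delivered in one burst, since each reply's round-trip time is roughly one second shorter than that of the request sent a second earlier. The following is a minimal Python sketch for checking this on a saved ping log. It assumes the default one-second ping interval (the thread does not state the interval), and the function names and the `SAMPLE` excerpt are purely illustrative.

```python
import re

# Patterns for the two line types seen in the ping output above.
REPLY = re.compile(r"icmp_seq=(\d+) ttl=\d+ time=([\d.]+) ms")
TIMEOUT = re.compile(r"Request timeout for icmp_seq (\d+)")

# A short excerpt of the trace posted in this thread.
SAMPLE = """\
Request timeout for icmp_seq 14672
Request timeout for icmp_seq 14673
64 bytes from 192.168.1.30: icmp_seq=14621 ttl=255 time=53994.501 ms
64 bytes from 192.168.1.30: icmp_seq=14622 ttl=255 time=53000.411 ms
64 bytes from 192.168.1.30: icmp_seq=14636 ttl=255 time=39014.645 ms
"""


def parse_ping(text):
    """Return ([(seq, rtt_ms), ...], [timed_out_seq, ...]) from ping output."""
    replies = [(int(seq), float(rtt)) for seq, rtt in REPLY.findall(text)]
    timeouts = [int(seq) for seq in TIMEOUT.findall(text)]
    return replies, timeouts


def arrival_spread_s(replies, interval_s=1.0):
    """Estimate how far apart (in seconds) the replies actually arrived.

    Assumption: request number `seq` was sent at seq * interval_s, so its
    reply arrived at that time plus the reported round-trip time.
    """
    arrivals = [seq * interval_s + rtt_ms / 1000.0 for seq, rtt_ms in replies]
    return max(arrivals) - min(arrivals)


if __name__ == "__main__":
    replies, timeouts = parse_ping(SAMPLE)
    print(f"{len(replies)} replies, {len(timeouts)} timeouts")
    print(f"replies arrived within {arrival_spread_s(replies):.3f} s of each other")
```

If the spread is a small fraction of a second while the reported round-trip times are tens of seconds, the packets were most likely queued somewhere and released together, which would fit the connection appearing to come back only briefly.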

TD-er commented 1 year ago

And with the last 2 checked?

Can you see in your access point a bit more about this connected client? For example the link speed?

TD-er commented 1 year ago

Can you also backup your settings and try to completely wipe the flash? The ESP will store WiFi calibration data on the flash, outside the file system. However the old calibration data is not compatible with the new calibration data (this is part of the Espressif SDK, not ESPEasy)

microkid42 commented 1 year ago

I freshly reinstalled the system yesterday with a complete wipe.

What is the Mbps connection speed reported there?
TX: 39, RX: 48

Can you see whether it is using something like HT20/HT40?
No

Does your AP show you other info about the connected client, like short/long GI, MMS, etc?
No

Do you have IGMP snooping enabled in either the AP or your switch (if you have a managed switch)?
Yes, enabled

Can you enable/disable to forward/discard unknown multicast groups?
No

Does your AP have band steering enabled (should not matter if you have selected to force 802.11b/g)?
Yes, but disabled on 2.4 GHz

Log from the wifi:
Oct 22 21:06:08 wlceventd: wlceventd_proc_event(685): eth5: Auth 68:C6:3A:89:FB:BA, status: Successful (0), rssi:0
Oct 22 21:06:08 wlceventd: wlceventd_proc_event(722): eth5: Assoc 68:C6:3A:89:FB:BA, status: Successful (0), rssi:-59

Log from the NodeMCU:

WiFi
WiFi Connection: 802.11n (RSSI -63 dBm)
SSID: ***** (FC:34:97:23:B5:38)
Channel: 13
Encryption Type: WPA2/PSK
Last Disconnect Reason: (1) Unspecified
Configured SSID1: ****
Configured SSID2:
STA MAC: 68:C6:3A:89:FB:BA
AP MAC: 6A:C6:3A:89:FB:BA

WiFi Settings
Force WiFi B/G: true
Restart WiFi Lost Conn: true
Force WiFi No Sleep: true
Periodical send Gratuitous ARP: false
Connection Failure Threshold: 0
Max WiFi TX Power: 17.50
Current WiFi TX Power: 14.00
WiFi Sensitivity Margin: 3
Send With Max TX Power: true
Extra WiFi scan loops: 0
Use Last Connected AP from RTC: false
Extra Wait WiFi Connect: true
Enable SDK WiFi Auto Reconnect: true

TD-er commented 1 year ago

Hmm, you have an 802.11n connection according to your sysinfo page, but you have Force WiFi B/G set to true. This means ESPEasy was not able to make a connection using a non-N mode and was thus forced to attempt the N mode after 10 failed attempts.

So this means you probably have not allowed "G" mode in your AP. (only "N" is allowed) Can you look into this?
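The fallback described above can be pictured roughly as follows. This is only an illustrative Python sketch, not ESPEasy's actual code; the function name, parameter names, and return strings are hypothetical, and only the threshold of 10 failed attempts is taken from the comment above.

```python
def select_wifi_mode(force_bg: bool, failed_attempts: int,
                     fallback_after: int = 10) -> str:
    """Pick the 802.11 mode for the next connection attempt.

    Hypothetical model: with "Force WiFi B/G" set, stay in B/G mode until
    `fallback_after` attempts have failed, then fall back to N mode.
    """
    if force_bg and failed_attempts < fallback_after:
        return "B/G"
    return "N"


if __name__ == "__main__":
    # With B/G forced, the first attempts use B/G, later ones fall back to N.
    for attempts in (0, 9, 10):
        print(attempts, select_wifi_mode(True, attempts))
```

Under this model, seeing an 802.11n connection while the B/G setting is true implies that the B/G attempts kept failing, which is why the AP's allowed modes are the first thing to check.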

microkid42 commented 1 year ago

You're correct, the wifi was set to N only. Just changed it to 'auto'. Let's see now :)

Hmm, I set it to "auto" and "legacy" and rebooted the NodeMCU, but it is still connecting on 802.11n. Strange.

TD-er commented 1 year ago

Which exact build are you using?

Can you check with the "B/G" only checkbox unchecked? Ton suggested to me that at some point there was a bug where this checkbox was inverted somehow, and I don't remember which builds those were.

microkid42 commented 1 year ago

Unchecking did the trick, it is at g now.

WiFi Connection: 802.11g (RSSI -61 dBm)

Now let's see if it stays stable.

Git Build: mega-20231013

TD-er commented 1 year ago

OK, so I have to check whether that has been fixed since that last build or still needs fixing. After all, that Friday-13th build was already quite some time ago :)

microkid42 commented 1 year ago

haha, not a good day to code ;)

tonhuisman commented 1 year ago

Living dangerously 🙊

microkid42 commented 1 year ago

B/G mode did seem to do the trick :)

Uptime: | 1 days 17 hours 8 minutes

TD-er commented 1 year ago

Good to know :)