meshtastic / firmware

Meshtastic device firmware
https://meshtastic.org
GNU General Public License v3.0
3.36k stars 821 forks source link

[Bug]: T-Beam lose WiFi connection after some hours #1946

Closed ReiHof closed 1 year ago

ReiHof commented 1 year ago

Category

WiFi

Hardware

T-Beam

Firmware Version

2.0.2.18ab874

Description

I have installed version 2.0.2 on different devices. Only T-Beams lose WiFi connection after a few hours and can only reconnect after reset.

sometimes "Dissasociated Timeout" is displayed

As a test I had changed the return value in WiFiAPClient.cpp in function reconnectWiFi() from 12 hours to 5 minutes. The Problem has changed but not resolved.

Relevant log output

15:39:46 16678 [Router] Should encrypt MQTT?: 1
15:39:46 16678 [Router] Expanding short PSK #1
15:39:46 16678 [Router] Using AES128 key!
15:39:46 16678 [Router] ESP32 crypt fr=c4f7c01c, num=3c52bc2d, numBytes=17!
15:39:46 16678 [Router] enqueuing for send (id=0x3c52bc2d Fr0x1c To0xff, WantAck0, HopLim2 Ch0x8 encrypted rxtime=1668181186 rxSNR=13.25 rxRSSI=1.84117)
15:39:46 16678 [Router] txGood=327,rxGood=332,rxBad=0
15:39:46 16678 [Router] rx_snr found. hop_limit:2 rx_snr:13.250000
15:39:46 16678 [Router] rx_snr found in packet. Setting tx delay:3780
15:39:46 16678 [Router] FIXME-update-db Sniffing packet
15:39:46 16678 [Router] Delivering rx packet (id=0x3c52bc2d Fr0x1c To0xff, WantAck0, HopLim3 Ch0x0 Portnum=3 rxtime=1668181186 rxSNR=13.25 rxRSSI=1.84117)
15:39:46 16678 [Router] Forwarding to phone (id=0x3c52bc2d Fr0x1c To0xff, WantAck0, HopLim3 Ch0x0 Portnum=3 rxtime=1668181186 rxSNR=13.25 rxRSSI=1.84117)
15:39:46 16678 [Router] Update DB node 0xc4f7c01c, rx_time=1668181186
15:39:46 16678 [Router] NOTE: tophone queue is full, discarding oldest
15:39:46 16678 [Router] Module 'routing' considered
15:39:46 16678 Telling client we have new packets 354
[16680698][D][WiFiGeneric.cpp:929] _eventCallback(): Arduino Event: 5 - STA_DISCONNECTED
[16680698][W][WiFiGeneric.cpp:950] _eventCallback(): Reason: 2 - AUTH_EXPIRE
[16680702][D][WiFiGeneric.cpp:975] _eventCallback(): WiFi AutoReconnect Running
15:39:49 16680 [WebServerThread] ************ [WiFi-event] event: 5 ************
15:39:49 16680 [WebServerThread] Disconnected from WiFi access point

WiFi lost connection. Reason: 2
15:39:50 16682 [RadioIf] Starting low level send (id=0x3c52bc2d Fr0x1c To0xff, WantAck0, HopLim2 Ch0x8 encrypted rxtime=1668181186 rxSNR=13.25 rxRSSI=1.84117 priority=64)
15:39:50 16682 [RadioIf] (bw=250, sf=11, cr=4/8) packet symLen=8 ms, payloadSize=33, time 559 ms
15:39:50 16682 [RadioIf] AirTime - Packet transmitted : 559ms
15:39:51 16683 [RadioIf] Completed sending (id=0x3c52bc2d Fr0x1c To0xff, WantAck0, HopLim2 Ch0x8 encrypted rxtime=1668181186 rxSNR=13.25 rxRSSI=1.84117 priority=64)
15:39:56 16688 [Power] Battery: usbPower=1, isCharging=0, batMv=4155, batPct=100
15:39:59 16691 [GPS] WANT GPS=1
15:40:16 16708 [Power] Battery: usbPower=1, isCharging=0, batMv=4155, batPct=100
15:40:36 16728 [Power] Battery: usbPower=1, isCharging=0, batMv=4155, batPct=100
15:40:56 16748 [Power] Battery: usbPower=1, isCharging=0, batMv=4155, batPct=100
15:41:16 16768 [Power] Battery: usbPower=1, isCharging=0, batMv=4155, batPct=100

.......
.......

15:41:46 16797 [RadioIf] Completed sending (id=0x09b1c765 Fr0xe4 To0xff, WantAck0, HopLim2 Ch0x8 encrypted rxtime=1668181302 rxSNR=12.75 rxRSSI=1.84117 priority=64)
[16800699][D][WiFiGeneric.cpp:929] _eventCallback(): Arduino Event: 9 - STA_LOST_IP
15:41:49 16800 ************ [WiFi-event] event: 9 ************
15:41:56 16808 [Power] Battery: usbPower=1, isCharging=0, batMv=4155, batPct=100
15:42:15 16827 [RadioIf] (bw=250, sf=11, cr=4/8) packet symLen=8 ms, payloadSize=33, time 559 ms
15:42:15 16827 [RadioIf] Lora RX (id=0x09b1c766 Fr0xe4 To0xff, WantAck0, HopLim3 Ch0x8 encrypted rxSNR=13.5 rxRSSI=1.84117)
15:42:15 16827 [RadioIf] AirTime - Packet received : 559ms
15:42:15 16827 [Router] Add packet record (id=0x09b1c766 Fr0xe4 To0xff, WantAck0, HopLim3 Ch0x8 encrypted rxSNR=13.5 rxRSSI=1.84117)
15:42:15 16827 [Router] Using channel 0 (hash 0x8)
jfirwin commented 1 year ago

I'm also seeing this behavior on my T-Beam at work. WiFi connection seems to drop overnight (if not sooner). Once off, a LanScan of the network shows that the device isn't connected to the WiFi. After reboot it reconnects. I can snag some logs as well if it's helpful!

caveman99 commented 1 year ago
07:05:05 75134 [Router] txGood=1282,rxGood=1312,rxBad=3
[75134789][E][WiFiClient.cpp:422] write(): fail on fd 52, errno: 11, "No more processes"
[75135604][E][WiFiClient.cpp:422] write(): fail on fd 52, errno: 11, "No more processes"
[75136604][E][WiFiClient.cpp:422] write(): fail on fd 52, errno: 11, "No more processes"
[75137604][E][WiFiClient.cpp:422] write(): fail on fd 52, errno: 11, "No more processes"
[75138604][E][WiFiClient.cpp:422] write(): fail on fd 52, errno: 11, "No more processes"
[75139604][E][WiFiClient.cpp:422] write(): fail on fd 52, errno: 11, "No more processes"
[75140604][E][WiFiClient.cpp:422] write(): fail on fd 52, errno: 11, "No more processes"
[75141604][E][WiFiClient.cpp:422] write(): fail on fd 52, errno: 11, "No more processes"
[75142604][E][WiFiClient.cpp:422] write(): fail on fd 52, errno: 11, "No more processes"
[75143361][D][WiFiGeneric.cpp:929] _eventCallback(): Arduino Event: 5 - STA_DISCONNECTED
[75143361][W][WiFiGeneric.cpp:950] _eventCallback(): Reason: 200 - BEACON_TIMEOUT
[75143365][D][WiFiGeneric.cpp:966] _eventCallback(): WiFi Reconnect Running
[75143373][E][WiFiClient.cpp:422] write(): fail on fd 52, errno: 113, "Software caused connection abort"
07:05:13 75143 [mqtt] ************ [WiFi-event] event: 5 ************
07:05:13 75143 Disconnected from WiFi access point

WiFi lost connection. Reason: 200
07:05:13 75143 [Power] Battery: usbPower=0, isCharging=0, batMv=4039, batPct=82
07:05:13 75143 [RadioIf] Starting low level send (id=0x67b4a746 Fr0x64 To0x24, WantAck0, HopLim3 Ch0x8 encrypted rxtime=1668582305 priority=70)
07:05:13 75143 [RadioIf] (bw=250, sf=11, cr=4/8) packet symLen=8 ms, payloadSize=58, time 1058 ms
07:05:13 75143 [RadioIf] AirTime - Packet transmitted : 1058ms
07:05:15 75145 [RadioIf] Completed sending (id=0x67b4a746 Fr0x64 To0x24, WantAck0, HopLim3 Ch0x8 encrypted rxtime=1668582305 priority=70)
[75148407][D][WiFiGeneric.cpp:929] _eventCallback(): Arduino Event: 5 - STA_DISCONNECTED
[75148407][W][WiFiGeneric.cpp:950] _eventCallback(): Reason: 2 - AUTH_EXPIRE
[75148411][D][WiFiGeneric.cpp:975] _eventCallback(): WiFi AutoReconnect Running
07:05:18 75148 ************ [WiFi-event] event: 5 ************
07:05:18 75148 Disconnected from WiFi access point

WiFi lost connection. Reason: 2
[75153431][D][WiFiGeneric.cpp:929] _eventCallback(): Arduino Event: 5 - STA_DISCONNECTED
[75153432][W][WiFiGeneric.cpp:950] _eventCallback(): Reason: 2 - AUTH_EXPIRE
[75153435][D][WiFiGeneric.cpp:975] _eventCallback(): WiFi AutoReconnect Running
07:05:23 75153 [WebServerThread] ************ [WiFi-event] event: 5 ************
07:05:23 75153 [WebServerThread] Disconnected from WiFi access point

WiFi lost connection. Reason: 2
[75158454][D][WiFiGeneric.cpp:929] _eventCallback(): Arduino Event: 5 - STA_DISCONNECTED
[75158454][W][WiFiGeneric.cpp:950] _eventCallback(): Reason: 2 - AUTH_EXPIRE
[75158458][D][WiFiGeneric.cpp:975] _eventCallback(): WiFi AutoReconnect Running
07:05:28 75158 [WebServerThread] ************ [WiFi-event] event: 5 ************
07:05:28 75158 [WebServerThread] Disconnected from WiFi access point

WiFi lost connection. Reason: 2
07:05:33 75163 [Power] Battery: usbPower=0, isCharging=0, batMv=3787, batPct=55
[75163461][D][WiFiGeneric.cpp:929] _eventCallback(): Arduino Event: 5 - STA_DISCONNECTED
[75163461][W][WiFiGeneric.cpp:950] _eventCallback(): Reason: 2 - AUTH_EXPIRE
[75163464][D][WiFiGeneric.cpp:975] _eventCallback(): WiFi AutoReconnect Running
07:05:33 75163 [WebServerThread] ************ [WiFi-event] event: 5 ************
07:05:33 75163 [WebServerThread] Disconnected from WiFi access point

WiFi lost connection. Reason: 2
caveman99 commented 1 year ago

the 'Lost IP' case should trigger a reconnect too. I added that but seems not enough. I also added clearing wifi bonds on trying to reconnect.

caveman99 commented 1 year ago

also that one seems curious

E (75269023) wifi:Set status to INIT
[75268930][D][WiFiGeneric.cpp:929] _eventCallback(): Arduino Event: 5 - STA_DISCONNECTED
[75268931][W][WiFiGeneric.cpp:950] _eventCallback(): Reason: 53 - ETH_GOT_IP
07:07:19 75268 ************ [WiFi-event] event: 5 ************
07:07:19 75268 Disconnected from WiFi access point

WiFi lost connection. Reason: 53
karamo commented 1 year ago

Same Issue a T-Beam with FW 2.0.2. One after ≈3 hours, an other after ≈2 hours. Display: "conncted" IP-Scan: not found

jfirwin commented 1 year ago

When i flashed 2.0.4.5991b59 WiFi connected and then when I tried to visit the tbeams ip address I got the following.

[ 33651][D][WiFiGeneric.cpp:929] _eventCallback(): Arduino Event: 5 - STA_DISCONNECTED
[ 33651][W][WiFiGeneric.cpp:950] _eventCallback(): Reason: 34 - STA_STOP
18:29:47 33 ************ [WiFi-event] event: 5 ************
18:29:47 33 Disconnected from WiFi access point
[HTTPS:E] An receive error occured, FID=54
[HTTPS:I] Connection closed. Socket FID=54

WiFi lost connection. Reason: 34

After rebooting I only ever received this error:

??:??:?? 2 SSL Cert Ready!
[  2509][D][WiFiGeneric.cpp:929] _eventCallback(): Arduino Event: 0 - WIFI_READY
[  2598][D][WiFiGeneric.cpp:929] _eventCallback(): Arduino Event: 2 - STA_START
??:??:?? 2 JOINING WIFI soon: ssid=JFIRWIN
??:??:?? 2 (bw=250, sf=11, cr=4/8) packet symLen=8 ms, payloadSize=237, time 3188 ms
??:??:?? 2 myNodeInfo.bitrate = 74.341286 bytes / sec
??:??:?? 2 PowerFSM init, USB power=1
??:??:?? 2 Enter state: BOOT
??:??:?? 2 Setting CPU to 240mhz because WiFi is in use.
??:??:?? 2 [Power] Battery: usbPower=1, isCharging=0, batMv=4199, batPct=100
??:??:?? 2 [SerialModule] Serial Module Disabled
??:??:?? 2 [StoreForwardModule] Store & Forward Module - Disabled
??:??:?? 2 [RangeTestModule] Range Test Module - Disabled
??:??:?? 2 [WifiConnect] ... Reconnecting to WiFi access point
??:??:?? 3 [GPS] NMEA GPS time 2022-11-17 18:37:53
00:00:02 3 [GPS] Upgrading time to RTC 1668710273 secs (quality 4)
18:37:53 3 [GPS] Read RTC time as 1668710273
18:37:53 3 [GPS] hasValidLocation RISING EDGE
18:37:53 3 [GPS] WANT GPS=0
18:37:53 3 [GPS] publishing pos@63767f81:2, hasVal=1, GPSlock=1
18:37:53 3 [GPS] New GPS pos@63767f81:3 lat=40.483280, lon=-111.911730, alt=1368, pdop=2.04, track=0.00, speed=0.01, sats=6
18:37:53 3 [GPS] onGPSChanged() pos@63767f81, time=1668710273, lat=404832800, lon=-1119117298, alt=1368
18:37:53 3 [GPS] updatePosition LOCAL pos@63767f81, time=1668710273, latI=404832800, lonI=-1119117298, alt=1368
18:37:53 3 [GPS] Node status update: 1 online, 1 total
[  4281][D][WiFiGeneric.cpp:929] _eventCallback(): Arduino Event: 5 - STA_DISCONNECTED
[  4281][W][WiFiGeneric.cpp:950] _eventCallback(): Reason: 2 - AUTH_EXPIRE
[  4284][D][WiFiGeneric.cpp:966] _eventCallback(): WiFi Reconnect Running
18:37:54 4 ************ [WiFi-event] event: 5 ************
18:37:54 4 Disconnected from WiFi access point

WiFi lost connection. Reason: 2
18:37:55 5 [PowerFSM] Enter state: POWER
jfirwin commented 1 year ago

@caveman I shared with you a couple of logs in discord where it looked like the wifi stayed online but the IP dropped. I'm seeing this again, with a different error message now.

17:24:19 7228 [mqtt] Connecting to MQTT server <url>, port: 1883, username: , password: 
[7231347][I][WiFiClient.cpp:253] connect(): select returned due to timeout 3000 ms for fd 52

[...]

[10082721][D][WiFiClient.cpp:546] connected(): Disconnected: RES: 0, ERR: 128
18:11:53 10082 [mqtt] Connecting to MQTT server <url>, port: 1883, username: , password: 
[10089666][E][WiFiGeneric.cpp:1438] hostByName(): DNS Failed for ,<url>
karamo commented 1 year ago

After ≈3+ hours ... the T-Beam loses the WiFi connection, the red and blue LEDs flash exactly 30 times, then pause with the message:

[14972888][E][WiFiGeneric.cpp:1438] hostByName(): DNS Failed for xyz.uber.space

This is always repeated exactly the same. LoRa connection ok. FW 2.0.4 - T-Beam

karamo commented 1 year ago

Same as before but another debug-log with device.debug_log_enabled true

??:??:?? 0 AXP192 PMU init succeeded, using AXP192 PMU
[   836][D][XPowersAXP192.tpp:2010] setInterruptImpl(): setInterruptImpl DISABLE - 0xffffffffff

??:??:?? 0 =======================================================================
??:??:?? 0 DC1  : +   Voltage:3300 mV 
??:??:?? 0 DC2  : -   Voltage:1800 mV 
??:??:?? 0 DC3  : +   Voltage:3300 mV 
??:??:?? 0 LDO2 : +   Voltage:3300 mV 
??:??:?? 0 LDO3 : +   Voltage:3300 mV 
??:??:?? 0 =======================================================================
??:??:?? 0 Battery: usbPower=1, isCharging=0, batMv=4139, batPct=99
??:??:?? 0 I2C device found at address 0x34
??:??:?? 0 axp192/axp2101 PMU found
??:??:?? 0 I2C device found at address 0x3c
??:??:?? 0 0x3 subtype probed in 2 tries 
??:??:?? 0 ssd1306 display found
??:??:?? 0 2 I2C devices found
??:??:?? 0 Meshtastic hwvendor=4, swver=2.0.4.5417671-d
...
??:??:?? 1 Loading /prefs/channels.proto
??:??:?? 1 Loaded saved channelFile version 20
[  1080][E][vfs_api.cpp:104] open(): /littlefs/oem/oem.proto does not exist, no permits for creation
??:??:?? 1 No /oem/oem.proto preferences found
??:??:?? 1 Number of Device Reboots: 17
...
??:??:?? 2 SSL Cert Ready!
[  2770][D][WiFiGeneric.cpp:929] _eventCallback(): Arduino Event: 0 - WIFI_READY
[  2861][D][WiFiGeneric.cpp:929] _eventCallback(): Arduino Event: 2 - STA_START
??:??:?? 2 ************ [WiFi-event] event: 2 ************
??:??:?? 2 [StreamAPI] WiFi station started
...
07:49:27 2 [WifiConnect] ... Reconnecting to WiFi access point
[  3737][D][WiFiGeneric.cpp:929] _eventCallback(): Arduino Event: 4 - STA_CONNECTED
07:49:27 3 ************ [WiFi-event] event: 4 ************
07:49:27 3 Connected to access point
07:49:29 5 [PowerFSM] Enter state: POWER
07:49:29 5 [Screen] Screen: Powered...
[  6268][D][WiFiGeneric.cpp:929] _eventCallback(): Arduino Event: 7 - STA_GOT_IP
[  6269][D][WiFiGeneric.cpp:991] _eventCallback(): STA IP: 192.168.1.155, MASK: 255.255.255.0, GW: 192.168.1.1
07:49:30 6 ************ [WiFi-event] event: 7 ************
07:49:30 6 Obtained IP address: 192.168.1.155
... Starting network services
07:49:30 6 mDNS responder started
07:49:30 6 mDNS Host: Meshtastic.local
07:49:30 6 Starting NTP time client
07:49:30 6 Initializing Web Server ...
07:49:30 6 Starting Secure Web Server...
07:49:30 6 Starting Insecure Web Server...
07:49:30 6 Web Servers Ready! :-) 
07:49:30 6 API server listening on TCP port 4403
07:49:30 6 [ApiServer] Connecting to MQTT server ******.uber.space, port: ******, username: ******, password: ******
07:49:30 6 MQTT connected
...
07:53:48 263 [Screen] LastScreenTransition exceeded 10137ms transitioning to next frame
[HTTPS:I] New connection. Socket FID=53
[HTTPS:I] New connection. Socket FID=54
[HTTPS:I] Request: GET / (FID=54)
[264886][E][vfs_api.cpp:104] open(): /littlefs/static/index.html does not exist, no permits for creation
[HTTPS:I] Request: GET /index.ed76384f.js (FID=54)
[HTTPS:I] Request: GET /index.c409c94b.css (FID=53)
[265081][E][vfs_api.cpp:104] open(): /littlefs/static/index.ed76384f.js does not exist, no permits for creation
[266019][E][vfs_api.cpp:104] open(): /littlefs/static/index.c409c94b.css does not exist, no permits for creation
[HTTPS:I] Client closed connection, FID=54
[HTTPS:I] Client closed (FID=54, cstate=2)
[HTTPS:I] Connection closed. Socket FID=54
[HTTPS:I] Connection closed. Socket FID=53
[HTTPS:I] New connection. Socket FID=53
[HTTPS:I] Request: GET /Logo_Black.svg (FID=53)
[266259][E][vfs_api.cpp:104] open(): /littlefs/static/Logo_Black.svg does not exist, no permits for creation
...
07:53:59 275 [Router] Incoming message was filtered 0x25d634b4
[HTTPS:I] Request: GET /hotspot-detect.html (FID=53)
[275719][E][vfs_api.cpp:104] open(): /littlefs/static/hotspot-detect.html does not exist, no permits for creation
[275724][E][vfs_api.cpp:104] open(): /littlefs/static/hotspot-detect.html.gz does not exist, no permits for creation
[HTTPS:I] Request: PUT /api/v1/toradio (FID=53)
07:54:00 276 [WebServerThread] webAPI handleAPIv1ToRadio
...

09:11:36 4932 [Router] publish msh/2/c/Galactica/!f71e4b64, 92 bytes ^^^ that came through.

09:11:51 4946 [Router] publish msh/2/c/Galactica/!f71e4b64, 105 bytes ^^^ that didn't get through

...
09:12:04 4960 [Screen] LastScreenTransition exceeded 10108ms transitioning to next frame

[HTTPS:I] Connection timeout. FID=53
[HTTPS:I] Connection closed. Socket FID=53
09:12:08 4964 [Power] Battery: usbPower=1, isCharging=0, batMv=4136, batPct=99
09:12:11 4967 [mqtt] Connecting to MQTT server ******.uber.space, port: ******, username: ******, password: ******
[4974755][E][WiFiGeneric.cpp:1438] hostByName(): DNS Failed for ******.uber.space
09:12:18 4974 [mqtt] Failed to contact MQTT server...
karamo commented 1 year ago

Same as before, but MQTT-Server name changed to MQTT-Server IP:

...
10:45:26 6 API server listening on TCP port 4403
10:45:26 6 [ApiServer] Connecting to MQTT server 95.143.172.xxx, port: xxx, username: xxx, password: xxx
10:45:26 6 [WebServerThread] MQTT connected
...
14:03:55 11915 [Screen] LastScreenTransition exceeded 10106ms transitioning to next frame
[HTTPS:E] Could not accept() new connection
[HTTPS:E] Could not accept() new connection
[HTTPS:E] Could not accept() new connection. FID=-1
14:03:57 11917 [RadioIf] (bw=250, sf=10, cr=4/8) packet symLen=4 ms, payloadSize=67, time 623 ms
...
14:20:10 12890 [mqtt] Connecting to MQTT server 95.143.172.xxx, port: xxx, username: xxx, password: xxx
[12893608][I][WiFiClient.cpp:253] connect(): select returned due to timeout 3000 ms for fd 52
14:20:13 12893 [mqtt] Failed to contact MQTT server...
00svd00 commented 1 year ago

I think that root of problem may be liitlefs. I've tried to compile firmware with WIFI creds hardcoded and after some point WIFI failed, but was able to reconnect. BUT webui was still dead because none of files can be loaded.

garthvh commented 1 year ago

I think that root of problem may be liitlefs. I've tried to compile firmware with WIFI creds hardcoded and after some point WIFI failed, but was able to reconnect. BUT webui was still dead because none of files can be loaded.

The files are probably not there at all if you flashed custom firmware.

00svd00 commented 1 year ago

I think that root of problem may be liitlefs. I've tried to compile firmware with WIFI creds hardcoded and after some point WIFI failed, but was able to reconnect. BUT webui was still dead because none of files can be loaded.

The files are probably not there at all if you flashed custom firmware.

Of course I've also flashed FS image. I've made full flash of official FW and after that updated it with custom one.. After reboot all files are at their place and webUI is fully operable. Littlefs failure happens after about ~3h of work

garthvh commented 1 year ago

Nobody else has reported anything related to littlefs, why do you think littlefs is creating wifi issues?

I have a tbeam that has been connected for 9 hours without any issues, the local web ui is still available.

00svd00 commented 1 year ago

Nobody else has reported anything related to littlefs, why do you think littlefs is creating wifi issues?

I have a tbeam that has been connected for 9 hours without any issues, the local web ui is still available.

Because:

  1. I saw errors in console output
  2. Before hardcoding wifi creds it was failing to reconnect with "auth_expire" error
  3. After hardcoding wifi was able to reconnect May be I'm wrong, but both problems look binded for me
garthvh commented 1 year ago

Please try testing with the released 2.0.6 version, too many variables makes it really hard to pinpoint the actual issue.

Anonym-tsk commented 1 year ago

2.0.6 - problem is not resolved. T-beam lose connection every few hours

karamo commented 1 year ago

• In v 2.0.6 the problem is still there. • It is only at every of my T-Beam boards! • TLORA V2.1-1.6 have not the problem. • Analysis of the DEBUG-output:

This error came out of the library-file ...

...\.platformio\packages\framework-arduinoespressif32\libraries\WiFi\src
Z.526: uint8_t WiFiClient::connected()

[6985477][D][WiFiClient.cpp:546] connected(): Disconnected: RES: 0, ERR: 128

this error is #define ENOTCONN 128 /* Socket is not connected */

karamo commented 1 year ago

Please try testing with the released 2.0.6 version, too many variables makes it really hard to pinpoint the actual issue.

If the WiFi connection is interrupted for a short time, then no reconnect succeeds.

karamo commented 1 year ago

The solution would basically be quite simple: 1) query the connected status before each WiFi transmission and, if necessary, reconnect. 2) query the connection status to the MQTT server before each MQTT publish and, if necessary, reconnect. But I have no idea where I could put that to test it.

Anonym-tsk commented 1 year ago

The solution would basically be quite simple

It's not solution. Solution is realize why disconnects occurs.

karamo commented 1 year ago

Ok, it may bee just a workaround. But the WiFi connection can be interrupted at any time and then a reconnect is required. And the problem of connection loss after 2+ hours is well known and there is no clear solution so far. One difference between the boards is the ESP32 chip T-Beam T22_V1.1: Chip is ESP32-D0WDQ6-V3 (revision v3.0) TLORA V2 T3_V1.6.1: Chip is ESP32-PICO-D4 (revision v1.0)

karamo commented 1 year ago

My analysis has shown that the TLORA V2.1_1.6 modules with FW v2.0.7 are also affected by the WiFi bug. The difference, however, is that the TLORA V2.1_1.6 is characterized by independent, ev. regain several consecutive reboots. A grid of ≈3+ hours becomes visible. KGX_2022-12-11 0 = offline Status at MQTT-Server 1 = online (publish) to the MQTT-Server, at/after a Reboot.

garthvh commented 1 year ago

I have a 2.1.1.6 that has run for several weeks without intervention. MQTT and web ui are always available.

garthvh commented 1 year ago

My analysis has shown that the TLORA V2.1_1.6 modules with FW v2.0.7 are also affected by the WiFi bug. The difference, however, is that the TLORA V2.1_1.6 is characterized by independent, ev. regain several consecutive reboots. A grid of ≈3+ hours becomes visible. KGX_2022-12-11 0 = offline Status at MQTT-Server 1 = online (publish) to the MQTT-Server, at/after a Reboot.

Do you have a 2.4 ghz network available? Have you enabled any power related settings? I have only had reliable ESP32 Wi-Fi on 2.4 ghz and power settings can also cause issues.

karamo commented 1 year ago

My analysis has shown that the TLORA V2.1_1.6 modules with FW v2.0.7 are also affected by the WiFi bug. The difference, however, is that the TLORA V2.1_1.6 is characterized by independent, ev. regain several consecutive reboots. A grid of ≈3+ hours becomes visible. KGX_2022-12-11 0 = offline Status at MQTT-Server 1 = online (publish) to the MQTT-Server, at/after a Reboot.

Do you have a 2.4 ghz network available? Have you enabled any power related settings? I have only had reliable ESP32 Wi-Fi on 2.4 ghz and power settings can also cause issues.

Yes, 2,4 GHz in 1m distance. Power related settings ... Where? On the WLAN-Router?

karamo commented 1 year ago

@garthvh 1) Your consideration completely misses the point. Just because you can't reproduce it doesn't mean it doesn't exist. This 2+/3+ hour failure is well known and no one has yet found a viable solution. 2) Again: WiFi can always fail. Then a reconnect has to take place. It's as simple as that from my point of view.

karamo commented 1 year ago

By the way: My modification in FW 2.0.7, if the connection is lost, to perform a reconnect is a brutal but useful workaround for me. This means that the T-Beam board has been running continuously for over a day and delivers packets to the MQTT server. Recurring independent reboots are annoying but currently unavoidable.

garthvh commented 1 year ago

If it runs for two weeks without intervention on my default config then there is not a global wifi problem with a tlora, and we don't need to be applying a work around to every device. Likely there is a configuration that is causing the issue, we need to figure out what the underlying issue is. It looks to be isolated to having the device.role of router on. There is a whole power config section of the firmware, have you updated any of those settings?

garthvh commented 1 year ago

By the way: My modification in FW 2.0.7, if the connection is lost, to perform a reconnect is a brutal but useful workaround for me. This means that the T-Beam board has been running continuously for over a day and delivers packets to the MQTT server. Recurring independent reboots are annoying but currently unavoidable.

Can you try the ROUTER_CLIENT role instead of ROUTER? ROUTER does some power saving that is probably contrary to being powered and using wifi.

karamo commented 1 year ago

By the way: My modification in FW 2.0.7, if the connection is lost, to perform a reconnect is a brutal but useful workaround for me. This means that the T-Beam board has been running continuously for over a day and delivers packets to the MQTT server. Recurring independent reboots are annoying but currently unavoidable.

Can you try the ROUTER_CLIENT role instead of ROUTER? ROUTER does some power saving that is probably contrary to being powered and using wifi.

Ok, let us figue out what the underlying issue is ... I don't use the ROUTER role. The T-Beam is CLIENT (default).

jfirwin commented 1 year ago

My device that had issues was also running as CLIENT I believe last firmware I tried on it was 2.0.6. Just checked an it is indeed offline. (Haven't used it in a minute)

karamo commented 1 year ago

I will start a test: 1) erase flash via esptool 2) wipe & flash via web-flasher FW 2.0.7 device "tbeam" 3) minimal config for my Channel, WiFi & MQTT-Server 4) Reboot at 17:31 MEZ with monitoring status on MQTT-Server

Modul: T-Beam T22_V1.1 with internal 18650 LiPo & USB-Supply, no USB to PC now I'm waiting ... how long will it be online?

garthvh commented 1 year ago

I will start a test:

  1. erase flash via esptool
  2. wipe & flash via web-flasher FW 2.0.7 device "tbeam"
  3. minimal config for my Channel, WiFi & MQTT-Server
  4. Reboot at 17:31 MEZ with monitoring status on MQTT-Server

Modul: T-Beam T22_V1.1 with internal 18650 LiPo & USB-Supply, no USB to PC now I'm waiting ... how long will it be online?

Can you do the same with your tlora 2.1.1.6? as that behavior seems different and I am not able to replicate it.

karamo commented 1 year ago

So, the 1st test is done. Test start at: 17:31:05 2022-12-15 ✅ WZ1 MQTT-Gateway online

19:33:46 2022-12-15 ❌ WZ1 MQTT-Gateway offline 19:33:47 2022-12-15 ✅ WZ1 MQTT-Gateway online

19:38:55 2022-12-15 ❌ WZ1 MQTT-Gateway offline 19:38:56 2022-12-15 ✅ WZ1 MQTT-Gateway online

19:41:48 2022-12-15 ❌ WZ1 MQTT-Gateway offline

You can see the ≈2 hour function ok. 2 auto reboots and then died.

karamo commented 1 year ago

With TLORA 2.1_1.6 I did it here: https://github.com/meshtastic/firmware/issues/1946#issuecomment-1352359676

TLORA 2.1_1.6 will work with ≈3+ hours grid of auto reboots & reconnects ok.

karamo commented 1 year ago

2nd run T-Beam without 18650 Accu, only USB-Power

20:43:09 2022-12-15 ✅ WZ1 MQTT-Gateway online

22:50:34 2022-12-15 ❌ WZ1 MQTT-Gateway offline 22:50:35 2022-12-15 ✅ WZ1 MQTT-Gateway online

22:54:28 2022-12-15 ❌ WZ1 MQTT-Gateway offline 22:54:29 2022-12-15 ✅ WZ1 MQTT-Gateway online

22:57:06 2022-12-15 ❌ WZ1 MQTT-Gateway offline

Same result as before.

Anonym-tsk commented 1 year ago

I have the same issue.

alexoley commented 1 year ago

Hi @karamo ! I have the same weird issue with tbeam and FW 2.0.7. Will check with FW 2.0.12

ReiHof commented 1 year ago

Here is a trace from my T-Beam with V2.0.12

DEBUG | 18:15:37 18098 [Router] Update DB node 0xa51c272c, rx_time=1673633737 WARN | 18:15:37 18098 [Router] ToPhone queue is full, discarding oldest DEBUG | 18:15:37 18098 [Router] Module 'routing' considered DEBUG | 18:15:37 18098 [Screen] LastScreenTransition exceeded 3249ms transitioning to next frame DEBUG | 18:15:40 18102 [Screen] LastScreenTransition exceeded 3063ms transitioning to next frame DEBUG | 18:15:43 18105 [Screen] LastScreenTransition exceeded 3068ms transitioning to next frame DEBUG | 18:15:46 18107 [Power] Battery: usbPower=1, isCharging=0, batMv=4169, batPct=100 DEBUG | 18:15:46 18108 [Screen] LastScreenTransition exceeded 3065ms transitioning to next frame DEBUG | 18:15:50 18111 [Screen] LastScreenTransition exceeded 3067ms transitioning to next frame DEBUG | 18:15:53 18114 [Screen] LastScreenTransition exceeded 3064ms transitioning to next frame DEBUG | 18:15:56 18117 [Screen] LastScreenTransition exceeded 3071ms transitioning to next frame DEBUG | 18:15:59 18120 [Screen] LastScreenTransition exceeded 3074ms transitioning to next frame DEBUG | 18:16:02 18123 [Screen] LastScreenTransition exceeded 3074ms transitioning to next frame DEBUG | 18:16:05 18126 [Screen] LastScreenTransition exceeded 3063ms transitioning to next frame DEBUG | 18:16:06 18127 [Power] Battery: usbPower=1, isCharging=0, batMv=4169, batPct=100 DEBUG | 18:16:08 18129 [Screen] LastScreenTransition exceeded 3068ms transitioning to next frame DEBUG | 18:16:11 18132 [Screen] LastScreenTransition exceeded 3066ms transitioning to next frame DEBUG | 18:16:14 18135 [Screen] LastScreenTransition exceeded 3065ms transitioning to next frame DEBUG | 18:16:17 18138 [Screen] LastScreenTransition exceeded 3065ms transitioning to next frame DEBUG | 18:16:20 18141 [Screen] LastScreenTransition exceeded 3070ms transitioning to next frame DEBUG | 18:16:23 18145 [Screen] LastScreenTransition exceeded 3074ms transitioning to next frame DEBUG | 18:16:26 18147 [Power] Battery: usbPower=1, isCharging=0, batMv=4169, batPct=100 DEBUG | 18:16:26 18148 [Screen] LastScreenTransition exceeded 3076ms transitioning to next frame [18149362][D][WiFiGeneric.cpp:929] _eventCallback(): Arduino Event: 5 - STA_DISCONNECTED [18149362][W][WiFiGeneric.cpp:950] _eventCallback(): Reason: 2 - AUTH_EXPIRE [18149365][D][WiFiGeneric.cpp:966] _eventCallback(): WiFi Reconnect Running DEBUG | 18:16:28 18149 [WebServerThread] WiFi-Event 5: Disconnected from WiFi access point WARN | 18:16:28 18149 Guru Meditation Error: Core 1 panic'ed (LoadProhibited). Exception was unhandled.

Core 1 register dump: PC : 0x40094538 PS : 0x00060f30 A0 : 0x801fe96c A1 : 0x3ffe15e0
A2 : 0x00000002 A3 : 0x00000000 A4 : 0x000000ff A5 : 0x0000ff00
A6 : 0x00ff0000 A7 : 0xff000000 A8 : 0x002e0455 A9 : 0x3ffe18d0
A10 : 0x00000001 A11 : 0x00000020 A12 : 0x00000001 A13 : 0x3ffe1b60
A14 : 0x3ffcbfc4 A15 : 0x00000000 SAR : 0x00000004 EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000000 LBEG : 0x4009454d LEND : 0x4009455d LCOUNT : 0xfffffffe

Backtrace:0x40094535:0x3ffe15e00x401fe969:0x3ffe15f0 0x40205ad6:0x3ffe1900 0x40205b12:0x3ffe1990 0x400d8465:0x3ffe19d0 0x400d8597:0x3ffe1a10 0x400ef685:0x3ffe1a70 0x401301ad:0x3ffe1a90 0x40130221:0x3ffe1c20

ELF file SHA256: 0000000000000000

Rebooting... ets Jul 29 2019 12:21:46

rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT) configsip: 0, SPIWP:0xee clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00 mode:DIO, clock div:2 load:0x3fff0030,len:1184 load:0x40078000,len:13132 load:0x40080400,len:3036 entry 0x400805e4 [ 11][D][esp32-hal-cpu.c:244] setCpuFrequencyMhz(): PLL: 480 / 2 = 240 Mhz, APB: 80000000 Hz [ 460][I][esp32-hal-psram.c:96] psramInit(): PSRAM enabled ”Ã@INFO | ??:??:?? 0

//\ E S H T /\ S T / C

INFO | ??:??:?? 0 Booted, wake cause 0 (boot count 1), reset_reason=reset DEBUG | ??:??:?? 0 Filesystem files (589824/1048576 Bytes): DEBUG | ??:??:?? 0 /prefs/channels.proto (53 Bytes) DEBUG | ??:??:?? 0 /prefs/config.proto (120 Bytes) DEBUG | ??:??:?? 0 /prefs/db.proto (200 Bytes) DEBUG | ??:??:?? 0 /static/.gitkeep (0 Bytes) DEBUG | ??:??:?? 0 /static/Logo_Black.svg.gz (592 Bytes) DEBUG | ??:??:?? 0 /static/Logo_White.svg.gz (600 Bytes) DEBUG | ??:??:?? 0 /static/apple-touch-icon.png.gz (4300 Bytes) DEBUG | ??:??:?? 0 /static/favicon.ico.gz (2221 Bytes) DEBUG | ??:??:?? 0 /static/icon.svg.gz (842 Bytes) DEBUG | ??:??:?? 0 /static/index-3ad6b632.css.gz (14921 Bytes) DEBUG | ??:??:?? 0 /static/index-7d0857aa.js.gz (347 Bytes) DEBUG | ??:??:?? 0 /static/index-84fcdb37.js.gz (507158 Bytes) DEBUG | ??:??:?? 0 /static/index.html.gz (547 Bytes) DEBUG | ??:??:?? 0 /static/robots.txt.gz (42 Bytes) DEBUG | ??:??:?? 0 /static/site.webmanifest.gz (196 Bytes) [ 917][I][esp32-hal-i2c.c:75] i2cInit(): Initialising I2C Master: sda=21 scl=22 freq=100000 [ 919][I][XPowersCommon.tpp:248] begin(): SDA:21 SCL:22 [ 923][W][Wire.cpp:301] begin(): Bus already started in Master Mode. WARN | ??:??:?? 0 Failed to find AXP2101 power management [ 941][I][XPowersAXP2101.tpp:230] ~XPowersAXP2101(): ~XPowersAXP2101 [ 942][I][XPowersCommon.tpp:248] begin(): SDA:21 SCL:22 [ 946][I][esp32-hal-i2c.c:75] i2cInit(): Initialising I2C Master: sda=21 scl=22 freq=100000 INFO | ??:??:?? 0 AXP192 PMU init succeeded, using AXP192 PMU [ 972][D][XPowersAXP192.tpp:2010] setInterruptImpl(): setInterruptImpl DISABLE - 0xffffffffff

DEBUG | ??:??:?? 0 ======================================================================= DEBUG | ??:??:?? 0 DC1 : + Voltage:3300 mV DEBUG | ??:??:?? 0 DC2 : - Voltage:1800 mV DEBUG | ??:??:?? 0 DC3 : + Voltage:3300 mV DEBUG | ??:??:?? 1 LDO2 : + Voltage:3300 mV DEBUG | ??:??:?? 1 LDO3 : + Voltage:3300 mV DEBUG | ??:??:?? 1 ======================================================================= DEBUG | ??:??:?? 1 Battery: usbPower=1, isCharging=0, batMv=4169, batPct=100 DEBUG | ??:??:?? 1 I2C device found at address 0x34 INFO | ??:??:?? 1 axp192/axp2101 PMU found DEBUG | ??:??:?? 1 I2C device found at address 0x3c DEBUG | ??:??:?? 1 0x8 subtype probed in 2 tries INFO | ??:??:?? 1 sh1106 display found INFO | ??:??:?? 1 2 I2C devices found INFO | ??:??:?? 1 Meshtastic hwvendor=4, swver=2.0.12 DEBUG | ??:??:?? 1 Setting random seed 2179119323 DEBUG | ??:??:?? 1 Total heap: 193856 DEBUG | ??:??:?? 1 Free heap: 159512 DEBUG | ??:??:?? 1 Total PSRAM: 4192123 DEBUG | ??:??:?? 1 Free PSRAM: 4191863 DEBUG | ??:??:?? 1 NVS: UsedEntries 191, FreeEntries 439, AllEntries 630, NameSpaces 7 DEBUG | ??:??:?? 1 Setup Preferences in Flash Storage DEBUG | ??:??:?? 1 Number of Device Reboots: 48 DEBUG | ??:??:?? 1 OTA firmware version 0.2.0.1da2617 INFO | ??:??:?? 1 Initializing NodeDB INFO | ??:??:?? 1 Loading /prefs/db.proto INFO | ??:??:?? 1 Loaded saved devicestate version 20 INFO | ??:??:?? 1 Loading /prefs/config.proto INFO | ??:??:?? 1 Loaded saved config version 20 [ 1194][E][vfs_api.cpp:104] open(): /littlefs/prefs/module.proto does not exist, no permits for creation INFO | ??:??:?? 1 No /prefs/module.proto preferences found INFO | ??:??:?? 1 Installing default ModuleConfig INFO | ??:??:?? 1 Loading /prefs/channels.proto INFO | ??:??:?? 1 Loaded saved channelFile version 20 [ 1260][E][vfs_api.cpp:104] open(): /littlefs/oem/oem.proto does not exist, no permits for creation INFO | ??:??:?? 1 No /oem/oem.proto preferences found

bmidgley commented 1 year ago

In my experience, pinging the device once per minute prevents this bug from appearing. I've got powersave disabled. Maybe there is something the esp32 is still trying to do when wifi is idle that disconnects us.

image
garthvh commented 1 year ago

2.1.0 has been connected to wifi for me for days now, closing this as both WiFi and MQTT are much more stable in the latest alpha.

patrickselby commented 1 year ago

Do you know if this issue would have affected the bluetooth connection? I had some reports of frequent disconnects.