openshwprojects / OpenBK7231T_App

Open source firmware (Tasmota/Esphome replacement) for BK7231T, BK7231N, BL2028N, T34, XR809, W800/W801, W600/W601, BL602 and LN882H
https://openbekeniot.github.io/webapp/devicesList.html
1.5k stars 280 forks source link

BK7231T / MQTT Publish Error ERR_MEM #376

Closed iprak closed 2 years ago

iprak commented 2 years ago

I encountered this Publish err: ERR_MEM on BK7231T with firmware built on 2010c74d5690036103d6790f84fe57699b5a2e58.

Info:MQTT:Publishing val 10.000000 to enbrighten1/current/get retain=0
Info:MQTT:Publishing val 212.000000 to enbrighten1/power/get retain=0
Info:MQTT:Publishing val 10.855074 to enbrighten1/energycounter/get retain=0
Info:MQTT:Publishing val 0.000000 to enbrighten1/energycounter_last_hour/get retain=0
Info:MQTT:Publishing val 0.000000 to enbrighten1/energycounter_yesterday/get retain=0
Info:MQTT:Publishing val 3.862278 to enbrighten1/energycounter_today/get retain=0
Info:MQTT:Publishing val 2022-10-30 12:09:04 to enbrighten1/energycounter_clear_date/get retain=0
Info:MQTT:Publishing val 3.000000 to enbrighten1/current/get retain=0
Info:MQTT:Publishing val 144.000000 to enbrighten1/power/get retain=0
Info:MQTT:Publishing val 11.376740 to enbrighten1/energycounter/get retain=0
Info:MQTT:Publishing val 0.000000 to enbrighten1/energycounter_last_hour/get retain=0
Info:MQTT:Publishing val 0.000000 to enbrighten1/energycounter_yesterday/get retain=0
Info:MQTT:Publishing val 4.383944 to enbrighten1/energycounter_today/get retain=0
Info:MQTT:Publishing val 2022-10-30 12:09:04 to enbrighten1/energycounter_clear_date/get retain=0
Info:NTP:Seconds since Jan 1 1900 = 3876167286
Info:NTP:Unix time  : 1667178486
Info:NTP:Local Time : 2022/10/31 01:08:06
Info:MQTT:Publishing val 4.000000 to enbrighten1/current/get retain=0
Info:MQTT:Publishing val 154.000000 to enbrighten1/power/get retain=0
Info:MQTT:Publishing val 11.890072 to enbrighten1/energycounter/get retain=0
Info:MQTT:Publishing val 0.000000 to enbrighten1/energycounter_last_hour/get retain=0
Info:MQTT:Publishing val 0.000000 to enbrighten1/energycounter_yesterday/get retain=0
Info:MQTT:Publishing val 4.897277 to enbrighten1/energycounter_today/get retain=0
Info:MQTT:Publishing val 2022-10-30 12:09:04 to enbrighten1/energycounter_clear_date/get retain=0
Info:MQTT:Publish err: ERR_MEM aka -1
Info:MQTT:mqtt_connection_cb: Disconnected, reason: 256(Disconnected)

It happened again after a while and MQTT disconnected as well.

Info:NTP:Local Time : 2022/10/31 01:23:21
Info:MQTT:Publishing val 55.235130 to enbrighten1/energycounter/get retain=0
Info:MQTT:Publishing val 0.000000 to enbrighten1/energycounter_last_hour/get retain=0
Info:MQTT:Publishing val 0.000000 to enbrighten1/energycounter_yesterday/get retain=0
Info:MQTT:Publishing val 48.242329 to enbrighten1/energycounter_today/get retain=0
Info:MQTT:Publishing val 2022-10-30 12:09:04 to enbrighten1/energycounter_clear_date/get retain=0
Info:MQTT:Publishing val 6.000000 to enbrighten1/current/get retain=0
Info:MQTT:Publishing val 204.000000 to enbrighten1/power/get retain=0
Info:MQTT:Publishing val 55.820686 to enbrighten1/energycounter/get retain=0
Info:MQTT:Publishing val 0.000000 to enbrighten1/energycounter_last_hour/get retain=0
Info:MQTT:Publishing val 0.000000 to enbrighten1/energycounter_yesterday/get retain=0
Info:MQTT:Publishing val 48.827885 to enbrighten1/energycounter_today/get retain=0
Info:MQTT:Publishing val 2022-10-30 12:09:04 to enbrighten1/energycounter_clear_date/get retain=0
Info:MQTT:Publishing val 5.000000 to enbrighten1/current/get retain=0
Info:MQTT:Publishing val 170.000000 to enbrighten1/power/get retain=0
Info:MQTT:Publishing val 56.404572 to enbrighten1/energycounter/get retain=0
Info:MQTT:Publishing val 0.000000 to enbrighten1/energycounter_last_hour/get retain=0
Info:MQTT:Publishing val 0.000000 to enbrighten1/energycounter_yesterday/get retain=0
Info:MQTT:Publishing val 49.411770 to enbrighten1/energycounter_today/get retain=0
Info:MQTT:Publishing val 2022-10-30 12:09:04 to enbrighten1/energycounter_clear_date/get retain=0
Info:MQTT:Publishing val 7.000000 to enbrighten1/current/get retain=0
Info:MQTT:Publishing val 191.000000 to enbrighten1/power/get retain=0
Info:MQTT:Publishing val 57.009571 to enbrighten1/energycounter/get retain=0
Info:MQTT:Publishing val 0.000000 to enbrighten1/energycounter_last_hour/get retain=0
Info:MQTT:Publishing val 0.000000 to enbrighten1/energycounter_yesterday/get retain=0
Info:MQTT:Publishing val 50.016769 to enbrighten1/energycounter_today/get retain=0
Info:MQTT:Publishing val 2022-10-30 12:09:04 to enbrighten1/energycounter_clear_date/get retain=0
Info:MQTT:Publishing val 6.000000 to enbrighten1/current/get retain=0
Info:MQTT:Publishing val 172.000000 to enbrighten1/power/get retain=0
Info:MQTT:Publishing val 57.632900 to enbrighten1/energycounter/get retain=0
Info:MQTT:Publishing val 0.000000 to enbrighten1/energycounter_last_hour/get retain=0
Info:MQTT:Publishing val 0.000000 to enbrighten1/energycounter_yesterday/get retain=0
Info:MQTT:Publishing val 50.640099 to enbrighten1/energycounter_today/get retain=0
Info:MQTT:Publishing val 2022-10-30 12:09:04 to enbrighten1/energycounter_clear_date/get retain=0
Info:MQTT:Publish err: ERR_MEM aka -1
Info:MQTT:mqtt_connection_cb: Disconnected, reason: 256(Disconnected)
valeklubomir commented 2 years ago

ERR_MEM aka -1 means that Publish buffers are full and OS needs time to process and transmitt. As @openshwprojects mentioned old LWIP could not handle more than 3 publish per sec With BK7231N updated LWIP 2.1.3 and larger buffers, I can now handle 14-20 publish per sec. BK7231T would need the LWIP 2.1.3 changes made for BK7231N SDK. Since I still do not have any BK7231T device I can't finish mine SDK. It is availbe in my fork of OpenBK7231T. git@github.com:valeklubomir/OpenBK7231T.git But I hope today I could obtain one BK7231T.

valeklubomir commented 2 years ago

LWIP 2.1.3 added to OpenBK7231T. Created pull request.

openshwprojects commented 2 years ago

Great progress @valeklubomir , can you try again now, @iprak ?

iprak commented 2 years ago

That last change set to use LWIP 2.1.3 seems to hang BK7231T. I updated a switch to a locally built firmware and in my haste updated the 2nd to official 1.14.108 firmware. Now both fail to connect. Not sure if flash settings got overridden or something else. I will have to tear them down to access serial log. Do you have a SDK kit to try it out?

@valeklubomir @openshwprojects

iprak commented 2 years ago

Ok one device came back after doing multiple on-offs. The settings were completed wiped out. But it seems to be struggling .. I had this in the log

Info:MAIN:Main_OnWiFiStatusChange - WIFI_STA_CONNECTING
Info:MAIN:wl_status 6
Info:MAIN:Main_OnWiFiStatusChange - WIFI_STA_DISCONNECTED
Info:MAIN:Time 188, idle 1400446/s, free 95864, MQTT 0(8), bWifi 0, secondsWithNoPing -1, socks 2/38
Info:MAIN:wl_status 6
Info:MAIN:Main_OnWiFiStatusChange - WIFI_STA_DISCONNECTED
Info:MAIN:wl_status 10
Info:MAIN:Main_OnWiFiStatusChange - WIFI_STA_CONNECTED
Info:MAIN:wl_status 11
Info:MAIN:Main_OnWiFiStatusChange - WIFI_STA_CONNECTED
Info:MAIN:Time 189, idle 701811/s, free 96184, MQTT 0(8), bWifi 1, secondsWithNoPing -1, socks 2/38
Info:MAIN:Time 190, idle 716036/s, free 96224, MQTT 0(8), bWifi 1, secondsWithNoPing -1, socks 2/38
Info:GEN:dhcp=0 ip=192.168.1.211 gate=192.168.1.2 mask=255.255.255.0 mac=c8:47:8c:00:00:00 
Info:GEN:sta: 1, softap: 0, b/g/n
Info:GEN:sta:rssi=-61,ssid=solar,bssid=14:dd:a9:d8:01:00 ,channel=10,cipher_type:
Info:MQTT:mqtt_host empty, not starting mqtt
Info:MAIN:Time 191, idle 699063/s, free 96224, MQTT 0(9), bWifi 1, secondsWithNoPing -1, socks 2/38
Info:MAIN:Time 192, idle 1406597/s, free 96224, MQTT 0(9), bWifi 1, secondsWithNoPing -1, socks 2/38
Info:MAIN:Time 193, idle 710212/s, free 95992, MQTT 0(9), bWifi 1, secondsWithNoPing -1, socks 3/38

And then nothing changes for 10 seconds and then suddenly got this. Note that Info: Main 194-200 are missing.

Info:MAIN:Time 193, idle 710212/s, free 95992, MQTT 0(9), bWifi 1, secondsWithNoPing -1, socks 3/38
N:sta:rssi=-61,ssid=solar,bssid=14:dd:a9:d8:01:00 ,channel=10,cipher_type:
Info:MAIN:Time 201, idle 711651/s, free 95992, MQTT 0(10), bWifi 1, secondsWithNoPing -1, socks 2/38
Info:MAIN:Time 202, idle 713151/s, free 95992, MQTT 0(10), bWifi 1, secondsWithNoPing -1, socks 2/38
Info:MAIN:wl_status 5
Info:MAIN:wl_status 9
Info:MAIN:Main_OnWiFiStatusChange - WIFI_STA_AUTH_FAILED
Info:MAIN:Time 203, idle 763556/s, free 95520, MQTT 0(10), bWifi 0, secondsWithNoPing -1, socks 2/38
Info:MAIN:Time 204, idle 803200/s, free 95520, MQTT 0(10), bWifi 0, secondsWithNoPing -1, socks 2/38
Info:MAIN:Time 205, idle 803524/s, free 95520, MQTT 0(10), bWifi 0, secondsWithNoPing -1, socks 2/38
Info:MAIN:Time 206, idle 803423/s, free 95520, MQTT 0(10), bWifi 0, secondsWithNoPing -1, socks 2/38
Info:MAIN:Time 207, idle 803592/s, free 95520, MQTT 0(10), bWifi 0, secondsWithNoPing -1, socks 2/38
Info:MAIN:Time 208, idle 802857/s, free 97264, MQTT 0(10), bWifi 0, secondsWithNoPing -1, socks 2/38
Info:MAIN:Time 209, idle 803722/s, free 97264, MQTT 0(10), bWifi 0, secondsWithNoPing -1, socks 2/38
Info:MAIN:Time 210, idle 783391/s, free 97264, MQTT 0(10), bWifi 0, secondsWithNoPing -1, socks 2/38
Info:GEN:dhcp=0 ip=0.0.0.0 gate=0.0.0.0 mask=0.0.0.0 mac=c8:47:8c:00:00:00 
Info:GEN:sta: 0, softap: 0, b/g/n
Info:MAIN:Time 211, idle 717899/s, free 97264, MQTT 0(10), bWifi 0, secondsWithNoPing -1, socks 2/38
Info:MAIN:Time 212, idle 717745/s, free 97264, MQTT 0(10), bWifi 0, secondsWithNoPing -1, socks 2/38
Info:MAIN:Main_OnWiFiStatusChange - WIFI_STA_DISCONNECTED
Info:MAIN:Time 213, idle 719858/s, free 100032, MQTT 0(10), bWifi 0, secondsWithNoPing -1, socks 2/38
iprak commented 2 years ago

Could the LFS changes (https://github.com/openshwprojects/OpenBK7231T_App/pull/382) have something to do with this? It is documented as a breaking change under 1.14.107

valeklubomir commented 2 years ago

Let's check. I have WB2L module for testing. Should be BK7231T. Did not seen similar beviour. Actually I think BK7231T is little better then BK7231N. I have at least 16KB more memory available. But actually I am testing module solo module only connected to ESP-PROG nothing else. Keep it running on UART logging. I have been testing it yesterday for several hours. 13 succesfull OTA. Connecting to your wifi is problem, module show rssi=-61. Which is scaled as 'Fair'. The lost Info: Main 194-200 were just takes by Wifi Library when it tried to connect. I have been seeing when it tries to connect that BK7231 is silent for few second. Length depends on signal strength. Even my devices which have low RSSI communicate slower.

EDIT: I am using only firmware which I build myself. I have it all automated. Start build and execute OTA.

valeklubomir commented 2 years ago

LFS changes did not had effect on my device.

valeklubomir commented 2 years ago

Added command "publishBenchmark" which sends 1000 msg to topic /benchmark

Topic: LightBulb66B38400/benchmark QOS: 2 Data: Benchmark completed. 1000 msg published. Total Time: 50 s MsgRate: 19 msg/s

EDIT: counted them all 1000 msg recevied correctly.

btsimonh commented 2 years ago

Could the LFS changes (#382) have something to do with this? It is documented as a breaking change under 1.14.107

It breaks existing LFS only - i.e. you lose your LFS files because it tries to mount the default (smaller) size LFS which starts at a different block.

openshwprojects commented 2 years ago

@valeklubomir @iprak I am working on WB3S device (LED dimmer) and extending scripting features and so far I have no issues. Will look into it right now more. EDIT: Sorry, I am not able to detect any issues so far, but I will test more.

iprak commented 2 years ago

I don't think it is WiFi. The device has been plugged at the same location for a month now and this immediately happened after I updated it. This device was however somewhere around 1.14.71 before this which is from a while ago. The device can be consistently reached via ping.

iprak commented 2 years ago

I might be running into https://github.com/openshwprojects/OpenBK7231T_App/issues/387 .. My devices are set to autostart some drivers.

openshwprojects commented 2 years ago

I think that @valeklubomir change improved ERR_MEM situation. I will close this for now.