openshwprojects / OpenBK7231T_App

Open source firmware (Tasmota/Esphome replacement) for BK7231T, BK7231N, BL2028N, T34, XR809, W800/W801, W600/W601, BL602 and LN882H
https://openbekeniot.github.io/webapp/devicesList.html
1.48k stars 274 forks source link

Resolve MQTT DNS asynchronously #1298

Closed giedriuslt closed 3 months ago

giedriuslt commented 3 months ago

This makes use of asynchronous dns resolution in lwip, fixes issues with restarts if internet breaks. tested on bk7231n and bl602. Fixes #1284. XR809 uses very old lwip so there is no way to ask for ipv4 ip, but it probably does not support ipv6 anyway, so should be no issue

openshwprojects commented 3 months ago

It seems promising, is anyone able to test? @divadiow ? @MaxineMuster ?

divadiow commented 3 months ago

im afraid im away now until 5th August

MaxineMuster commented 3 months ago

I don't use MQTT usually, so I don't know exactly what to test.

I installed Mosquitto and can confirm that all my devices (LN882H, BK7231N and W801) do connect to MQTT via hostname.

Log shows this for all devices:

OpenBK7231N

Info:MQTT:mqtt_userName homeassistant
mqtt_pass qqqqqqqqqq
mqtt_clientID obkAABBCCDD
mqtt_host max-test.fritz.box:1883
Info:MQTT:mqtt_host max-test.fritz.box is being resolved by gethostbyname
Info:MQTT:mqtt_host max-test.fritz.box resolution SUCCESS
Info:MQTT:mqtt_userName homeassistant
mqtt_pass qqqqqqqqqq
mqtt_clientID obkAABBCCDD
mqtt_host max-test.fritz.box:1883
Info:MQTT:mqtt_connection_cb: Successfully connected

OpenLN882H

Info:MQTT:mqtt_userName homeassistant
mqtt_pass qqqqqqqqqq
mqtt_clientID ln882hAABBCCDD
mqtt_host max-test.fritz.box:1883
Info:MQTT:mqtt_host max-test.fritz.box is being resolved by gethostbyname
Info:MQTT:mqtt_host max-test.fritz.box resolution SUCCESS
Info:MQTT:mqtt_userName homeassistant
mqtt_pass qqqqqqqqqq
mqtt_clientID ln882hAABBCCDD
mqtt_host max-test.fritz.box:1883
Info:MQTT:mqtt_connection_cb: Successfully connected

OpenW800

Info:MQTT:mqtt_userName homeassistant
mqtt_pass qqqqqqqqqq
mqtt_clientID w800AABBCCDD
mqtt_host max-test.fritz.box:1883
Info:MQTT:mqtt_host max-test.fritz.box is being resolved by gethostbyname
Info:MQTT:mqtt_host max-test.fritz.box resolution SUCCESS
Info:MQTT:mqtt_userName homeassistant
mqtt_pass qqqqqqqqqq
mqtt_clientID w800AABBCCDD
mqtt_host max-test.fritz.box:1883
Info:MQTT:mqtt_connection_cb: Successfully connected
giedriuslt commented 3 months ago

Also works on w600:

Info:MQTT:mqtt_userName mosquitto
mqtt_pass mosquitto
mqtt_clientID w600CD06D193
mqtt_host homeassistant.lan:1883
Info:MQTT:mqtt_host homeassistant.lan is being resolved by gethostbyname
Info:MQTT:mqtt_host homeassistant.lan resolution SUCCESS
Info:MQTT:mqtt_userName mosquitto
mqtt_pass mosquitto
mqtt_clientID w600CD06D193
mqtt_host homeassistant.lan:1883
Info:MQTT:mqtt_connection_cb: Successfully connected
diepeterpan commented 3 months ago

I have now installed this build _"1298_merge0b44b8f5db05" on a BL602 as I had a daily WATCHDOG reboot, https://github.com/openshwprojects/OpenBK7231T_App/actions/runs/10123274891

So far no problems, but too early to tell if it fixed my BL602 problem which I am hoping is related.

giedriuslt commented 3 months ago

Can we merge this?

diepeterpan commented 3 months ago

I have now installed this build _"1298_merge0b44b8f5db05" on a BL602 as I had a daily WATCHDOG reboot, https://github.com/openshwprojects/OpenBK7231T_App/actions/runs/10123274891

So far no problems, but too early to tell if it fixed my BL602 problem which I am hoping is related.

It did not fix my BL_RST_SOFTWARE_WATCHDOG problem, but no other problems were experienced with the patch.

openshwprojects commented 3 months ago

So it's ready to merge, let's say, tomorrow?

giedriuslt commented 3 months ago

Should be. Works fine for a week already with both bk7231n and bl602

giedriuslt commented 3 months ago

I have now installed this build _"1298_merge0b44b8f5db05" on a BL602 as I had a daily WATCHDOG reboot, https://github.com/openshwprojects/OpenBK7231T_App/actions/runs/10123274891 So far no problems, but too early to tell if it fixed my BL602 problem which I am hoping is related.

It did not fix my BL_RST_SOFTWARE_WATCHDOG problem, but no other problems were experienced with the patch.

What drivers do you run? BL_RST_SOFTWARE_WATCHDOG on bl602 can be because of some blocking of main thread as it is with this, however it can also be if something else crashes.

diepeterpan commented 3 months ago

I have now installed this build _"1298_merge0b44b8f5db05" on a BL602 as I had a daily WATCHDOG reboot, https://github.com/openshwprojects/OpenBK7231T_App/actions/runs/10123274891 So far no problems, but too early to tell if it fixed my BL602 problem which I am hoping is related.

It did not fix my BL_RST_SOFTWARE_WATCHDOG problem, but no other problems were experienced with the patch.

What drivers do you run? BL_RST_SOFTWARE_WATCHDOG on bl602 can be because of some blocking of main thread as it is with this, however it can also be if something else crashes.

"0 drivers active, total 13" not sure how to list the 13 drivers, not sure it's relevant since 0 are active.

giedriuslt commented 3 months ago

That means nothing extra is running, makes it even more curious what could be causing those reboots

diepeterpan commented 3 months ago

That means nothing extra is running, makes it even more curious what could be causing those reboots

Besides that I now discovered that this change is not really solving MQTT name resolution, my MQTT was down for an hour and when re-started my MQTT server the BL602 is just in an endless name resolution for the MQTT name without reconnecting.

_Info:MQTT:mqtt_userName ??????? mqtt_pass ??????? mqtt_clientID tnl94390DAX mqtt_host mqtt.domain.co.za:1883 Info:MQTT:mqtthost mqtt.domain.co.za is being resolved by gethostbyname

and that repeats and on the homepage is says

_MQTT State: disconnected RES: 0(ERROK) MQTT ErrMsg:

Reverted to version 1.17.651

giedriuslt commented 3 months ago

Looks like callback is never called in your case. I guess I need to time it

divadiow commented 3 months ago

is there anything you want me to try?

giedriuslt commented 3 months ago

I know changed the wait for dns to be limited to 10s. Hopefully solves issue @diepeterpan is seeing

diepeterpan commented 3 months ago

That means nothing extra is running, makes it even more curious what could be causing those reboots

I suspect it is DHCP renewal based on the timing, I will assign a fixed IP and capture it in the BL602 web interface to see if it still gives a BL_RST_SOFTWARE_WATCHDOG error daily. For now, I will not test this MQTT change until I conclude the fixed IP test.

BTW. Maybe just me, but it seems like it is not using the "Configure Names" to register in DHCP, it does not matter what I set it to it used _BouffalolabBL602-390daf. Even if I set flag Flag 29 - [NETIF] Use short device name as a hostname instead of a long name the name _BouffalolabBL602-390daf is registered. Anybody else that can test my wild accusations?

Looking at the source the BL602 SDK is not calling CFG_GetOpenBekenHostName whereas the BK7231N SDK does.

https://github.com/search?q=repo%3Aopenshwprojects%2FOpenBK7231N%20CFG_GetOpenBekenHostName&type=code

https://github.com/search?q=repo%3Aopenshwprojects%2FOpenBL602%20CFG_GetOpenBekenHostName&type=code

giedriuslt commented 3 months ago

custom host nameis not implemented on bl602

diepeterpan commented 3 months ago

BL_RST_SOFTWARE_WATCHDOG

When I change to a fixed IP my BL_RST_SOFTWARE_WATCHDOG error disappears on version 1.17.651; I am connecting to an OpenWrt router with a default DHCP timeout of 12 hours and always had BL_RST_SOFTWAREWATCHDOG around the 12-hour mark. So the DHCP renewal/reclaim timeout seems to be not working/correctly. Is anybody else having a similar experience? (PS. I am still not yet testing the MQTT change, want to run the current stability test a bit longer on version 1.17.651)_

giedriuslt commented 3 months ago

Works fine with the same 12h lease time , could be something about particular configuration/hardware. Try maybe increasing lease time on dhcp server and see if reboots follow? I'm also using openwrt 22.03.

giedriuslt commented 3 months ago

Also in router log I see that socket is sending DHCPREQUEST message like every 4-6 hours, so lease should never expire

diepeterpan commented 3 months ago

Works fine with the same 12h lease time , could be something about particular configuration/hardware. Try maybe increasing lease time on dhcp server and see if reboots follow? I'm also using openwrt 22.03.

I am not changing my DHCP server for just one device, I have 50 other devices running fine for more than 2 years. If it is stable with fixed IP then that is what I will have to do for Openbeken on BL602. Thanks for the suggestions.

giedriuslt commented 3 months ago

Works fine with the same 12h lease time , could be something about particular configuration/hardware. Try maybe increasing lease time on dhcp server and see if reboots follow? I'm also using openwrt 22.03.

I am not changing my DHCP server for just one device, I have 50 other devices running fine for more than 2 years. If it is stable with fixed IP then that is what I will have to do for Openbeken on BL602. Thanks for the suggestions.

You don't need to change DHCP server for that, you can just edit lease time. Or yuo can set static lease for one particular device and set lease time on that

diepeterpan commented 3 months ago

Works fine with the same 12h lease time , could be something about particular configuration/hardware. Try maybe increasing lease time on dhcp server and see if reboots follow? I'm also using openwrt 22.03.

I am not changing my DHCP server for just one device, I have 50 other devices running fine for more than 2 years. If it is stable with fixed IP then that is what I will have to do for Openbeken on BL602. Thanks for the suggestions.

You don't need to change DHCP server for that, you can just edit lease time. Or yuo can set static lease for one particular device and set lease time on that

Thx for the tip, I will change back to DHCP on device and set Openwrt for the specific reservation of MAC/IP to 3 hours and test.

diepeterpan commented 3 months ago

Works fine with the same 12h lease time , could be something about particular configuration/hardware. Try maybe increasing lease time on dhcp server and see if reboots follow? I'm also using openwrt 22.03.

I am not changing my DHCP server for just one device, I have 50 other devices running fine for more than 2 years. If it is stable with fixed IP then that is what I will have to do for Openbeken on BL602. Thanks for the suggestions.

You don't need to change DHCP server for that, you can just edit lease time. Or yuo can set static lease for one particular device and set lease time on that

Thx for the tip, I will change back to DHCP on device and set Openwrt for the specific reservation of MAC/IP to 3 hours and test.

It renewed after 3 hours the reserved IP based on MAC address without problem, one thing I noticed is that OpenWrt does not accept DNS names with an _ 'underscore' during capturing of a DHCP reservation, e.g. Bouffalolab_BL602-390daf so I had to change the reservation to Bouffalolab-BL602-390daf. Not sure whether that plays a role during renewal without a reservation causing a BL_RST_SOFTWARE_WATCHDOG. Feels like I am hunting for a needle in a haystack. Let's see after 24 hours if still good.

diepeterpan commented 3 months ago

Works fine with the same 12h lease time , could be something about particular configuration/hardware. Try maybe increasing lease time on dhcp server and see if reboots follow? I'm also using openwrt 22.03.

I am not changing my DHCP server for just one device, I have 50 other devices running fine for more than 2 years. If it is stable with fixed IP then that is what I will have to do for Openbeken on BL602. Thanks for the suggestions.

You don't need to change DHCP server for that, you can just edit lease time. Or yuo can set static lease for one particular device and set lease time on that

Thx for the tip, I will change back to DHCP on device and set Openwrt for the specific reservation of MAC/IP to 3 hours and test.

It renewed after 3 hours the reserved IP based on MAC address without problem, one thing I noticed is that OpenWrt does not accept DNS names with an _ 'underscore' during capturing of a DHCP reservation, e.g. Bouffalolab_BL602-390daf so I had to change the reservation to Bouffalolab-BL602-390daf. Not sure whether that plays a role during renewal without a reservation causing a BL_RST_SOFTWARE_WATCHDOG. Feels like I am hunting for a needle in a haystack. Let's see after 24 hours if still good.

Got a BL_RST_SOFTWARE_WATCHDOG within 12 hours of changing to a 3 hour reserved DHCP from fixed IP. I am going to back to a fixed IP for now.

diepeterpan commented 3 months ago

I now have a second device which I test on, 0 drivers active and without MQTT setup on the BL602. Using a 3 hour reserved DHCP entry I have received a BL_RST_SOFTWARE_WATCHDOG error within 6 hours.

On the OpenWrt router this is what I could see for the BL602 device when it BL_RST_SOFTWARE_WATCHDOGged on me.

Fri Aug 9 14:46:04 2024 daemon.notice hostapd: wl0-ap0: AP-STA-DISCONNECTED 24:94:94:39:12:4d Fri Aug 9 14:46:04 2024 daemon.info hostapd: wl0-ap0: STA 24:94:94:39:12:4d IEEE 802.11: authenticated Fri Aug 9 14:46:04 2024 daemon.info hostapd: wl0-ap0: STA 24:94:94:39:12:4d IEEE 802.11: associated (aid 18) Fri Aug 9 14:46:04 2024 daemon.notice hostapd: wl0-ap0: AP-STA-CONNECTED 24:94:94:39:12:4d auth_alg=open Fri Aug 9 14:46:04 2024 daemon.info hostapd: wl0-ap0: STA 24:94:94:39:12:4d WPA: pairwise key handshake completed (RSN) Fri Aug 9 14:46:04 2024 daemon.notice hostapd: wl0-ap0: EAPOL-4WAY-HS-COMPLETED 24:94:94:39:12:4d Fri Aug 9 14:46:06 2024 daemon.info dnsmasq-dhcp[1]: DHCPDISCOVER(br-lan) 24:94:94:39:12:4d Fri Aug 9 14:46:06 2024 daemon.info dnsmasq-dhcp[1]: DHCPOFFER(br-lan) 172.16.1.226 24:94:94:39:12:4d Fri Aug 9 14:46:06 2024 daemon.info dnsmasq-dhcp[1]: DHCPDISCOVER(br-lan) 24:94:94:39:12:4d Fri Aug 9 14:46:06 2024 daemon.info dnsmasq-dhcp[1]: DHCPOFFER(br-lan) 172.16.1.226 24:94:94:39:12:4d Fri Aug 9 14:46:06 2024 daemon.info dnsmasq-dhcp[1]: DHCPREQUEST(br-lan) 172.16.1.226 24:94:94:39:12:4d Fri Aug 9 14:46:06 2024 daemon.info dnsmasq-dhcp[1]: DHCPACK(br-lan) 172.16.1.226 24:94:94:39:12:4d Bouffalolab-BL602-39124d

If this is isolated to my environment then I will run with a fixed IP on the BL602 and just make a note meself for these two for now to run on fixed IP.

diepeterpan commented 3 months ago

I know changed the wait for dns to be limited to 10s. Hopefully solves issue @diepeterpan is seeing

Ok, getting back to this on the second device, configured it with a fixed IP and connected to test MQTT server which I can switch off and on and installed Openbeken Built on Aug 6 2024 17:28:02 version 1298_merge_cf69b4d9b27d.

Will feedback in 24 hours.

diepeterpan commented 3 months ago

I know changed the wait for dns to be limited to 10s. Hopefully solves issue @diepeterpan is seeing

Ok, getting back to this on the second device, configured it with a fixed IP and connected to test MQTT server which I can switch off and on and installed Openbeken Built on Aug 6 2024 17:28:02 version 1298_merge_cf69b4d9b27d.

Will feedback in 24 hours.

Switched off my MQTT server for more than an hour, got a BL_RST_SOFTWARE_WATCHDOG on Openbeken Built on Aug 6 2024 17:28:02 version 1298_merge_cf69b4d9b27d. I am going to run the same test on version 1.17.653 and see whether there's a difference e.g. is this change actually improving anything, at least for me?

diepeterpan commented 3 months ago

I know changed the wait for dns to be limited to 10s. Hopefully solves issue @diepeterpan is seeing

Ok, getting back to this on the second device, configured it with a fixed IP and connected to test MQTT server which I can switch off and on and installed Openbeken Built on Aug 6 2024 17:28:02 version 1298_merge_cf69b4d9b27d. Will feedback in 24 hours.

Switched off my MQTT server for more than an hour, got a BL_RST_SOFTWARE_WATCHDOG on Openbeken Built on Aug 6 2024 17:28:02 version 1298_merge_cf69b4d9b27d. I am going to run the same test on version 1.17.653 and see whether there's a difference e.g. is this change actually improving anything, at least for me?

I don't see any difference in using this merge build and current version 1.17.653; tested the same scenarios with MQTT and same problem of BL_RST_SOFTWARE_WATCHDOG when MQTT server is down for extended period e.g. 40 - 70 min.

giedriuslt commented 3 months ago

This was never intended to fix daily reboots and it was always a stretch that it's the same issue, It's very likely not. What this fixes is reboot caused by slow dns resolution due to internet outage. And it fixes that. My second commit also fixes the infinite wait if calback for dns resolution never happens. Did you get the same

Info:MQTT:mqtt_userName ???????
mqtt_pass ???????
mqtt_clientID tnl94390DAX
mqtt_host mqtt.domain.co.za:1883
Info:MQTT:mqtt_host mqtt.domain.co.za is being resolved by gethostbyname

and that repeats and on the homepage is says

with cf69b4d9b27d build?

diepeterpan commented 3 months ago

cf69b4d

Then I don't experience the issue this patch is trying to solve, sorry for wasting your time.

giedriuslt commented 3 months ago

Yes, your issue seems different. If you can provide serial log (not the log from the browser) of the restart in a new bug, then maybe we could try to find the cause. But without that it's not much we can do.

diepeterpan commented 3 months ago

Yes, your issue seems different. If you can provide serial log (not the log from the browser) of the restart in a new bug, then maybe we could try to find the cause. But without that it's not much we can do.

Understood, maybe some day I will have the energy to disassemble the bulb again and solder wires for the serial log. For now a fixed IP seems to improve stability and it's rare that my network or MQTT is down. Thanks for the offer.

PS. Tested more on my fixed IP theory and that's also down the drain after 12 hours. :-(

openshwprojects commented 3 months ago

@giedriuslt Sorry, I am out of the loop. Is the "callback not getting called" issue fixed now?

giedriuslt commented 3 months ago

Yes, that should be fixed.

openshwprojects commented 3 months ago

Thank you, so let's give it a chance.

github-actions[bot] commented 3 months ago

:tada: This PR is included in version 1.17.655 :tada:

The release is available on GitHub release

Your semantic-release bot :package::rocket:

restoration2699 commented 3 months ago

655 release did not help. still get BL_RST_SOFTWARE_WATCHDOG on BL602.

diepeterpan commented 3 months ago

I am also running version 1.17.655 on a seperate BL602 with serial logging being captured hoping to capture a BL_RST_SOFTWARE_WATCHDOG, crossing fingers to see whether we can get to the root of it. Will keep you posted if and when I catch da’DOG.

diepeterpan commented 3 months ago

I am also running version 1.17.655 on a seperate BL602 with serial logging being captured hoping to capture a BL_RST_SOFTWARE_WATCHDOG, crossing fingers to see whether we can get to the root of it. Will keep you posted if and when I catch da’DOG.

Got a WATCHDOG error and this is the serial log, 100 lines before and after, does not look like it contains anything relevant. Anything I can do to get more information that might help?

Info:MAIN:Time 8064, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8065, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8066, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8067, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8068, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8069, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8070, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8071, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8072, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8073, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8074, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8075, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8076, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8077, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8078, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8079, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8080, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8081, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8082, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8083, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8084, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8085, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8086, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8087, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8088, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8089, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8090, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8091, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8092, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8093, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8094, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8095, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8096, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8097, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8098, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8099, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8100, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8101, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8102, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8103, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8104, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8105, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8106, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8107, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MQTT:Publishing val 41.66 to obl94F28B35/temp retain=0
Info:MAIN:Time 8108, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
-----------------> AABA Request:
    A-MSDU: Permitted
    Block Ack Policy: Immediate Block Ack
    TID: 0
    Number of Buffers: 64
-----------------> AABA Response:
    A-MSDU: Not Permitted
    Block Ack Policy: Immediate Block Ack
    TID: 0
    Number of Buffers: 8
Info:MQTT:Publishing val AccessPoint to obl94F28B35/ssid retain=0
Info:MAIN:Time 8109, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MQTT:Publishing val 2 to obl94F28B35/sockets retain=0
Info:MAIN:Time 8110, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MQTT:Publishing val -49 to obl94F28B35/rssi retain=0
Info:MAIN:Time 8111, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MQTT:Publishing val 8111 to obl94F28B35/uptime retain=0
Info:MAIN:Time 8112, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MQTT:Publishing val 95368 to obl94F28B35/freeheap retain=0
Info:MAIN:Time 8113, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MQTT:Publishing val 10.0.0.143 to obl94F28B35/ip retain=0
Info:MAIN:Time 8114, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MQTT:Publishing val 0 to obl94F28B35/led_enableAll/get retain=0
Info:MAIN:Time 8115, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MQTT:MQTT client in mqtt_incoming_publish_cb topic obl94F28B35/led_enableAll/get
Info:MQTT:Publishing val FFFFFF to obl94F28B35/led_basecolor_rgb/get retain=0
Info:MAIN:Time 8116, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MQTT:MQTT client in mqtt_incoming_publish_cb topic obl94F28B35/led_basecolor_rgb/get
Info:MQTT:Publishing val 100 to obl94F28B35/led_dimmer/get retain=0
Info:MAIN:Time 8117, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MQTT:MQTT client in mqtt_incoming_publish_cb topic obl94F28B35/led_dimmer/get
Info:MAIN:Time 8118, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8119, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8120, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8121, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
Info:MAIN:Time 8122, idle 0/s, free 95368, MQTT 1(2), bWifi 1, secondsWithNoPing -1, socks 2/21 
g bl602 now....
Booting BL602 Chip...
██████╗ ██╗      ██████╗  ██████╗ ██████╗
██╔══██╗██║     ██╔════╝ ██╔═████╗╚════██╗
██████╔╝██║     ███████╗ ██║██╔██║ █████╔╝
██╔══██╗██║     ██╔═══██╗████╔╝██║██╔═══╝
██████╔╝███████╗╚██████╔╝╚██████╔╝███████╗
╚═════╝ ╚══════╝ ╚═════╝  ╚═════╝ ╚══════╝

------------------------------------------------------------
RISC-V Core Feature:RV32-ACFIMX
Build Version:      release_bl_iot_sdk_1.6.22-22-g1d4ff804-dirty
Std Driver Version: 541807d
PHY   Version:      a0_final-73-g62481a0
RF    Version:      79cc6b9
Build Date:         Aug 12 2024
Build Time:         19:42:33
Boot Reason:        BL_RST_SOFTWARE_WATCHDOG
------------------------------------------------------------
Heap 126896@0x42023050, 5888@0x4204a900
Boot2 consumed 3ms

blog init set power on level 2, 2, 2.
[IRQ] Clearing and Disable all the pending IRQ...
------------------TRNG TEST---------------------------------
**********TRNG TEST rand[61b8a4ab]**************
**********TRNG TEST rand[373d6c9d]**************
------------------------------------------------------------
[         0][INFO  : hal_boot2.c: 285] [HAL] [BOOT2] Active Partition[0] consumed 596 Bytes
[         0][INFO  : hal_boot2.c:  82] ======= PtTable_Config @0x4200f0a4=======
[         0][INFO  : hal_boot2.c:  83] magicCode 0x54504642; version 0x0000; entryCnt 7; age 2; crc32 0xB8D652AD
[         0][INFO  : hal_boot2.c:  89] idx  type device activeIndex     name   Address[0]  Address[1]  Length[0]   Length[1]   age
[         0][INFO  : hal_boot2.c:  91] [00]  00     0         0            FW  0x00010000  0x000d8000  0x000c8000  0x00088000  2
[         0][INFO  : hal_boot2.c:  91] [01]  02     0         0           mfg  0x00160000  0x00000000  0x00032000  0x00000000  0
[         0][INFO  : hal_boot2.c:  91] [02]  03     0         0         media  0x00192000  0x00000000  0x00057000  0x00000000  0
[         0][INFO  : hal_boot2.c:  91] [03]  04     0         0           PSM  0x001e9000  0x00000000  0x00008000  0x00000000  0
[         0][INFO  : hal_boot2.c:  91] [04]  05     0         0           KEY  0x001f1000  0x00000000  0x00002000  0x00000000  0
[         0][INFO  : hal_boot2.c:  91] [05]  06     0         0          DATA  0x001f3000  0x00000000  0x00005000  0x00000000  0
[         0][INFO  : hal_boot2.c:  91] [06]  07     0         0       factory  0x001f8000  0x00000000  0x00007000  0x00000000  0
[         0][INFO  : bl_flash.c:  96] ======= FlashCfg magiccode @0x42049c18, code 0x47464346 =======
[         0][INFO  : bl_flash.c:  97] mid               0xEF
[         0][INFO  : bl_flash.c:  98] clkDelay  0x1
[         0][INFO  : bl_flash.c:  99] clkInvert         0x1
[         0][INFO  : bl_flash.c: 100] sector size       4KBytes
[         0][INFO  : bl_flash.c: 101] page size 256Bytes
[         0][INFO  : bl_flash.c: 102] ---------------------------------------------------------------
[         0][INFO  : hal_board.c:1249] [MAIN] [BOARD] [FLASH] addr from partition is 001f8000, ret is 0
[         0][INFO  : hal_board.c:1257] [MAIN] [BOARD] [XIP] addr from partition is 231e7000, ret is 0
[         0][INFO  : hal_board.c: 208] MAC address mode length 3
[         0][INFO  : hal_board.c: 212] MAC address mode is MBF
[         0][INFO  : hal_board.c: 194] Set MAC addrress 24:94:94:F2:8B:35
[         0][INFO  : hal_board.c: 955] country_code : 86
[         0][INFO  : hal_board.c: 323] xtal_mode length 2
[         0][INFO  : hal_board.c: 327] xtal_mode is MF
[         0][INFO  : hal_board.c: 251] xtal dtb in DEC :36 36 0 60 60
[         0][INFO  : hal_board.c: 986] channel_div_table :
[         0][BUF: hal_board.c: 987] c4 4e ec 1e cb b7 fc 1e d2 20 0d 1f d8 89 1d 1f df f2 2d 1f e5 5b 3e 1f ec c4 4e 1f f2 2d 5f 1f f9 96 6f 1f 00 00 80 1f 06 69 90 1f 0d d2 a0 1f 13 3b 
[         0][BUF: hal_board.c: 987] b1 1f 89 9d d8 1f f8 81 1f 20 
[         0][INFO  : hal_board.c: 997] channel_cnt_table :
[         0][BUF: hal_board.c: 998] 8a a7 e3 a7 3c a8 95 a8 ed a8 46 a9 9f a9 f8 a9 51 aa aa aa 03 ab 5c ab b5 ab 8a ac 4d 46 00 00 44 d8 09 23 c0 2f 02 42 00 c0 09 23 00 c0 09 23 00 c0 
[         0][BUF: hal_board.c: 998] 09 23 00 c0 09 23 
[         0][INFO  : hal_board.c:1006] lo_fcal_div : 1387
[         0][INFO  : hal_board.c:1025] pwr_table_11b :20 20 20 18
[         0][INFO  : hal_board.c:1045] pwr_table_11g :18 18 18 18 18 18 14 14
[         0][INFO  : hal_board.c:1065] pwr_table_11n :18 18 18 18 18 16 14 14
[         0][INFO  : hal_board.c: 523] pwr_mode length 2
[         0][INFO  : hal_board.c: 527] pwr_mode is bf
[         0][INFO  : hal_board.c: 415] pwr_offset from dtb:
[         0][BUF: hal_board.c: 416] 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a 
[         0][INFO  : hal_board.c: 421] pwr_offset from dtb (rebase on 10):
[         0][BUF: hal_board.c: 504]   0   0   0   0   0   0   0   0   0   0   0   0   0   0 
[         0][INFO  : hal_board.c:1086] set pwr_table_ble = 13 in dts
[         0][INFO  : hal_board.c: 687] ap_ssid string[0] = bl_test_005, ap_ssid_len = 11
[         0][INFO  : hal_board.c: 698] ap_psk string[0] = 12345678, ap_psk_len = 8
[         0][INFO  : hal_board.c: 707] ap_channel = 11
[         0][INFO  : hal_board.c: 635] [STA] ap_ssid string[0] = yourssid, ap_ssid_len = 8
[         0][INFO  : hal_board.c: 646] [STA] ap_psk string[0] = yourapssword, ap_psk_len = 12
[         0][INFO  : hal_board.c: 654] auto_connect_enable = 0
sizeof(StackType_t)=4
[OS] Starting aos_loop_proc task...
[OS] Starting TCP/IP Stack...
-------------------->>>>>>>> LWIP tcp_port 60873
[OS] Starting OS Scheduler...
[BL] [SEC] TRNG Handler
=== 32 task inited
====== bloop dump ======
  bitmap_evt 0
  bitmap_msg 0
--->>> timer list:
  32 task:
    task[31] : SYS [built-in]
      evt handler 0x2306a742, msg handler 0x2306a712, trigged cnt 0, bitmap async 0 sync 0, time consumed 0us acc 0ms, max 0us
    task[30] : empty
    task[29] : empty
    task[28] : empty
    task[27] : empty
    task[26] : empty
    task[25] : empty
    task[24] : empty
    task[23] : empty
    task[22] : empty
    task[21] : empty
    task[20] : empty
    task[19] : empty
    task[18] : empty
    task[17] : empty
    task[16] : empty
    task[15] : empty
    task[14] : empty
    task[13] : empty
    task[12] : empty
    task[11] : empty
    task[10] : empty
    task[09] : empty
    task[08] : empty
    task[07] : empty
    task[06] : empty
    task[05] : empty
giedriuslt commented 3 months ago

Well this indicates that something is taking long time (>4s) in the main loop and watchdog reboots it. This is very similar as what I had to find issue with mqtt reboots with lost wifi, but in that case reboot happened after some mqtt connection info was printed, so it was rather easy to trace. Now there is nothing to go by... I can only think of adding more logging to the main loop, and hopefully that points to the cause.

diepeterpan commented 3 months ago

Well this indicates that something is taking long time (>4s) in the main loop and watchdog reboots it. This is very similar as what I had to find issue with mqtt reboots with lost wifi, but in that case reboot happened after some mqtt connection info was printed, so it was rather easy to trace. Now there is nothing to go by... I can only think of adding more logging to the main loop, and hopefully that points to the cause.

Thank you, I will gladly run a test build and see if we can catch it again with more logging. Should we open a new bug to try and trace the BL_RST_SOFTWARE_WATCHDOG?

giedriuslt commented 2 months ago

Created a build which traces most of main loop functionality to serial console https://github.com/openshwprojects/OpenBK7231T_App/pull/1322 Would be great if somebody experiencing reboots could run this and send log around the reboot (It will probably get quite big in total)