jziolkowski / tdm

GUI application to discover and monitor devices flashed with https://github.com/arendst/tasmota
GNU General Public License v3.0
667 stars 84 forks source link

TDM crashes ESP32 device that is also running berry scripts #190

Closed ElksInNC closed 2 years ago

ElksInNC commented 2 years ago

ESP32 Solo1 running latest Tasmota (11.0.0.4) TDM 0.2.11 on M1 Mac

Conditions ESP32 Solo1 Wall Switch also running BLE scanner berry script (https://github.com/tony-fav/tasmota-blerry)

When TDM starts up - or when device reboots and TDM does its initial polling of the device, the volume and rapidity of the requests overwhelms the ESP and causes a crash.

Device will happily run BLERRY for days without issue. Device will happily respond to TDM on boot or anytime TDM starts without Blerry. Device will - most of the time - continue to run if Blerry is running and stable when TDM does its initial polling. (it seems that there are critical times in the blerry script cycle that can overload if the TDM requests happen at exactly the same time.) Device will consistently crash and frequently boot loop on start if TDM is already running .

Author of Blerry and I have both encountered this issue.

It appears that on boot, the berry scripting module loading followed by the blerry script loading combined with the high volume TDM requests at broker connect are just too much for the CPU.

We can mitigate the bootloop / startup crash by delaying the launch of blerry for a period of time after boot to allow for the chance that TDM will ask for all of its data a boot.

A more elegant solution that would let these two processes peacefully co-exist without making the Solo1 faint would be to provide some sort of pacing for all of the MQTT requests that TDM makes upon first contact with a device. TDM asks a lot of the device in the first 5 seconds or so after contact.

I can provide more detailed logs from a serial connection if needed (the below is weblog from a production switch - but I have others that can be set up in a testbench mode with serial logging.

16:49:34.122 CMD: Grp 0, Cmd 'STATUS', Idx 1, Len 1, Pld 0, Data '0'
16:49:34.136 MQT: stat/LINKIND-S01/STATUS = {"Status":{"Module":0,"DeviceName":"LINKIND-S01","FriendlyName":["Mudroom Lights - Motion"],"Topic":"LINKIND-S01","ButtonTopic":"0","Power":0,"PowerOnState":3,"LedState":0,"LedMask":"FFFF","SaveData":1,"SaveState":1,"SwitchTopic":"","SwitchMode":[1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],"ButtonRetain":0,"SwitchRetain":0,"SensorRetain":0,"PowerRetain":0,"InfoRetain":0,"StateRetain":0}}
16:49:34.252 MQT: stat/LINKIND-S01/STATUS1 = {"StatusPRM":{"Baudrate":115200,"SerialConfig":"8N1","GroupTopic":"tasmotas","OtaUrl":"http://ota.tasmota.com/tasmota32/release/tasmota32.bin","RestartReason":"Software reset CPU","Uptime":"0T00:04:27","StartupUTC":"2022-03-16T20:45:07","Sleep":0,"CfgHolder":4617,"BootCount":2,"BCResetTime":"2022-03-14T09:44:18","SaveCount":2717}}
16:49:34.361 MQT: stat/LINKIND-S01/STATUS2 = {"StatusFWR":{"Version":"11.0.0.4(tasmota)","BuildDateTime":"2022-03-13T18:36:33","Core":"2_0_3","SDK":"v4.4-2.0.2-531-g2509ea4227","CpuFrequency":80,"Hardware":"ESP32-U4WDH","CR":"443/699"}}
16:49:34.453 MQT: stat/LINKIND-S01/STATUS3 = {"StatusLOG":{"SerialLog":2,"WebLog":4,"MqttLog":0,"SysLog":0,"LogHost":"","LogPort":514,"SSId":["ElksInNC-IOT",""],"TelePeriod":300,"Resolution":"558180C0","SetOption":["00008109","2805C80001000600003C5A0A000000000000","00800268","00006008","00004003"]}}
16:49:34.565 MQT: stat/LINKIND-S01/STATUS4 = {"StatusMEM":{"ProgramSize":1613,"Free":1856,"Heap":45,"StackLowMark":3,"PsrMax":0,"PsrFree":0,"ProgramFlashSize":4096,"FlashSize":4096,"FlashFrequency":40,"FlashMode":3,"Features":["00000809","9FDAC7CF","0015A001","B7F7BFCF","05DA9BC0","E0360DC7","400840D2","20200000","0034482D"],"Drivers":"1,2,3,4,5,7,8,9,10,11,12,14,16,17,20,21,24,26,27,29,34,35,38,50,52,56,59,79,82,86","Sensors":"1,2,3,5,6,7,8,9,10,11,12,13,14,15,17,18,19,20,21,22,26,28,31,34,37,39,40,42,43,45,51,52,55,56,58,59,62,64,66,67,74,85,92,95,127"}}
16:49:34.625 BRY: GC from 55529 to 45208 bytes, objects freed 133/783 (in 7 ms) - slots from 821/835 to 546/775
16:49:34.684 MQT: stat/LINKIND-S01/STATUS5 = {"StatusNET":{"Hostname":"LINKIND-S01","IPAddress":"10.10.10.174","Gateway":"10.10.10.1","Subnetmask":"255.255.255.0","DNSServer1":"10.10.10.1","DNSServer2":"0.0.0.0","Mac":"B8:F0:09:70:1F:60","Ethernet":{"Hostname":"","IPAddress":"0.0.0.0","Gateway":"0.0.0.0","Subnetmask":"0.0.0.0","DNSServer1":"0.0.0.0","DNSServer2":"0.0.0.0","Mac":"00:00:00:00:00:00"},"Webserver":2,"HTTP_API":1,"WifiConfig":4,"WifiPower":17.0}}
16:49:34.786 MQT: stat/LINKIND-S01/STATUS6 = {"StatusMQT":{"MqttHost":"192.168.50.25","MqttPort":1883,"MqttClientMask":"DVES_%06X","MqttClient":"DVES_701F60","MqttUser":"mqttuser","MqttCount":1,"MAX_PACKET_SIZE":1200,"KEEPALIVE":30,"SOCKET_TIMEOUT":4}}
16:49:34.871 MQT: stat/LINKIND-S01/STATUS7 = {"StatusTIM":{"UTC":"2022-03-16T20:49:34","Local":"2022-03-16T16:49:34","StartDST":"2022-03-13T02:00:00","EndDST":"2022-11-06T02:00:00","Timezone":99,"Sunrise":"02:01","Sunset":"13:56"}}
16:49:34.970 MQT: stat/LINKIND-S01/STATUS10 = {"StatusSNS":{"Time":"2022-03-16T16:49:34","Switch1":"OFF","ESP32":{"Temperature":170.0},"TempUnit":"F"}}
16:49:35.041 BRY: GC from 51216 to 44267 bytes, objects freed 70/769 (in 6 ms) - slots from 723/790 to 533/714
16:49:35.062 MQT: stat/LINKIND-S01/STATUS11 = {"StatusSTS":{"Time":"2022-03-16T16:49:35","Uptime":"0T00:04:28","UptimeSec":268,"Heap":49,"SleepMode":"Dynamic","Sleep":0,"LoadAvg":999,"MqttCount":1,"Berry":{"HeapUsed":43,"Objects":769},"POWER":"OFF","Wifi":{"AP":1,"SSId":"ElksInNC-IOT","BSSId":"02:EC:DA:81:4D:F5","Channel":8,"Mode":"11n","RSSI":88,"Signal":-56,"LinkCount":1,"Downtime":"0T00:00:06"}}}
16:49:35.174 MQT: stat/LINKIND-S01/STATUS12 = {"StatusSTK":{"Exception":28,"Reason":"LoadProhibited","EPC":"40091b70","EXCVADDR":"00000000","CallChain":["40091b6d","401ef82e","401f3a39","401f15e6","401f1622","400e9f6a","40166d43","40151bff","4015297a","4014819a","400d3741","4012dd2a","4015664d","4015669d","400d9b29","40160e03","401661ae","4016b349","40154b49","401175d7","400d3c71","40156113","4012dcad","4016b7e1","400e5b81"]}}
16:49:36.490 BRY: GC from 54545 to 49116 bytes, objects freed 52/760 (in 8 ms) - slots from 690/729 to 524/714
16:49:36.520 BRY: GC from 59606 to 59606 bytes, objects freed 0/1146 (in 4 ms) - slots from 910/945 to 910/945
16:49:36.979 SRC: MQTT
16:49:36.983 CMD: Grp 0, Cmd 'SWITCHDEBOUNCE', Idx 1, Len 0, Pld -99, Data ''
16:49:37.004 MQT: stat/LINKIND-S01/RESULT = {"SwitchDebounce":50}
16:49:37.071 SRC: MQTT
16:49:37.075 CMD: Grp 0, Cmd 'BLINKTIME', Idx 1, Len 0, Pld -99, Data ''
16:49:37.094 MQT: stat/LINKIND-S01/RESULT = {"BlinkTime":10}
16:49:37.160 SRC: MQTT
16:49:37.164 CMD: Grp 0, Cmd 'MQTTLOG', Idx 1, Len 0, Pld -99, Data ''
16:49:37.185 MQT: stat/LINKIND-S01/RESULT = {"MqttLog":0}
16:49:37.244 SRC: MQTT
16:49:37.260 CMD: Grp 0, Cmd 'PULSETIME', Idx 2, Len 0, Pld -99, Data ''
16:49:37.271 MQT: stat/LINKIND-S01/RESULT = {"PulseTime2":{"Set":0,"Remaining":0}}
16:49:37.347 SRC: MQTT
16:49:37.360 CMD: Grp 0, Cmd 'PULSETIME', Idx 4, Len 0, Pld -99, Data ''
16:49:37.373 MQT: stat/LINKIND-S01/RESULT = {"PulseTime4":{"Set":0,"Remaining":0}}
16:49:37.427 SRC: MQTT
16:49:37.429 CMD: Grp 0, Cmd 'PULSETIME', Idx 6, Len 0, Pld -99, Data ''
16:49:37.438 MQT: stat/LINKIND-S01/RESULT = {"PulseTime6":{"Set":0,"Remaining":0}}
16:49:37.508 SRC: MQTT
16:49:37.510 CMD: Grp 0, Cmd 'PULSETIME', Idx 8, Len 0, Pld -99, Data ''
16:49:37.519 MQT: stat/LINKIND-S01/RESULT = {"PulseTime8":{"Set":0,"Remaining":0}}
16:49:38.936 SRC: MQTT
16:49:38.938 CMD: Grp 0, Cmd 'GPIO', Idx 1, Len 0, Pld -99, Data ''
16:49:38.950 MQT: stat/LINKIND-S01/RESULT = {"GPIO10":{"0":"None"},"GPIO16":{"0":"None"},"GPIO17":{"0":"None"}}
16:49:38.967 BRY: GC from 69861 to 45189 bytes, objects freed 92/777 (in 7 ms) - slots from 1272/1294 to 546/790
16:49:39.036 SRC: MQTT
16:49:39.039 CMD: Grp 0, Cmd 'INTERLOCK', Idx 1, Len 0, Pld -99, Data ''
16:49:39.049 MQT: stat/LINKIND-S01/RESULT = {"Interlock":"OFF"}
16:49:39.127 SRC: MQTT
16:49:39.129 CMD: Grp 0, Cmd 'PULSETIME', Idx 1, Len 0, Pld -99, Data ''
16:49:39.149 MQT: stat/LINKIND-S01/RESULT = {"PulseTime1":{"Set":0,"Remaining":0}}
16:49:39.219 SRC: MQTT
16:49:39.221 CMD: Grp 0, Cmd 'PULSETIME', Idx 5, Len 0, Pld -99, Data ''
16:49:39.237 MQT: stat/LINKIND-S01/RESULT = {"PulseTime5":{"Set":0,"Remaining":0}}
16:49:40.592 SRC: MQTT
16:49:40.594 CMD: Grp 0, Cmd 'BUTTONDEBOUNCE', Idx 1, Len 0, Pld -99, Data ''
16:49:40.601 MQT: stat/LINKIND-S01/RESULT = {"ButtonDebounce":50}
16:49:40.673 SRC: MQTT
16:49:40.676 CMD: Grp 0, Cmd 'PULSETIME', Idx 3, Len 0, Pld -99, Data ''
16:49:40.684 MQT: stat/LINKIND-S01/RESULT = {"PulseTime3":{"Set":0,"Remaining":0}}
16:49:41.992 SRC: MQTT
16:49:41.995 CMD: Grp 0, Cmd 'TEMPLATE', Idx 1, Len 0, Pld -99, Data ''
16:49:42.011 MQT: stat/LINKIND-S01/RESULT = {"NAME":"Linkind Switch ESP32Solo Motion","GPIO":[0,0,0,0,0,224,160,1,0,0,288,0,1,1,0,0,0,0,0,0,0,576,321,0,0,0,0,0,33,32,0,0,0,0,0,0],"FLAG":0,"BASE":1}
16:49:42.103 SRC: MQTT
16:49:42.106 CMD: Grp 0, Cmd 'PULSETIME', Idx 7, Len 0, Pld -99, Data ''
16:49:42.126 MQT: stat/LINKIND-S01/RESULT = {"PulseTime7":{"Set":0,"Remaining":0}}
16:49:42.342 SRC: MQTT
16:49:42.345 CMD: Grp 0, Cmd 'BLINKCOUNT', Idx 1, Len 0, Pld -99, Data ''
16:49:42.353 MQT: stat/LINKIND-S01/RESULT = {"BlinkCount":10}
16:49:44.957 WIF: Checking connection...
jziolkowski commented 2 years ago

Please raise this issue with Tasmota, as this behavior showing a problem with tasmota. The original ESP8266 has a much weaker CPU and it works fine with the amount of queries from TDM on boot.

Jason2866 commented 2 years ago

What you want to do is to much for the ESP32solo1 (single core cpu). Using BLE is already borderline additionaly a Berry script for BLE, well lucky it works! Trying to do another job which generates wifi traffic (polling infos via mqtt) is to much. Keep in mind BLE and Wifi shares the same hardware. So the cpu has to switch and manage this. In general using BLE with ESP32 and do "other" stuff has a high change for instability