rancilio-pid / clevercoffee

Do-It-Yourself PID for espresso machines
https://clevercoffee.de
GNU General Public License v2.0
270 stars, 139 forks

Random reboots at startup (possibly also during normal use) #478

Closed: genkigenki closed this issue 4 months ago

genkigenki commented 4 months ago

With the latest master (after the HAL commit, but possibly also before it), CleverCoffee reboots during setup() and loop(), shortly after startup.

Using a startup counter, I could see that the reboots happen specifically at:
1) enableTimer1();
2) Logger::begin(); Logger::setLevel(LOGLEVEL); LOGF(INFO, ...
3) in loop()
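A startup counter like the one described can be sketched as a "breadcrumb" that records the last step reached before a reset. This is a minimal sketch under stated assumptions, not the code used here: the names `Breadcrumb`, `beginBoot()` and `markStep()` are hypothetical, and on a real ESP32 the record would have to live in memory that survives a software or panic reset (for example a variable declared with ESP-IDF's `RTC_NOINIT_ATTR`). Here it is an ordinary global so the logic runs on any machine.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical breadcrumb record. On an ESP32 it would be declared with
// RTC_NOINIT_ATTR so it survives a panic/watchdog reset; here it is a plain
// global (zero-initialised) so the sketch runs on a PC.
struct Breadcrumb {
    uint32_t magic;      // marks the record as initialised
    uint32_t bootCount;  // boots seen since the record was initialised
    uint32_t lastStep;   // last step reached before the most recent reset
};

constexpr uint32_t kMagic = 0xC0FFEE01u;  // arbitrary marker value (assumption)
Breadcrumb crumb;                         // stand-in for RTC_NOINIT_ATTR storage

// Call first thing in setup(). Returns the step the previous boot reached,
// or 0 if the record was not initialised (e.g. after a cold power-on).
uint32_t beginBoot() {
    uint32_t previous = 0;
    if (crumb.magic == kMagic) {
        previous = crumb.lastStep;
        crumb.bootCount++;
    } else {
        crumb.magic = kMagic;
        crumb.bootCount = 1;
    }
    crumb.lastStep = 0;  // nothing reached yet in this boot
    return previous;
}

// Call right before each suspicious section, e.g. markStep(1) before enableTimer1().
void markStep(uint32_t step) {
    assert(step != 0 && "step 0 is reserved for 'nothing recorded'");
    crumb.lastStep = step;
}
```

In setup() one would log the value returned by `beginBoot()`, then call `markStep(1)` before `enableTimer1()`, `markStep(2)` before `Logger::begin()` and `markStep(3)` on entering `loop()`. A non-zero value after an unexpected reboot points at the step that was running. Logging ESP-IDF's `esp_reset_reason()` alongside it would also separate a panic (`ESP_RST_PANIC`) from a brownout (`ESP_RST_BROWNOUT`).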

Video of a reboot during loop(): https://github.com/rancilio-pid/clevercoffee/assets/3080081/274d3f60-d9d6-4b0e-b8c8-4704a15baede

No other load on the machine and the website was not used, but MQTT is activated.

fiendie commented 4 months ago

I can't reproduce this behaviour, so we need a little more information. Please hook up your machine via USB and look at the output of the serial monitor. If there is a kernel panic or something similar, you should see a backtrace.
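The "Backtrace:" line an ESP32 prints on a panic is a list of `PC:SP` pairs, and only the PC values are needed to resolve function names and source lines against the firmware ELF, for example with `xtensa-esp32-elf-addr2line -pfiaC -e firmware.elf <PCs>` or PlatformIO's `esp32_exception_decoder` monitor filter (`firmware.elf` is a placeholder path). The helpers below are a hypothetical sketch for extracting those PCs from a pasted log; they are not part of CleverCoffee or ESP-IDF.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Extracts the program-counter half of each "PC:SP" pair from an ESP32
// "Backtrace:" line. Tokens not starting with "0x" or lacking a colon are
// skipped (e.g. the "Backtrace:" label itself).
std::vector<uint32_t> backtracePCs(const std::string& line) {
    std::vector<uint32_t> pcs;
    std::istringstream in(line);
    std::string token;
    while (in >> token) {
        std::size_t colon = token.find(':');
        if (colon == std::string::npos || token.rfind("0x", 0) != 0) {
            continue;
        }
        // std::stoul with base 16 accepts the "0x" prefix.
        pcs.push_back(static_cast<uint32_t>(std::stoul(token.substr(0, colon), nullptr, 16)));
    }
    return pcs;
}

// Joins the PCs into a space-separated argument list for addr2line.
std::string addr2lineArgs(const std::vector<uint32_t>& pcs) {
    std::ostringstream out;
    for (std::size_t i = 0; i < pcs.size(); ++i) {
        if (i != 0) out << ' ';
        out << "0x" << std::hex << pcs[i];
    }
    return out.str();
}
```

For the first log in this thread, `addr2lineArgs(backtracePCs(line))` would yield the ten PCs starting with `0x40083ae1`, ready to paste after the addr2line options.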

genkigenki commented 4 months ago


It's unfortunately sporadic. I power-cycled the machine around 20 times until it happened and cut the video to show only the moment it occurred. Did you power-cycle your machine a few times?

USB is difficult, as I can't reach in easily with all the wires attached. I would have to set up a second ESP32 with some extra sensors, but I currently don't have the parts.

fiendie commented 4 months ago

I don't run bleeding edge code on my production machine as a matter of policy. If you're not willing/able to either build a separate test setup or at least connect a USB lead to look for a backtrace you should probably think about doing that as well.

I still cannot reproduce that behaviour after 15 power cycles on my test setup. And I suspect you're not running the unmodified master?

If you won't provide any more information, I'm not really sure what we're supposed to do.

genkigenki commented 4 months ago

Thanks for power-cycling! Does the test setup connect to WLAN?

I run the unmodified master, as stated in the bug description (the version before today's commits, 10.3.2024).

I activated fairly standard features in userconfig; see the extract below.

#define OLED_DISPLAY 2 // 0 = deactivated, 1 = SH1106 (e.g. 1.3" 128x64), 2 = SSD1306 (e.g. 0.96" 128x64), 3 = SH1106_126x64_SPI
#define CONNECTMODE 1 // 0 = offline, 1 = WIFI-MODE
#define MAXWIFIRECONNECTS 5 // maximum number of reconnection attempts, use -1 to deactivate
#define WIFICONNECTIONDELAY 10000 // delay between reconnects in ms
#define ONLYPID 0 // 0 = PID and preinfusion, 1 = Only PID
#define ONLYPIDSCALE 0 // 0 = off, 1 = OnlyPID with Scale
#define BREWMODE 1 // 1 = Brew by time (with preinfusion); 2 = Brew by weight (from scale)
#define FEATURE_BREWDETECTION 1 // 0 = deactivated, 1 = activated
#define BREWDETECTION_TYPE 2 // 1 = Software (Onlypid 1), 2 = Hardware (Onlypid 0), 3 = optocoupler for Only PID
#define FEATURE_POWERSWITCH 0 // 0 = deactivated, 1 = activated
#define FEATURE_BREWSWITCH 1 // 0 = deactivated, 1 = activated
#define FEATURE_STEAMSWITCH 1 // 0 = deactivated, 1 = activated
#define FEATURE_WATER_SENS 1 // 0 = deactivated, 1 = activated
#define OTA true // true = OTA activated, false = OTA deactivated
#define FEATURE_MQTT 1 // 0 = deactivated, 1 = activated
#define TEMP_SENSOR 2 // Temp sensor type: 1 = DS18B20, 2 = TSIC306
#define LOGLEVEL Logger::Level::INFO
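The comments in this extract encode dependencies between options; for example, `BREWDETECTION_TYPE 2` is described as "Hardware (Onlypid 0)". A compile-time check along these lines could catch an inconsistent combination before flashing. It is a hypothetical sketch derived only from those comments, not a check CleverCoffee ships, and the rule set is likely incomplete; `brewDetectionMatchesMode()` is an illustrative name.

```cpp
// Values copied from the userconfig extract above.
#define ONLYPID 0
#define FEATURE_BREWDETECTION 1
#define BREWDETECTION_TYPE 2
#define OLED_DISPLAY 2
#define TEMP_SENSOR 2

// Hypothetical rule taken from the BREWDETECTION_TYPE comment:
// type 2 (hardware) is meant for ONLYPID 0, types 1 (software) and
// 3 (optocoupler) for ONLYPID 1. Unknown types are rejected.
constexpr bool brewDetectionMatchesMode(int detectionType, int onlyPid) {
    return detectionType == 2 ? onlyPid == 0
         : (detectionType == 1 || detectionType == 3) ? onlyPid == 1
         : false;
}

// Checks evaluated at compile time; a violation stops the build with a message.
static_assert(!FEATURE_BREWDETECTION || brewDetectionMatchesMode(BREWDETECTION_TYPE, ONLYPID),
              "BREWDETECTION_TYPE does not match ONLYPID");
static_assert(OLED_DISPLAY >= 0 && OLED_DISPLAY <= 3, "unknown OLED_DISPLAY value");
static_assert(TEMP_SENSOR == 1 || TEMP_SENSOR == 2, "unknown TEMP_SENSOR value");
```

With the values reported here, all three checks pass, so the configuration itself does not look contradictory.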

genkigenki commented 4 months ago

LoQue was able to log the behaviour on 12.3.2023 (crashes in tcpip.c and mdns.c from the ESP-IDF framework):

CORRUPT HEAP: Bad head at 0x3ffdb2b0. Expected 0xabba1234 got 0x10981234
assert failed: multi_heap_free multi_heap_poisoning.c:259 (head != NULL)

Backtrace: 0x40083ae1:0x3ffb4f20 0x4008d54d:0x3ffb4f40 0x40092cd9:0x3ffb4f60 0x400928d1:0x3ffb5090 0x40084051:0x3ffb50b0 0x40092d09:0x3ffb50d0 0x4008e972:0x3ffb50f0 0x40121543:0x3ffb5110 0x40123e35:0x3ffb5130 0x401101b7:0x3ffb5160

#0 0x40083ae1:0x3ffb4f20 in panic_abort at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/esp_system/panic.c:408
#1 0x4008d54d:0x3ffb4f40 in esp_system_abort at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/esp_system/esp_system.c:137
#2 0x40092cd9:0x3ffb4f60 in __assert_func at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/newlib/assert.c:85
#3 0x400928d1:0x3ffb5090 in multi_heap_free at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/heap/multi_heap_poisoning.c:259 (discriminator 1)
#4 0x40084051:0x3ffb50b0 in heap_caps_free at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/heap/heap_caps.c:382
#5 0x40092d09:0x3ffb50d0 in free at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/newlib/heap.c:39
#6 0x4008e972:0x3ffb50f0 in vQueueDelete at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/freertos/queue.c:2152
#7 0x40121543:0x3ffb5110 in sys_mbox_free at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/lwip/port/esp32/freertos/sys_arch.c:391 (discriminator 2)
#8 0x40123e35:0x3ffb5130 in lwip_netconn_do_listen at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/lwip/lwip/src/api/api_msg.c:1574
#9 0x401101b7:0x3ffb5160 in tcpip_thread_handle_msg at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/lwip/lwip/src/api/tcpip.c:208
  (inlined by) tcpip_thread at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/lwip/lwip/src/api/tcpip.c:154
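The "CORRUPT HEAP: Bad head" message comes from ESP-IDF's heap poisoning, which stores a known canary word (0xabba1234 here) in front of each allocation and checks it when the block is freed. Comparing the expected and observed words shows which bytes were overwritten: 0xabba1234 XOR 0x10981234 = 0xbb220000, so only the upper two bytes changed while the low half (0x1234) survived. That would be consistent with a small stray write (or a write into memory that was already freed) landing on that header rather than a large overrun. The helper below is a minimal sketch of that comparison; `corruptedCanaryBytes()` is a hypothetical name, not an ESP-IDF function.

```cpp
#include <cstdint>
#include <vector>

// Returns the byte offsets within a 32-bit word (0 = lowest address on a
// little-endian CPU such as the ESP32) at which the observed canary differs
// from the expected one.
std::vector<int> corruptedCanaryBytes(uint32_t expected, uint32_t observed) {
    std::vector<int> offsets;
    const uint32_t diff = expected ^ observed;  // set bits mark changed bits
    for (int byte = 0; byte < 4; ++byte) {
        if ((diff >> (8 * byte)) & 0xFFu) {
            offsets.push_back(byte);
        }
    }
    return offsets;
}
```

For this log, `corruptedCanaryBytes(0xabba1234, 0x10981234)` returns offsets 2 and 3, i.e. the two highest-addressed bytes of the canary word.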

Guru Meditation Error: Core 0 panic'ed (LoadProhibited). Exception was unhandled.

Core 0 register dump:
PC : 0x4008ad5c  PS : 0x00060630  A0 : 0x8017784b  A1 : 0x3ffda980
A2 : 0xcac795c7  A3 : 0xcac795c3  A4 : 0x000000ff  A5 : 0x0000ff00
A6 : 0x00ff0000  A7 : 0xff000000  A8 : 0x0000000d  A9 : 0x0000005f
A10 : 0x00000000  A11 : 0x00000000  A12 : 0x3ff96355  A13 : 0x3ffda960
A14 : 0x00000003  A15 : 0x00060023  SAR : 0x00000008  EXCCAUSE: 0x0000001c
EXCVADDR: 0xcac795c7  LBEG : 0x4008a4dc  LEND : 0x4008a4e6  LCOUNT : 0x00000000

Backtrace: 0x4008ad59:0x3ffda980 0x40177848:0x3ffda990 0x401778cf:0x3ffdaad0 0x40178369:0x3ffdac10 0x4017a4cd:0x3ffdac50

#0 0x4008ad59:0x3ffda980 in strlen at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/src/newlib/newlib/libc/machine/xtensa/strlen.S:43
#1 0x40177848:0x3ffda990 in _mdns_append_fqdn at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/mdns/mdns.c:551
#2 0x401778cf:0x3ffdaad0 in _mdns_append_fqdn at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/mdns/mdns.c:596
#3 0x40178369:0x3ffdac10 in _mdns_append_question at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/mdns/mdns.c:961
  (inlined by) _mdns_dispatch_tx_packet at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/mdns/mdns.c:1154
#4 0x4017a4cd:0x3ffdac50 in _mdns_tx_handle_packet at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/mdns/mdns.c:4219
  (inlined by) _mdns_execute_action at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/mdns/mdns.c:4511
  (inlined by) _mdns_service_task at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/mdns/mdns.c:4644
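In the second crash, EXCCAUSE 0x1c (28) is the Xtensa LoadProhibited cause, matching the "Guru Meditation" header, and EXCVADDR (the address that could not be read) equals A2 = 0xcac795c7, with A3 holding a value 4 bytes lower. So strlen() appears to have been walking a pointer that does not look like a valid ESP32 address, which suggests `_mdns_append_fqdn` was handed a string that had been freed or overwritten; this may be related to the heap corruption in the first log. The helpers below sketch that reading of a register dump; the function names are hypothetical, and only a few common Xtensa cause codes are listed.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Names for a few common Xtensa EXCCAUSE values (not the full table).
std::string exceptionCauseName(uint32_t cause) {
    switch (cause) {
        case 0:  return "IllegalInstruction";
        case 3:  return "LoadStoreError";
        case 6:  return "IntegerDivideByZero";
        case 9:  return "LoadStoreAlignment";
        case 28: return "LoadProhibited";
        case 29: return "StoreProhibited";
        default: return "Unknown";
    }
}

// Returns the names of registers whose value equals the faulting address
// (EXCVADDR), i.e. candidates for the pointer the crashing load used.
std::vector<std::string> registersHolding(
        uint32_t excvaddr,
        const std::vector<std::pair<std::string, uint32_t>>& regs) {
    std::vector<std::string> hits;
    for (const auto& reg : regs) {
        if (reg.second == excvaddr) {
            hits.push_back(reg.first);
        }
    }
    return hits;
}
```

Feeding in the A0 to A15 values from the dump above, `registersHolding(0xcac795c7, ...)` returns only `A2`.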
LoQue90 commented 4 months ago

This helped on my side: https://github.com/rancilio-pid/clevercoffee/pull/481