ratgdo / homekit-ratgdo

A native HomeKit implementation of a Security+ 2.0 garage door controller based on ratgdo hardware
GNU General Public License v3.0

Crash when HomeKit hub (Apple TV) rebooted #215

Open dkerr64 opened 1 month ago

dkerr64 commented 1 month ago

Both of my ratgdos crashed at the same time after 17 days of running. One of them then crashed a second time 3 minutes 28 seconds after its first crash, overwriting the crash log from the first crash. But I believe the first (simultaneous) crashes had the same cause.

The easy one is the 2nd crash... it's the MDNSResponder crash we're familiar with.

The first crash is harder. What was I doing at the time? Well, I had just returned from two weeks away and found that the UPS my AppleTV was connected to had entered some sort of error state. As part of recovering from that I unplugged the AppleTV, and ~5 minutes later plugged it back in. As far as I can tell this triggered some activity on both ratgdos that caused a crash.

Captured log is...

Crash information recovered from EEPROM
Crash # 1 at 1489206120 ms
Restart reason: 2
Exception (3):
epc1=0x4010110d epc2=0x00000000 epc3=0x00000000 excvaddr=0x4000b2e1 depc=0x00000000

ctx: cont

sp: 3fff1c40 end: 3fff2090
3fff1c40: 00000000 00000000 00000000 00000000 
3fff1c50: 00000000 00000218 00000020 40101394 
3fff1c60: 00000000 00000000 0000005a 4024797d 
3fff1c70: 00000000 0000014c 00000000 40247f17 
3fff1c80: 00000000 00000000 00000000 4c000000 
3fff1c90: 00000001 bc000000 0478d957 00000000 
3fff1ca0: 00000000 00000000 3fff748c 4024a973 
3fff1cb0: 00000000 0000015e 3fff6c44 00000000 
3fff1cc0: 00000000 3fff6c46 0000015e 00000000 
3fff1cd0: 00000218 00000000 0000015e 37862afd 
3fff1ce0: 7d5066a3 045c7e75 bb63e1ea 00000002 
3fff1cf0: 00000000 00000000 3fff68e4 40228541 
3fff1d00: 0000015e 00000000 000000cc 00000000 
3fff1d10: 00000000 00000000 00000000 3fff6c44 
3fff1d20: 0000015e 3fff7b24 3fff7724 40229715 
3fff1d30: 3fff6c44 00000002 00000000 3fff1e40 
3fff1d40: 3fff7724 00000000 00000020 3fff6c44 
3fff1d50: 3fff7724 3fff7b8d 3fff7b24 40229864 
3fff1d60: 3fff6c46 3fff6d92 00000005 40102864 
3fff1d70: 00000000 000000cc 00000000 0000015c 
3fff1d80: 4010014c 3ffeec90 3fff1db0 3fff1da0 
3fff1d90: 0000014c 3fff7174 0000014c 0000014c 
3fff1da0: 40277e19 00000000 00000000 3fff1e40 
3fff1db0: 3fff7174 0000016b 3fff7724 402299e0 
3fff1dc0: 00000000 0000016b 3fff6b5c 40229cb6 
3fff1dd0: 50545448 312e312f 30303220 0d4b4f20 
3fff1de0: 6e6f430a 746e6574 7079542d 61203a65 
3fff1df0: 696c7070 69746163 702f6e6f 69726961 
3fff1e00: 742b676e 0d38766c 6e6f430a 746e6574 
3fff1e10: 6e654c2d 3a687467 0d642520 6e6f430a 
3fff1e20: 7463656e 3a6e6f69 65656b20 6c612d70 
3fff1e30: 0d657669 000a0d0a 3fff6b5c 40229c22 
3fff1e40: 000000e4 ffffffff ffffffff ffffffff 
3fff1e50: 3fff7724 3fff1e40 000000e4 00000068 
3fff1e60: 3fff1e01 23bc6e6f 3fff7724 3fff6954 
3fff1e70: 3fff4abc 3fff1f74 3fff7724 4022bc64 
3fff1e80: 00000002 42463832 42453641 3945342d 
3fff1e90: 39342d41 392d3243 2d343335 36313934 
3fff1ea0: 42333133 43384333 26569500 d9127420 
3fff1eb0: 02d167b9 7ce40d86 6195152c b43aeb9c 
3fff1ec0: 86120c3f 642987e3 00000049 00000000 
3fff1ed0: 00000000 00000000 00000000 00000000 
3fff1ee0: 00000000 00000000 00000000 00000000 
3fff1ef0: 00000000 00000000 00000000 00000000 
3fff1f00: 00000000 00000000 00000000 00000001 
3fff1f10: 20265695 b9d91274 8602d167 2c7ce40d 
3fff1f20: 9c619515 3fb43aeb e386120c 49642987 
3fff1f30: 6a8a3f7b b4c7a3d9 4823fba1 6a469934 
3fff1f40: 6562d73e 92fb9cca cfa67c1b cb69ac0b 
3fff1f50: 00000004 00000000 0000005b 00000000 
3fff1f60: 00000000 00000000 00000000 00000000 
3fff1f70: 00000020 00000010 00000000 00000000 
3fff1f80: ea037cb1 6c531d3a 00000000 00000000 
3fff1f90: 3fff75d2 00000006 00000020 00000000 
3fff1fa0: 3fff75d7 3fff7b24 3fff7724 4022d5a4 
3fff1fb0: 3fff75d7 0000008c 3fff7b5c 40204c32 
3fff1fc0: 3fff754c 3fff0e8c 00000000 00000000 
3fff1fd0: 00000000 00000000 00000000 00000000 
3fff1fe0: 3fff75d8 00000001 3fff754c 3fff7bad 
3fff1ff0: 00000021 00000030 3fff75b7 00000000 
3fff2000: 3fffdad0 0000009e 00000020 3fff0ec4 
3fff2010: 3fff754c 3fff754c 3fff754c 
Incomplete stack trace saved!
No more EEPROM space available to save crash information!

Flash CRC OK
Firmware Version: 1.6.0

t: [Client 1073695548] Disconnected!
>>> [1489166291] HomeKit: [Client 1073695548] Closing client connection
>>> [1489169454] HomeKit: [Client 1073706468] Get Characteristics
>>> [1489173459] HomeKit: [Client 1073706468] Update Characteristics
>>> [1489173499] HomeKit: [Client 1073706468] Get Characteristics
>>> [1489173522] HomeKit: [Client 1073706468] Get Characteristics
>>> [1489173540] HomeKit: [Client 1073706468] Update Characteristics
>>> [1489173550] HomeKit: [Client 1073706468] Get Characteristics
>>> [1489173574] HomeKit: [Client 1073706468] Get Characteristics
>>> [1489173602] HomeKit: [Client 1073706468] Update Characteristics
>>> [1489173620] HomeKit: [Client 1073706468] Get Characteristics
>>> [1489173638] HomeKit: [Client 1073706468] Get Characteristics
>>> [1489173826] HomeKit: [Client 1073706468] Update Characteristics
>>> [1489173859] HomeKit: [Client 1073706468] Get Characteristics
>>> [1489173884] HomeKit: [Client 1073706468] Get Characteristics
>>> [1489175161] HomeKit: [Client 1073706468] Update Characteristics
>>> [1489175172] HomeKit: [Client 1073706468] Get Characteristics
>>> [1489175187] HomeKit: [Client 1073706468] Get Characteristics
>>> [1489176165] HomeKit: [Client 1073706468] Update Characteristics
>>> [1489176178] HomeKit: [Client 1073706468] Get Characteristics
>>> [1489176197] HomeKit: [Client 1073706468] Get Characteristics
>>> [1489180558] HomeKit: [Client 1073706468] Update Characteristics
>>> [1489180571] HomeKit: [Client 1073706468] Get Characteristics
>>> [1489180591] HomeKit: [Client 1073706468] Get Characteristics
>>> [1489181629] HomeKit: [Client 1073706468] Update Characteristics
>>> [1489181639] HomeKit: [Client 1073706468] Get Characteristics
>>> [1489181658] HomeKit: [Client 1073706468] Get Characteristics
>>> [1489181872] HomeKit: [Client 1073706468] Update Characteristics
>>> [1489181888] HomeKit: [Client 1073706468] Get Characteristics
>>> [1489181903] HomeKit: [Client 1073706468] Get Characteristics
>>> [1489205775] HomeKit: [Client 1073706468] List Pairings

jgstroud commented 1 month ago

I've definitely seen mine crash when my HomePod reboots. One time when I was determined to fix it, right after it happened, I started a packet capture and then rebooted the HomePod over and over and failed to reproduce it. It will only happen when you aren't looking for it. Something like a watched pot never boils... "A watched ratgdo never crashes"

colinrblake commented 1 month ago

Heisenberg's Uncertainty Principle :-)

dkerr64 commented 1 month ago

Adding stack decode...

Exception Cause: 3  [LoadStoreError: Processor internal physical address or data error during load or store]

0x4010110d: umm_malloc_core at umm_malloc.cpp:?
0x4000b2e1: ?? ??:0
0x40101394: malloc at ??:?
0x4024797d: netif_do_set_ipaddr at /Users/jstroud/git/Arduino/tools/sdk/lwip2/builder/lwip2-src/src/core/netif.c:475
0x40247f17: pbuf_copy_partial_pbuf at /Users/jstroud/git/Arduino/tools/sdk/lwip2/builder/lwip2-src/src/core/pbuf.c:1024
0x4024a973: tcp_write at /Users/jstroud/git/Arduino/tools/sdk/lwip2/builder/lwip2-src/src/core/tcp_out.c:718
0x40228541: ClientContext::_consume(unsigned int) at ??:?
0x40229715: client_send_encrypted_(_client_context_t*, unsigned char*, unsigned int) at ??:?
0x40229864: client_decrypt_(_client_context_t*, unsigned char*, unsigned int, unsigned char*, unsigned int*) at ??:?
0x40102864: pp_post at ??:?
0x4010014c: std::function<void (void const*)>::operator()(void const*) const at ??:?
0x40277e19: system_get_sdk_version at ??:?
0x402299e0: client_send_P(_client_context_t*, char const*) at ??:?
0x40229cb6: send_json_response(_client_context_t*, int, unsigned char*, unsigned int) at ??:?
0x40229c22: send_tlv_error_response(_client_context_t*, int, TLVError) at ??:?
0x4022bc64: homekit_server_close_client(homekit_server_t*, _client_context_t*) at ??:?
0x4022d5a4: arduino_homekit_setup at ??:?
0x40204c32: http_parser_execute at ??:?
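
Not every word in the dump is an address. The run from 3fff1dd0 to 3fff1e30 in the first crash looks like ASCII text. Below is a minimal sketch for reading it, assuming the dump prints each 32-bit word in hex and that the ESP8266 stores it little-endian; the helper names are made up for illustration and are not part of the firmware or any decoder tool.

```python
import struct

# Words copied from the first crash dump, rows 3fff1dd0 through 3fff1e30.
# Only the first two words of the 3fff1e30 row are included; the remaining
# two words on that row (3fff6b5c 40229c22) look like pointers, not text.
STACK_WORDS = """
50545448 312e312f 30303220 0d4b4f20
6e6f430a 746e6574 7079542d 61203a65
696c7070 69746163 702f6e6f 69726961
742b676e 0d38766c 6e6f430a 746e6574
6e654c2d 3a687467 0d642520 6e6f430a
7463656e 3a6e6f69 65656b20 6c612d70
0d657669 000a0d0a
"""


def words_to_bytes(text):
    """Pack each hex word back into the little-endian bytes it was read from."""
    return b"".join(struct.pack("<I", int(word, 16)) for word in text.split())


def decode_c_string(text):
    """Decode the bytes up to the first NUL terminator as ASCII."""
    raw = words_to_bytes(text)
    return raw.split(b"\x00", 1)[0].decode("ascii")


header = decode_c_string(STACK_WORDS)
print(repr(header))
```

The decoded text is an HTTP response header template for `application/pairing+tlv8` with a `%d` placeholder for the content length. That would fit a pairing response being built on the stack when the exception hit, though the decode alone does not prove which frame owns the buffer.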

dkerr64 commented 1 month ago

Adding in crash reported by @donavanbecker ...

Crash information recovered from EEPROM
Crash # 1 at 540949935 ms
Restart reason: 2
Exception (3):
epc1=0x4010110d epc2=0x00000000 epc3=0x00000000 excvaddr=0x400180e9 depc=0x00000000

ctx: cont

sp: 3fff1d20 end: 3fff2080
3fff1d20: 3fff77e4 0000016a 00000100 3fff1e30 
3fff1d30: 3fff6d8c 00000000 00000020 40101394 
3fff1d40: 3fff3ed4 000000ff 3fff718c 402296b4 
3fff1d50: 401033ef 3ffeec80 00000005 40102864 
3fff1d60: 00000000 00000000 00000000 401035cc 
3fff1d70: 401033ef 3ffeebe0 3fff1da0 3fff1d90 
3fff1d80: 0000014c 3fff77e4 3ffe8f10 40101012 
3fff1d90: 40277d31 00000000 00000000 3fff1e30 
3fff1da0: 3fff77e4 0000016b 3fff6d8c 402298f8 
3fff1db0: 00000000 0000016b 3fff76fc 40229bce 
3fff1dc0: 50545448 312e312f 30303220 0d4b4f20 
3fff1dd0: 6e6f430a 746e6574 7079542d 61203a65 
3fff1de0: 696c7070 69746163 702f6e6f 69726961 
3fff1df0: 742b676e 0d38766c 6e6f430a 746e6574 
3fff1e00: 6e654c2d 3a687467 0d642520 6e6f430a 
3fff1e10: 7463656e 3a6e6f69 65656b20 6c612d70 
3fff1e20: 0d657669 000a0d0a 3fff76fc 40229b3a 
3fff1e30: 000000e4 ffffffff ffffffff ffffffff 
3fff1e40: 3fff6d8c 3fff1e30 000000e4 00000068 
3fff1e50: 3fff1e01 f43e8d6a 3fff6d8c 3fff7684 
3fff1e60: 3fff5bf4 3fff1f64 3fff6d8c 4022bb7c 
3fff1e70: 00000002 38334435 36324333 4632362d 
3fff1e80: 34342d46 422d3338 2d453944 36423844 
3fff1e90: 44363234 32364242 31f60300 bc7f12f2 
3fff1ea0: 29a1594e 54967124 66fdbcbe f54ab707 
3fff1eb0: 55406a4c afdc65c7 00000053 00000000 
3fff1ec0: 00000000 00000000 00000000 00000000 
3fff1ed0: 00000000 00000000 00000000 00000000 
3fff1ee0: 00000000 00000000 00000000 00000000 
3fff1ef0: 00000000 00000000 00000000 00000001 
3fff1f00: f231f603 4ebc7f12 2429a159 be549671 
3fff1f10: 0766fdbc 4cf54ab7 c755406a 53afdc65 
3fff1f20: 0b1b88a1 adb94d0a 74d5cc83 f705c40f 
3fff1f30: 8b90b790 be85524d 80541051 1a30788b 
3fff1f40: 00000004 00000000 00000219 00000000 
3fff1f50: 00000000 00000000 00000000 00000000 
3fff1f60: 00000020 00000010 00000000 00000000 
3fff1f70: 01f5d706 c8ee1a8c 00000000 00000000 
3fff1f80: 3fff693a 00000006 00000020 00000000 
3fff1f90: 3fff693f 3fff718c 3fff6d8c 4022d4bc 
3fff1fa0: 3fff693f 0000008c 3fff71c4 40204c32 
3fff1fb0: 3fff68b4 3fff0e74 00000000 00000000 
3fff1fc0: 00000000 00000000 00000000 00000000 
3fff1fd0: 3fff6940 00000001 3fff68b4 3fff7215 
3fff1fe0: 00000021 00000030 3fff691f 00000000 
3fff1ff0: 3fffdad0 0000009e 00000020 3fff0eac 
3fff2000: 3fff68b4 3fff68b4 3fff6d8c 4022be35 
3fff2010: 0000008c 00000000 3fff5410 4021d254 
3fff2020: 0000008c 3fff0c58 3fff0c1c 3fff20dc 
3fff2030: 3fffdad0 3fff2930 3fff20b0 3fff20dc 
3fff2040: 3fffdad0 3fff441c 3fff718c 4022c199 
3fff2050: 3fffdad0 00000000 3fff20b0 4021eadb 
3fff2060: 00000000 00000000 00000001 40234168 
3fff2070: feefeffe feefeffe 3fffdab0 401007ad 

EEPROM space available: 0x007b bytes

Flash CRC OK
Firmware Version: 1.6.0

TGDO: get target door state: 0
>>> [540590377] RATGDO: get light state: On
>>> [540628843] HomeKit: [Client 1073703340] Get Characteristics
>>> [540689374] HomeKit: [Client 1073703340] Get Characteristics
>>> [540749665] HomeKit: [Client 1073703340] Get Characteristics
>>> [540809787] HomeKit: [Client 1073703340] Get Characteristics
>>> [540839818] HomeKit: [Client 1073703340] Get Characteristics
>>> [540839842] RATGDO: get light state: On
>>> [540849669] HomeKit: [Client 1073703340] Get Characteristics
>>> [540849788] RATGDO: get current door state: 0
>>> [540850990] HomeKit: [Client 1073703340] Get Characteristics
>>> [540851000] RATGDO: get light state: On
>>> [540854537] HomeKit: [Client 1073703340] Get Characteristics
>>> [540854548] RATGDO: get current door state: 0
>>> [540870670] HomeKit: [Client 1073703340] Get Characteristics
>>> [540871933] RATGDO: reader completed packet
>>> [540871934] RATGDO: DECODED  0002388B 000000306E511006 42608181
>>> [540871935] RATGDO: PACKET(0x511006 @ 0x2388B) Status - Status: [DoorState Open, Parity 0x8, Obs 1, Lock 0, Light 1]
>>> [540871944] RATGDO: tgt 0 curr 0
>>> [540919461] HomeKit: [Client 1073703340] Get Characteristics
>>> [540919470] RATGDO: get light state: On
>>> [540929431] HomeKit: [Client 1073703340] Get Characteristics
>>> [540929442] RATGDO: get current door state: 0
>>> [540930586] HomeKit: [Client 1073703340] Get Characteristics
>>> [540930603] RATGDO: get light state: On
>>> [540930710] HomeKit: [Client 1073703340] Get Characteristics
>>> [540935424] HomeKit: [Client 1073703340] Get Characteristics
>>> [540935443] RATGDO: get current door state: 0
>>> [540936594] HomeKit: [Client 1073703340] Get Characteristics
>>> [540936611] RATGDO: get light state: On
>>> [540946437] HomeKit: [Client 1073703340] Get Characteristics
>>> [540946454] RATGDO: get current door state: 0
>>> [540947869] HomeKit: [Client 1073703340] Get Characteristics
>>> [540947925] RATGDO: get light state: On
>>> [540949657] HomeKit: [Client 1073703340] List Pairings

which decodes to...

Exception Cause: 3  [LoadStoreError: Processor internal physical address or data error during load or store]

0x4010110d: umm_malloc_core at umm_malloc.cpp:?
0x400180e9: ?? ??:0
0x40101394: malloc at ??:?
0x402296b4: client_send_encrypted_(_client_context_t*, unsigned char*, unsigned int) at ??:?
0x401033ef: rcReachRetryLimit at ??:?
0x40102864: pp_post at ??:?
0x401035cc: rcReachRetryLimit at ??:?
0x401033ef: rcReachRetryLimit at ??:?
0x40101012: umm_free_core at umm_malloc.cpp:?
0x40277d31: system_get_sdk_version at ??:?
0x402298f8: client_send(_client_context_t*, unsigned char*, unsigned int) at ??:?
0x40229bce: send_tlv_response(_client_context_t*, tlv_values_t*) at ??:?
0x40229b3a: send_tlv_response(_client_context_t*, tlv_values_t*) at ??:?
0x4022bb7c: homekit_server_on_pairings(_client_context_t*, unsigned char const*, unsigned int) at ??:?
0x4022d4bc: homekit_server_on_message_complete(http_parser*) at ??:?
0x40204c32: http_parser_execute at ??:?
0x4022be35: homekit_client_process(_client_context_t*) at ??:?
0x4021d254: comms_loop() at ??:?
0x4022c199: homekit_server_process(homekit_server_t*) at ??:?
0x4021eadb: loop at ??:?
0x40234168: loop_wrapper() at core_esp8266_main.cpp:?
0x401007ad: cont_wrapper at ??:?
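
In both captures the last logged line is a List Pairings request shortly before the crash timestamp. Below is a small sketch to put numbers on that, assuming the bracketed log timestamps use the same milliseconds-since-boot clock as the "Crash # 1 at ... ms" line (the values being so close suggests this, but it is an assumption). The function name is hypothetical.

```python
import re

MS_PER_DAY = 24 * 60 * 60 * 1000

# (crash line, last log line before the crash), copied from the two captures above.
CAPTURES = {
    "dkerr64": (
        "Crash # 1 at 1489206120 ms",
        ">>> [1489205775] HomeKit: [Client 1073706468] List Pairings",
    ),
    "donavanbecker": (
        "Crash # 1 at 540949935 ms",
        ">>> [540949657] HomeKit: [Client 1073703340] List Pairings",
    ),
}


def summarize(crash_line, last_log_line):
    """Return (uptime in days at crash, ms between last log line and crash)."""
    crash_ms = int(re.search(r"at (\d+) ms", crash_line).group(1))
    log_ms = int(re.search(r"\[(\d+)\]", last_log_line).group(1))
    return crash_ms / MS_PER_DAY, crash_ms - log_ms


for name, (crash_line, log_line) in CAPTURES.items():
    uptime_days, gap_ms = summarize(crash_line, log_line)
    print(f"{name}: uptime {uptime_days:.2f} days, "
          f"crash {gap_ms} ms after List Pairings")
```

Under that assumption the first crash comes 345 ms after the List Pairings line at about 17.24 days of uptime, and the second comes 278 ms after it at about 6.26 days.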

jgstroud commented 1 month ago

We had stormy weather today and I got a bunch of notifications about my HomePod going offline and coming back on again. Ultimately a bunch of devices showed no response, including the ratgdo. I had to power cycle the HomePod and everything came back. During the time the HomePod was coming and going, the ratgdo did crash, and it's a very similar crash to some of the ones reported in Discord.

Crashdump: https://gist.github.com/jgstroud/081f5e0ae711776cd4e8e1ffc565a375