uriyacovy / ESPHome_nuki_lock

ESPHome lock platform for Nuki Smartlock
MIT License
60 stars 19 forks source link

Fix BLE event handler. #30

Closed canatella closed 1 year ago

canatella commented 1 year ago

I managed to trace what I think is issue #26 :

[17:44:05]Received AuthorizationId: 06 00 35 22 
[17:44:05]Rec encrypted data: 19 ab 2c b9 47 ff e7 bb 66 2e f5 27 ba 34 cd d6 0e 39 76 3c 24 73 56 17 77 fe a4 12 6e ba 5c 18 b8 db 99 4f cf 4f 0a eb f7 1d 65 3f 02 b6 fe 58 72 56 
[17:44:05]Decrypted data: 06 00 35 22 0c 00 02 03 00 e7 07 06 1a 0f 2c 00 00 00 b8 08 00 03 00 00 00 00 00 00 00 00 f0 00 8b 37 
[17:44:05]keyturnerStates: 02 03 00 e7 07 06 1a 0f 2c 00 00 00 b8 08 00 03 00 00 00 00 00 00 00 00 f0 00 
[17:44:05]Received data: 96 82 bb 46 2a 77 2f e6 d6 78 70 59 96 a1 32 94 7c 1b 62 f6 ff 88 01 28 06 00 35 22 19 00 b0 9a 33 22 de dc 3f 90 eb b6 97 34 f6 59 bd 0e d4 b5 0b 96 87 d1 a0 ad b1 [17:44:05]received nonce: 96 82 bb 46 2a 77 2f e6 d6 78 70 59 96 a1 32 94 7c 1b 62 f6 ff 88 01 28 
[17:44:05]Received AuthorizationId: 06 00 35 22 
[17:44:05]Rec encrypted data: b0 9a 33 22 de dc 3f 90 eb b6 97 34 f6 59 bd 0e d4 b5 0b 96 87 d1 a0 ad b1 
[17:44:05]Decrypted data: 06 00 35 22 0e 00 00 75 be 
[17:44:05]status: 00 
[17:44:15]Guru Meditation Error: Core  0 panic'ed (LoadProhibited). Exception was unhandled.
[17:44:15]
[17:44:15]Core  0 register dump:
[17:44:15]PC      : 0x400e6837  PS      : 0x00060f30  A0      : 0x800e68ef  A1      : 0x3ffd2470  
WARNING Decoded 0x400e6837: Nuki::NukiBle::onResult(NimBLEAdvertisedDevice*) at /home/dam/dm/esphome-configs/.esphome/build/iot-door-lock/.piolibdeps/iot-door-lock/NukiBleEsp/src/NukiBle.cpp:244
 (inlined by) Nuki::NukiBle::onResult(NimBLEAdvertisedDevice*) at /home/dam/dm/esphome-configs/.esphome/build/iot-door-lock/.piolibdeps/iot-door-lock/NukiBleEsp/src/NukiBle.cpp:199
[17:44:15]A2      : 0x3ffbc8b0  A3      : 0x00000000  A4      : 0x00000000  A5      : 0x0000004c  
[17:44:15]A6      : 0x00000004  A7      : 0x3ffdbd3c  A8      : 0x800e682f  A9      : 0x3ffd2450  
[17:44:15]A10     : 0x000000ff  A11     : 0x3ffd248c  A12     : 0x00000019  A13     : 0x00000000  
[17:44:15]A14     : 0x0000002d  A15     : 0x3ffb6c68  SAR     : 0x0000000a  EXCCAUSE: 0x0000001c  
[17:44:15]EXCVADDR: 0x000000ff  LBEG    : 0x400845fd  LEND    : 0x40084605  LCOUNT  : 0x00000027  
[17:44:15]
[17:44:15]
[17:44:15]Backtrace:0x400e6834:0x3ffd24700x400e68ec:0x3ffd2560 0x4019d2ed:0x3ffd2580 0x400e62db:0x3ffd25a0 0x40177fdd:0x3ffd25c0 0x4017b0da:0x3ffd2600 0x401800c9:0x3ffd2670 0x4017ff75:0x3ffd26b0 0x40180391:0x3ffd26d0 0x4017ebf9:0x3ffd26f0 0x4008fb8e:0x3ffd2710 0x40176b9a:0x3ffd2730 
WARNING Found stack trace! Trying to decode it
WARNING Decoded 0x400e6834: Nuki::NukiBle::onResult(NimBLEAdvertisedDevice*) at /home/dam/dm/esphome-configs/.esphome/build/iot-door-lock/.piolibdeps/iot-door-lock/NukiBleEsp/src/NukiBle.cpp:243
 (inlined by) Nuki::NukiBle::onResult(NimBLEAdvertisedDevice*) at /home/dam/dm/esphome-configs/.esphome/build/iot-door-lock/.piolibdeps/iot-door-lock/NukiBleEsp/src/NukiBle.cpp:199
WARNING Decoded 0x400e68ec: non-virtual thunk to Nuki::NukiBle::onResult(NimBLEAdvertisedDevice*)
WARNING Decoded 0x4019d2ed: BleScanner::Scanner::onResult(NimBLEAdvertisedDevice*) at /home/dam/dm/esphome-configs/.esphome/build/iot-door-lock/.piolibdeps/iot-door-lock/BleScanner/src/BleScanner.cpp:85 (discriminator 2)
WARNING Decoded 0x400e62db: non-virtual thunk to BleScanner::Scanner::onResult(NimBLEAdvertisedDevice*)
WARNING Decoded 0x40177fdd: NimBLEScan::handleGapEvent(ble_gap_event*, void*) at /home/dam/dm/esphome-configs/.esphome/build/iot-door-lock/.piolibdeps/iot-door-lock/NimBLE-Arduino/src/NimBLEScan.cpp:153
WARNING Decoded 0x4017b0da: ble_gap_disc_report at /home/dam/dm/esphome-configs/.esphome/build/iot-door-lock/.piolibdeps/iot-door-lock/NimBLE-Arduino/src/nimble/nimble/host/src/ble_gap.c:1051
 (inlined by) ble_gap_rx_adv_report at /home/dam/dm/esphome-configs/.esphome/build/iot-door-lock/.piolibdeps/iot-door-lock/NimBLE-Arduino/src/nimble/nimble/host/src/ble_gap.c:1579
WARNING Decoded 0x401800c9: ble_hs_hci_evt_le_adv_rpt at /home/dam/dm/esphome-configs/.esphome/build/iot-door-lock/.piolibdeps/iot-door-lock/NimBLE-Arduino/src/nimble/nimble/host/src/ble_hs_hci_evt.c:517 (discriminator 3)
WARNING Decoded 0x4017ff75: ble_hs_hci_evt_le_meta at /home/dam/dm/esphome-configs/.esphome/build/iot-door-lock/.piolibdeps/iot-door-lock/NimBLE-Arduino/src/nimble/nimble/host/src/ble_hs_hci_evt.c:316
WARNING Decoded 0x40180391: ble_hs_hci_evt_process at /home/dam/dm/esphome-configs/.esphome/build/iot-door-lock/.piolibdeps/iot-door-lock/NimBLE-Arduino/src/nimble/nimble/host/src/ble_hs_hci_evt.c:879
WARNING Decoded 0x4017ebf9: ble_hs_event_rx_hci_ev at /home/dam/dm/esphome-configs/.esphome/build/iot-door-lock/.piolibdeps/iot-door-lock/NimBLE-Arduino/src/nimble/nimble/host/src/ble_hs.c:548
WARNING Decoded 0x4008fb8e: ble_npl_event_run at /home/dam/dm/esphome-configs/.esphome/build/iot-door-lock/.piolibdeps/iot-door-lock/NimBLE-Arduino/src/nimble/porting/npl/freertos/include/nimble/nimble_npl_os.h:526
 (inlined by) nimble_port_run at /home/dam/dm/esphome-configs/.esphome/build/iot-door-lock/.piolibdeps/iot-door-lock/NimBLE-Arduino/src/nimble/porting/nimble/src/nimble_port.c:269
WARNING Decoded 0x40176b9a: NimBLEDevice::host_task(void*) at /home/dam/dm/esphome-configs/.esphome/build/iot-door-lock/.piolibdeps/iot-door-lock/NimBLE-Arduino/src/NimBLEDevice.cpp:836

Looking at the code, I thought this was probably due to one of the pointers (the handler, or the status_update_ pointer) getting invalid at some point and the crash leaving the bluetooth stack in an inoperable state.

This code replaces the lock instance and the handler with member variables and configures event handling in the constructor so that no pointers are needed anymore (and so none can get invalid).

This seems to fix issue #26 for me but I still haven't used it a lot. But the crash that I dumped here is fixed for sure.

uriyacovy commented 1 year ago

@canatella, thanks for the work. Can you explain why do you think that the handler/status_update pointer becomes invalid?

canatella commented 1 year ago

Honestly, I just got a clue by the fact that the stack trace is pointing at https://github.com/uriyacovy/NukiBleEsp32/blob/1de629002a03e6fffc9acd8b746baf498b50f77b/src/NukiBle.cpp#L244 where there is nothing going on but two pointer access:

Now looking at the doc: https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-guides/fatal-errors.html#loadprohibited-storeprohibited: LoadProhibited, StoreProhibited

These CPU exceptions happen when an application attempts to read from or write to an invalid memory location. The address which has been written/read is found in the EXCVADDR register in the register dump. If this address is zero, it usually means that the application has attempted to dereference a NULL pointer. If this address is close to zero, it usually means that the application has attempted to access a member of a structure, but the pointer to the structure is NULL. If this address is something else (garbage value, not in 0x3fxxxxxx - 0x6xxxxxxx range), it likely means that the pointer used to access the data is either not initialized or has been corrupted.

According to the stack trace: EXCVADDR: 0x000000ff so it looks like we are dereferencing a pointer in a NULL struct, the offset is 255 bytes which is quite large and also looks like a strange value to me, but that doesn't allow differentiating between the two dereferences without looking at the struct generated by the compiler (which is possible but maybe not worth it here).

uriyacovy commented 1 year ago

Thanks for the analysis. Yet, it does not point to which pointer was invalid. If this issue reproduces easily (in my setup it does not occur), we can try to isolate and maybe understand better. I think that the first approach would be to define the just the handler as a member instead of a pointer, and see if it solves the issue.

canatella commented 1 year ago

Is there any specific reason why you want to use pointers there? It's much more safe to use RAII and entirely avoid this issue and possibly others. If there is undefined behavior here for some reason, there is no way to say what else could go wrong. Also, by avoiding using new, you also prevent memory allocation that could fail and the compiler will leave room in the image for the NukiLock struct so that it wouldn't flash if there wasn't enough memory for it.

uriyacovy commented 1 year ago

The idea is to be able to build NukiLock in runtime, with relevant configuration (device name and ID) and maybe support multiple locks. This is currently not implemented, but this is the reason why I suggested to start with the handler and see if it solves the issue.

canatella commented 1 year ago

I don't think you would need pointers for that, you could just use a vector:

std::vector<NukiLockComponent> locks;

And with the code in the PR it would still create the NukiLock and register the handler.

uriyacovy commented 1 year ago

@canatella, I've merged your PR to dev branch. Thanks for the contribution!

canatella commented 1 year ago

No prob, thanks for your work :)