maximkulkin / esp-homekit

Apple HomeKit accessory server library for ESP-OPEN-RTOS
MIT License
1.11k stars 170 forks source link

esp-homekit crashes when a lot of clients are starting a session after setting up the device #17

Closed renssies closed 6 years ago

renssies commented 6 years ago

I was testing the contact sensor I made at Awkward. Here we have 21+ devices that have HomeKit setup. 8 Devices share the same iCloud account and the other 13 are HomeKit guests.

Setting up the device works fine, but after the setup all the other devices want to start a session too. This causes esp-homekit to eventually crash.

After the crash the device restarts and the sessions are eventually restored, so it's not a high priority, but it is something that happens.

Crash log:

>>> homekit_client_process: [Client 7] Finished processing
>>> homekit_client_process: [Client 8] Got 257 incomming data
>>> homekit_server_on_pair_verify: HomeKit Pair Verify
>>> homekit_server_on_pair_verify: Free heap: 6704
>>> tlv_debug: Got following TLV values:
>>> tlv_debug: Type 6 value (1 bytes): \x03
>>> tlv_debug: Type 5 value (120 bytes): \x11s\x98\x87z\xED\xD4zT\xD4-\\H\xED\xDA1Z\xE1o\xCA\xBD\xFB-E\x9F.\xFE\xFC\xC0w\xBF\xA6\xF1\xD8\xB6S\xC6+\xA1s\xE5\xE3'\x052S1\xF1U\xB4B\xE4\x05\x17v>&#\xDB\x16R\xAD\x16\x89\x1Aa-\xF2\x862\xB8(C{\x9A6\x18\xCB|]+0\x93\x1A\xCAX}>\xF5L\xE2\xBC\xCA\x9C\x8A\xBA\xE9P_J.S\xD6\xF5<,L\x08\xF9\xE5X^\xF4#\xDA\xCC\xAFPXF
>>> HomeKit: [Client 8] Pair Verify Step 2/2
>>> homekit_server_on_pair_verify: [Client 8] Decrypting payload
>>> homekit_server_on_pair_verify: [Client 8] Searching pairing with 39AB5966-3CCD-494E-9CB6-FB41DA29BC9B
>>> HomeKit: [Client 8] Found pairing with 39AB5966-3CCD-494E-9CB6-FB41DA29BC9B
>>> homekit_server_on_pair_verify: [Client 8] Verifying device signature
>>> send_tlv_response: [Client 8] Sending TLV response
>>> tlv_debug: Got following TLV values:
>>> tlv_debug: Type 6 value (1 bytes): \x04
>>> client_send: [Client 8] Sending payload: HTTP/1.1 200 OK\x0D\x0AContent-Type: application/pairing+tlv8\x0D\x0AContent-Length: 3\x0D\x0AConnection: keep-alive\x0D\x0A\x0D\x0A\x06\x01\x04
>>> HomeKit: [Client 8] Verification successful, secure session established
>>> homekit_client_process: [Client 8] Finished processing
>>> HomeKit: Got new client connection: 11
>>> homekit_client_process: [Client 10] Got 168 incomming data
>>> homekit_server_on_pair_verify: HomeKit Pair Verify
>>> homekit_server_on_pair_verify: Free heap: 3996
>>> tlv_debug: Got following TLV values:
>>> tlv_debug: Type 3 value (32 bytes): \x11\x0F\x18\xE7\x19*\xA1\x89:x\x8EO\xC3y\x0C\xE2\x16\x1B4oS\xCBRY/\x8B\x80&\xFD}\xF7q
>>> tlv_debug: Type 6 value (1 bytes): \x01
>>> HomeKit: [Client 10] Pair Verify Step 1/2
>>> homekit_server_on_pair_verify: [Client 10] Importing device Curve25519 public key
>>> homekit_server_on_pair_verify: [Client 10] Generating accessory Curve25519 key
>>> homekit_server_on_pair_verify: [Client 10] Exporting accessory Curve25519 public key
>>> homekit_server_on_pair_verify: [Client 10] Generating Curve25519 shared secret
>>> homekit_server_on_pair_verify: [Client 10] Generating signature
>>> homekit_server_on_pair_verify: [Client 10] Generating proof
>>> homekit_server_on_pair_verify: [Client 10] Encrypting response
>>> send_tlv_response: [Client 10] Sending TLV response
>>> tlv_debug: Got following TLV values:
>>> tlv_debug: Type 5 value (101 bytes): \xC92\xFB\xBAd\x9ABw\xD4\x94\xFA\x1A\xFF\xF3\x9F\x84\xA7\xE2_\x8E,\x1A?\xC1:\x99\x0A{\x82O\xF7v\xD5i\x8D\x96\x08\x00\xC8.\xF8\x96\x9B+8\xAB\x91g\x18\xA7=\\\xA5#).\xC1\xD5\x02\x07\xAB\x81*\xDA\x090DZtX\x98\x86E\xE6\xED\xAB\x1C\xC6\xC75S#\xF3O\xC8*\x08\xBB\xA0Uq:Rm\x07!%\xFE^\xEA\xA1
>>> tlv_debug: Type 3 value (32 bytes): PjIh\x1F\x16\xB1\xDAX\xB0\x96u$\x84T*h\x87\x1Bb\x87f\xDAC\x98\xE7\x0Dj\xE9\x14S\x14
>>> tlv_debug: Type 6 value (1 bytes): \x02
>>> client_send: [Client 10] Sending payload: HTTP/1.1 200 OK\x0D\x0AContent-Type: application/pairing+tlv8\x0D\x0AContent-Length: 140\x0D\x0AConnection: keep-alive\x0D\x0A\x0D\x0A\x05e\xC92\xFB\xBAd\x9ABw\xD4\x94\xFA\x1A\xFF\xF3\x9F\x84\xA7\xE2_\x8E,\x1A?\xC1:\x99\x0A{\x82O\xF7v\xD5i\x8D\x96\x08\x00\xC8.\xF8\x96\x9B+8\xAB\x91g\x18\xA7=\\\xA5#).\xC1\xD5\x02\x07\xAB\x81*\xDA\x090DZtX\x98\x86E\xE6\xED\xAB\x1C\xC6\xC75S#\xF3O\xC8*\x08\xBB\xA0Uq:Rm\x07!%\xFE^\xEA\xA1\x03 PjIh\x1F\x16\xB1\xDAX\xB0\x96u$\x84T*h\x87\x1Bb\x87f\xDAC\x98\xE7\x0Dj\xE9\x14S\x14\x06\x01\x02
>>> homekit_client_process: [Client 10] Finished processing
>>> HomeKit: Got new client connection: 12
>>> homekit_client_process: [Client 8] Got 88 incomming data
>>> homekit_client_process: [Client 8] Decrypting data
>>> homekit_client_process: [Client 8] Decrypted 70 bytes, available 0
Decrypted data (70 bytes): "GET /accessories HTTP/1.1\x0D\x0AHost: Contact\\032Sensor._hap._tcp.local\x0D\x0A\x0D\x0A"
>>> HomeKit: [Client 8] Get Accessories
>>> homekit_server_on_get_accessories: Free heap: 3384
>>> client_send: [Client 8] Sending payload: HTTP/1.1 200 OK\x0D\x0AContent-Type: application/hap+json\x0D\x0ATransfer-Encoding: chunked\x0D\x0AConnection: keep-alive\x0D\x0A\x0D\x0A
Fatal exception (29):
epc1=0x4000e143
epc2=0x00000000
epc3=0x4025fec9

0x4025fec9: sdk_pp_attach at ??:?

excvaddr=0x00000000
depc=0x00000000
excsave1=0x400018cc
Registers:
a0 400018cc a1 3fffb4e0 a2  00000000 a3  00000001
a4  00000001 a5  00000000 a6  4023eee5 a7  4023eee5

0x4023eee5: json_flush at ??:?

0x4023eee5: json_flush at ??:?

a8  4023eee4 a9  00000000 a10 4023eee5 a11 0000007b

0x4023eee4: json_flush at ??:?

0x4023eee5: json_flush at ??:?

a12 3fffb5c0 a13 00000001 SAR 00000010

Stack: SP=0x3fffb4e0
0x3fffb4e0: 40105e36 60000000 00000074 3fff53ce

0x40105e36: __ssputs_r at /home/src/esp/newlib-cygwin/build/xtensa-lx106-elf/newlib/libc/stdio/../../../../../newlib/libc/stdio/nano-vfprintf.c:233

0x3fffb4f0: 3fff5734 4023eee4 3fff53b8 0000001e

0x4023eee4: json_flush at ??:?

0x3fffb500: 00000020 3fff5734 3fff55a4 4023eee4

0x4023eee4: json_flush at ??:?

0x3fffb510: 00000000 3fff5734 3fffb5c0 4010601c

0x4010601c: _svfprintf_r at /home/src/esp/newlib-cygwin/build/xtensa-lx106-elf/newlib/libc/stdio/../../../../../newlib/libc/stdio/nano-vfprintf.c:531

0x3fffb520: 3fffc1c9 3fffc193 00000000 00000037
0x3fffb530: 3fff53b8 00000000 3f3020a4 40214531

0x40214531: __sflush_r at /home/src/esp/newlib-cygwin/build/xtensa-lx106-elf/newlib/libc/stdio/../../../../../newlib/libc/stdio/fflush.c:226

0x3fffb540: 0000000a 00000000 3fff5490 4020494e

0x4020494e: prvCopyDataToQueue at /Volumes/case-sensitive/esp-open-rtos/FreeRTOS/Source/queue.c:2332

0x3fffb550: 00000000 00000000 3fff5490 40204c24

0x40204c24: xQueueGenericSend at /Volumes/case-sensitive/esp-open-rtos/FreeRTOS/Source/queue.c:2332

Free Heap: 3196
_heap_start 0x3fff1ae0 brk 0x3ffffe78 supervisor sp 0x40000000 sp-brk 392 bytes
arena (total_size) 58264 fordblks (free_size) 2804 uordblocks (used_size) 55460

ets Jan  8 2013,rst cause:2, boot mode:(3,7)

load 0x40100000, len 2292, room 16

0x40100000: _stext at ??:?

tail 4
chksum 0x57
load 0x3ffe8000, len 772, room 4
tail 0
chksum 0x0b
csum 0x0b

rBoot v1.4.0 - richardaburton@gmail.com
Flash Size:   8 Mbit
Flash Mode:   DOUT
Flash Speed:  40 MHz
rBoot Option: Big flash
rBoot Option: RTC data

Booting rom 0.
pp_task_hdl : 3ffefd20, prio:14, stack:512
pm_task_hdl : 3ffef780, prio:1, stack:176
frc2_timer_task_hdl:0x3fff4228, prio:12, stack:200

ESP-Open-SDK ver: 0.0.1 compiled @ Jan 16 2018 09:11:40
phy ver: 273, pp ver: 8.3

Edit Jan 16 2018: Updated wording of pairing vs sessions

renssies commented 6 years ago

Update: Our doorbell just rang, so everyone rushed to their phone to open the door (the door lock is in HomeKit). This caused another high number of sessions started. But this time it did not crash. It looks like the crash only occurs after the device has just been set up.

maximkulkin commented 6 years ago

The crash is probably due to out of memory: memory became too fragmented that it couldn’t allocate next buffer. Large number of clients is not what esp8266 can handle. One workaround we could try to implement is to substitute malloc with one that will put task to sleep for short duration if allocation fails to wait until LWIP finishes sending it’s send queue and frees memory. On Tue, Jan 16, 2018 at 03:13 Rens Verhoeven notifications@github.com wrote:

Update: Our doorbell just rang, so everyone rushed to their phone to open the door (the door lock is in HomeKit). This caused another high number of pairings. But this time it did not crash. It looks like the crash only occurs after the device has just been set up.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/maximkulkin/esp-homekit/issues/17#issuecomment-357928910, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAqIULu4PHPQGVl90V-de9KJETVVS67ks5tLIROgaJpZM4RfmTq .

HomeACcessoryKid commented 6 years ago

I propose to clarify the vocabulary here

I think you mean to say session-setup instead of pairing. if you agree, would you edit the above? It is rather confusing. (I know to read between the lines, but others might not)

HomeACcessoryKid commented 6 years ago

On a constructive level, the actual pairing is a painful process. How about rebooting the device after it got paired? (I mean the 100% paired, not the half-paired). Recovering is lightning fast anyway so no real harm is done, but memory and behaviour will be fresh born clean. If it crashes after that, we have fundamental issues to solve.

maximkulkin commented 6 years ago

@HomeACcessoryKid why reboot? Since we have fixed memory issues, after pairing is complete there is no memory overhead. All other operations do not require large continuous chunks of memory, so fragmentation is not a problem.

HomeACcessoryKid commented 6 years ago

OK, if you are confident it is rock solid... I withdraw my proposal