maximkulkin / esp-homekit

Apple HomeKit accessory server library for ESP-OPEN-RTOS
MIT License
1.1k stars 168 forks source link

Random Panic Crash. Around ~1 Every 2 Days #144

Closed mriksman closed 2 years ago

mriksman commented 4 years ago

Just letting the device sit there on its own to see how stable it is. Have seen this twice now;

I (132024601) main: HOMEKIT_EVENT_CLIENT_CONNECTED
>>> HomeKit: [Client 50] Pair Verify Step 1/2
>>> HomeKit: [Client 49] Get Accessories
Guru Meditation Error: Core  0 panic'ed (StoreProhibited). Exception was unhandled.
Core 0 register dump:
PC      : 0x4000e143  PS      : 0x00000030  A0      : 0x4025b804  A1      : 0x3fff6ee0  
0x4025b804: __ssputs_r at /home/dongheng/workspace/espressif/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-lx106-elf/newlib/libc/stdio/../../../.././newlib/libc/stdio/nano-vfprintf.c:343

A2      : 0x00000000  A3      : 0x00000001  A4      : 0x00000003  A5      : 0x00000001  
A6      : 0x00000003  A7      : 0x3fff6fa1  A8      : 0x3fff6fa0  A9      : 0x00000000
A10     : 0x3fff6fa3  A11     : 0x00000033  A12     : 0x3fff7000  A13     : 0x3fff7000
A14     : 0x40107764  A15     : 0x00000003  SAR     : 0x0000001f  EXCCAUSE: 0x0000001d

Backtrace: 0x4000e143:0x3fff6ee0 0x4025b804:0x3fff6ee0 0x4025c4b0:0x3fff6f10 0x4025ba90:0x3fff6f60 0x4025d1a1:0x3fff7000 0x4021840e:0x3fff70c0 0x40264ca2:0x3fff70f0 0x4021c02b:0x3fff7100 0x4021c09a:0x3fff7160 0x4021c44f:0x3fff7170 0x40218fa2:0x3fff7190 0x4021ab51:0x3fff71c0 0x40220495:0x3fff71d0 0x4021aebc:0x3fff7230 Guru Meditation Error: Core  0 panic'ed (LoadStoreAlignment). Exception was unhandled.
0x4025b804: __ssputs_r at /home/dongheng/workspace/espressif/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-lx106-elf/newlib/libc/stdio/../../../.././newlib/libc/stdio/nano-vfprintf.c:343

0x4025c4b0: _printf_i at /home/dongheng/workspace/espressif/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-lx106-elf/newlib/libc/stdio/../../../.././newlib/libc/stdio/nano-vfprintf_i.c:244

0x4025ba90: _svfprintf_r at /home/dongheng/workspace/espressif/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-lx106-elf/newlib/libc/stdio/../../../.././newlib/libc/stdio/nano-vfprintf.c:641

0x4025d1a1: snprintf at /home/dongheng/workspace/espressif/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-lx106-elf/newlib/libc/stdio/../../../.././newlib/libc/stdio/snprintf.c:118

0x4021840e: client_send_chunk at c:\vscode\projects\esp8266_homekit\build/../components/homekit/src/server.c:3167

0x40264ca2: json_flush at c:\vscode\projects\esp8266_homekit\build/../components/homekit/src/json.c:70

0x4021c02b: json_write at c:\vscode\projects\esp8266_homekit\build/../components/homekit/src/json.c:354

0x4021c09a: _do_write at c:\vscode\projects\esp8266_homekit\build/../components/homekit/src/json.c:354

0x4021c44f: json_string at c:\vscode\projects\esp8266_homekit\build/../components/homekit/src/json.c:309

0x40218fa2: homekit_server_on_get_accessories at c:\vscode\projects\esp8266_homekit\build/../components/homekit/src/server.c:3167

0x4021ab51: homekit_server_on_message_complete at c:\vscode\projects\esp8266_homekit\build/../components/homekit/src/server.c:2960

0x40220495: http_parser_execute at C:/VSCode/SDK/esp8266/ESP8266_RTOS_SDK/components/http_parser/src/http_parser.c:2171

0x4021aebc: homekit_client_process at c:\vscode\projects\esp8266_homekit\build/../components/homekit/src/server.c:3067
 (inlined by) homekit_run_server at c:\vscode\projects\esp8266_homekit\build/../components/homekit/src/server.c:3288
 (inlined by) homekit_server_task at c:\vscode\projects\esp8266_homekit\build/../components/homekit/src/server.c:3452

Core 0 register dump:
PC      : 0x4023211b  PS      : 0x00000033  A0      : 0x40232099  A1      : 0x3ffe9510
0x4023211b: xt_retaddr_callee at C:/VSCode/SDK/esp8266/ESP8266_RTOS_SDK/components/esp8266/source/backtrace.c:83

0x40232099: xt_retaddr_callee at C:/VSCode/SDK/esp8266/ESP8266_RTOS_SDK/components/esp8266/source/backtrace.c:50

A2      : 0x3fff722d  A3      : 0x00000000  A4      : 0x00000298  A5      : 0x000000a0  
A6      : 0x00000090  A7      : 0x4021ac24  A8      : 0x3fff7231  A9      : 0x00000011
0x4021ac24: homekit_server_task at c:\vscode\projects\esp8266_homekit\build/../components/homekit/src/server.c:3406

A10     : 0x000000c0  A11     : 0x000007fe  A12     : 0x4025b804  A13     : 0x3ffe9544
0x4025b804: __ssputs_r at /home/dongheng/workspace/espressif/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-lx106-elf/newlib/libc/stdio/../../../.././newlib/libc/stdio/nano-vfprintf.c:343

A14     : 0x3fff7230  A15     : 0x3ffe9540  SAR     : 0x0000001f  EXCCAUSE: 0x00000009

Backtrace: 0x4023211b:0x3ffe9510 0x40215bdf:0x3ffe9540 0x40100538:0x3ffe9580
0x4023211b: xt_retaddr_callee at C:/VSCode/SDK/esp8266/ESP8266_RTOS_SDK/components/esp8266/source/backtrace.c:83

0x40215bdf: panic_frame at C:/VSCode/SDK/esp8266/ESP8266_RTOS_SDK/components/freertos/port/esp8266/panic.c:184
 (inlined by) panicHandler at C:/VSCode/SDK/esp8266/ESP8266_RTOS_SDK/components/freertos/port/esp8266/panic.c:150

0x40100538: __mulsf3_aux at /home/wjg/Repo/esp-open-sdk/crosstool-NG/.build/src/gcc-4.8.2/libgcc/config/xtensa/ieee754-sf.S:498

Anyone have any ideas?

mriksman commented 4 years ago

From the docs;

LoadProhibited, StoreProhibited This CPU exception happens when application attempts to read from or write to an invalid memory location. The address which was written/read is found in EXCVADDR register in the register dump. If this address is zero, it usually means that application attempted to dereference a NULL pointer. If this address is close to zero, it usually means that application attempted to access member of a structure, but the pointer to the structure was NULL

mriksman commented 4 years ago

Somewhere around the snprintf or memcpy in the client_send_chunk function?

void client_send_chunk(byte *data, size_t size, void *arg) {
    client_context_t *context = arg;

    size_t payload_size = size + 8;
    byte *payload = malloc(payload_size);

    int offset = snprintf((char *)payload, payload_size, "%x\r\n", size);
    memcpy(payload + offset, data, size);
    payload[offset + size] = '\r';
    payload[offset + size + 1] = '\n';

    client_send(context, payload, offset + size + 2);

    free(payload);
}

There are a few calls in server.c that have

client_send_chunk(NULL, 0, context);

But then, wouldn't memcpy be trying to use a NULL pointer for the src parameter?

Granted, my knowledge isn't great with programming, but this seems strange...?

mriksman commented 4 years ago

I added a line in between the sprintf and memcpy to print out the address pointed to by data. Indeed, it has NULL many times, but it doesn't usually cause a crash...

>>> HomeKit: [Client 55] Get Accessories
>>> HomeKit: Data pointer 0x4010b498
>>> HomeKit: Data pointer 0x4010b498
>>> HomeKit: Data pointer 0x4010b498
>>> HomeKit: Data pointer 0
maximkulkin commented 4 years ago

I think you're running out of memory. When I was implementing this framework there was a consideration what to do when there is not enough memory. Back in the days I decided not to do anything (and thus it will just crash). Since I saw more of this nowadays, I'm looking to optimize memory usage (and memory fragmentation), but it looks like putting a check after every malloc will make compiled framework size bigger, so I'm still hesitant to merge those checks (you can check my "memory-handling-improvements" branch and see if it makes it clearer what goes wrong.

mriksman commented 4 years ago

As in HEAP? I check this, and it doesn’t seem to be dropping. Hovers around 30000. After 12 hours;

HEAP 30360
sys_evt         X       10      1760    4
Tmr Svc         R       2       1064    3
IDLE            R       0       784     2
HomeKit Server  B       1       7768    8
tiT             B       8       1388    5
ppT             B       13      648     6
mdns            B       1       2192    10
pmT             B       11      624     7

The second-last column is the 'high-water mark' which indicates the lowest free task stack space it has reached. HomeKit Server seems to have plenty.

I'll keep monitoring it.

beckmx commented 4 years ago

I have realized that the memory ends at some point, but is randomly happening, it does run out of memory, it just cannot allocate enough memory and keeps looping in the methods that can run, specifically the methods I noticed die because of lack of memory are the mDNS. So as Maxim says, if you run out of ram, it is complex to free more memory due to the fragmentation done during its lifetime. In my case I decided to reboot the chip specifically in the mDNS library because is the one detecting the problems with memory allocation

maximkulkin commented 4 years ago

I'm constantly try to improve stability of esp-homekit framework. There is an experimental branch "memory-handling-improvements" that has work on optimizing memory usage and detect memory issues as they occur. I'm still haven't decided if all changes in this branch need to make it into master, but it might help your case. Check it out and see if it improves your experience.

mriksman commented 4 years ago

Leave the case open for now - I'll be setting up a few to test in the coming days/weeks, and will test both branches.

mriksman commented 2 years ago

I no longer have any issues with memory and random crashes.

maximkulkin commented 1 year ago

Thanks for confirming that!