maximkulkin / esp-homekit-demo

Demo of Apple HomeKit accessory server library
MIT License
809 stars 233 forks source link

On notify homekit crash #263

Closed nonameplum closed 4 years ago

nonameplum commented 4 years ago

example: garage door device: Sonoff SV Not always reproducible

 $ git submodule status
 516124460c9799ca87caa48ab4921504d2c802da components/common/homekit (heads/master-9-g5161244)
 3b4152bc66a324fdf097e9cc9564a8d67de5f97f components/common/wolfssl (remotes/origin/HEAD)
 49ccbab5a6d4c646b7911bf230d69c53bfda6715 components/esp-32/http-parser (remotes/origin/HEAD)
 e6e58537d3687a317566369b4b4a5186cad18f68 components/esp-8266/WS2812FX (heads/master)
 c56094c4c5bb92e33f7c019d548171bbe0497c36 components/esp-8266/led-status (heads/master)
 2bec68120204bfd683d9633138976f0d2c76c373 components/esp-8266/qrcode (heads/master)
 92cb44a3916cda956832abe58f6fc509301e1644 components/esp-8266/wifi_config (remotes/origin/HEAD)

Logs:

Notifying changed 'Current Door State'
Fatal exception (9):
epc1=0x4021be45

0x4021be45: write_characteristic_json at /Volumes/esp8266/esp-homekit-demo/components/common/homekit/src/server.c:3149

epc2=0x00000000
epc3=0x40203699

0x40203699: prvIdleTask at /Volumes/esp8266/esp-open-rtos/FreeRTOS/Source/tasks.c:4626

excvaddr=0x0000000a
depc=0x00000000
excsave1=0x4021be45

0x4021be45: write_characteristic_json at /Volumes/esp8266/esp-homekit-demo/components/common/homekit/src/server.c:3149

Registers:
a0 4021be45 a1 3fff79c0 a2  00000004 a3  000000ff

0x4021be45: write_characteristic_json at /Volumes/esp8266/esp-homekit-demo/components/common/homekit/src/server.c:3149

a4  00000000 a5  0000001a a6  00000100 a7  00000001
a8  40241cfc a9  00000080 a10 40241cfd a11 0000003a

0x40241cfc: json_flush at ??:?

0x40241cfd: json_flush at ??:?

a12 3fff7bd8 a13 0000000a SAR 0000001b

Stack: SP=0x3fff79c0
0x3fff79c0: 00000100 00000011 00000006 00000013
0x3fff79d0: 00000100 00000001 3fff79c4 3fff3800
0x3fff79e0: 3fff31f0 3fff7bd8 3fff7bd8 3fff3800
0x3fff79f0: 3fff31f0 3fff7bd8 3fff7bd8 3fff3800
0x3fff7a00: 3fff31f0 3fff7bd8 3fff3368 4021c600

0x4021c600: send_client_events at /Volumes/esp8266/esp-homekit-demo/components/common/homekit/src/server.c:3149

0x3fff7a10: 3fff3370 3fff31f0 3fff3368 4021e1ed

0x4021e1ed: homekit_server_process_notifications at /Volumes/esp8266/esp-homekit-demo/components/common/homekit/src/server.c:3200

0x3fff7a20: 3fff3828 3fff23f0 00000000 00000006
0x3fff7a30: 3fff3828 3ffe9200 00000000 40105ee6

0x40105ee6: _free_r at /home/src/esp/newlib-cygwin/build/xtensa-lx106-elf/newlib/libc/stdlib/../../../../../newlib/libc/stdlib/nano-mallocr.c:433

Free Heap: 31354
_heap_start 0x3fff2878 brk 0x3fff9824 supervisor sp 0x40000000 sp-brk 26588 bytes
arena (total_size) 28588 fordblks (free_size) 4766 uordblocks (used_size) 23822

ets Jan  8 2013,rst cause:1, boot mode:(3,7)

load 0x40100000, len 2292, room 16

0x40100000: _stext at ??:?

tail 4
chksum 0x57
load 0x3ffe8000, len 772, room 4
tail 0
chksum 0x0b
csum 0x0b

rBoot v1.4.0 - richardaburton@gmail.com
Flash Size:   8 Mbit
Flash Mode:   DOUT
Flash Speed:  40 MHz
rBoot Option: Big flash
rBoot Option: RTC data

Booting rom 0.
pp_task_hdl : 3fff06c0, prio:14, stack:512
pm_task_hdl : 3ffeffb0, prio:1, stack:176
frc2_timer_task_hdl:0x3fff49f8, prio:12, stack:200

ESP-Open-SDK ver: 0.0.1 compiled @ Oct 12 2019 08:35:40
phy ver: 273, pp ver: 8.3
nonameplum commented 4 years ago

Additional info:

$ /Volumes/esp8266/esp-open-rtos/utils/addrsource 0x4021be45
0x4021be45 is in write_characteristic_json (/Volumes/esp8266/esp-homekit-demo/components/common/homekit/src/server.c:419).
414     characteristic_format_events = (1 << 4),
415 } characteristic_format_t;
416 
417 
418 void write_characteristic_json(json_stream *json, client_context_t *client, const homekit_characteristic_t *ch, characteristic_format_t format, const homekit_value_t *value) {
419     json_string(json, "aid"); json_uint32(json, ch->service->accessory->id);
420     json_string(json, "iid"); json_uint32(json, ch->id);
421 
422     if (format & characteristic_format_type) {
423         json_string(json, "type"); json_string(json, ch->type);
Line 419 of "/Volumes/esp8266/esp-homekit-demo/components/common/homekit/src/server.c" starts at address 0x4021be42 <write_characteristic_json+30> and ends at 0x4021be51 <write_characteristic_json+45>.
$ /Volumes/esp8266/esp-open-rtos/utils/addrsource 0x4021c600
0x4021c600 is in send_client_events (/Volumes/esp8266/esp-homekit-demo/components/common/homekit/src/server.c:821).
816 
817     client_event_t *e = events;
818     while (e) {
819         json_object_start(json);
820         write_characteristic_json(json, context, e->characteristic, 0, &e->value);
821         json_object_end(json);
822 
823         e = e->next;
824     }
825 
Line 821 of "/Volumes/esp8266/esp-homekit-demo/components/common/homekit/src/server.c" starts at address 0x4021c600 <send_client_events+116> and ends at 0x4021c605 <send_client_events+121>.
nonameplum commented 4 years ago

@maximkulkin Could you take a look if this might be a HomeKit bug? What I can do more to investigate the issue?

AchimPieters commented 4 years ago

@nonameplum there's no HomeKit bug. I've build several devices and the run all perfectly. I would recommend starting with a basic setup (eg LED example) and then work your way up.

nonameplum commented 4 years ago

I ran many examples. Just started getting this issue on garage example from time to time. I'm not sure what might be the reason.

AchimPieters commented 4 years ago

@nonameplum okay, so this issue happens only from time to time you say. What changes when you have this issue?

nonameplum commented 4 years ago

Seems like they might be some correlation of the timing as at that time few notifications goes to the HomeKit that door state has changed.

AchimPieters commented 4 years ago

@nonameplum I saw that you already have consulted @g3ntleman in #144 I wil have a look If i can find anything I will come back to you.

AchimPieters commented 4 years ago

@nonameplum So this is the hardware you use? index

and which power supply?

AchimPieters commented 4 years ago

@nonameplum https://github.com/arendst/Sonoff-Tasmota/issues/391

nonameplum commented 4 years ago

Yes this is exactly my device. I use 12VDC power supply.

nonameplum commented 4 years ago

@nonameplum arendst/Sonoff-Tasmota#391

@AchimPieters I suppose this is not related to my problem. I don't have AC in my case. My Sonoff SV is only connected to power source 12VDC and the relay that is on the board shorts the circuit in my case to give signal to garage door opener to open/close that's it. I have stable ~3.3V on gpio.

nonameplum commented 4 years ago

I also pushed to my fork the changes with an additional accessory (light) that should light on/off when the door are opened/closed. Seems like with my code I get the crash way more often as it has to send three homekit notifications at reed switch interrupt.

My code is here: https://github.com/nonameplum/esp-homekit-demo/tree/my_garage/examples/garage

I also configured by @maximkulkin advice udp logging https://github.com/maximkulkin/esp-homekit/issues/88#issuecomment-520604076

But I'm able to reproduce it both with device connected via UART or standalone with 12VDC and UDP logging.

nonameplum commented 4 years ago
Fatal exception (9):
epc1=0x4021d32b

0x4021d32b: homekit_server_process_notifications at /Volumes/esp8266/esp-homekit-demo/components/common/homekit/src/server.c:3159

epc2=0x00000000
epc3=0x4020e63b

0x4020e63b: prvIdleTask at /Volumes/esp8266/esp-open-rtos/FreeRTOS/Source/tasks.c:4820

excvaddr=0x3fff3613
depc=0x00000000
excsave1=0x4021d326

0x4021d326: homekit_server_process_notifications at /Volumes/esp8266/esp-homekit-demo/components/common/homekit/src/server.c:3159

Registers:
a0 4021d326 a1 3fff8690 a2  3fff3468 a3  3fff3613

0x4021d326: homekit_server_process_notifications at /Volumes/esp8266/esp-homekit-demo/components/common/homekit/src/server.c:3159

a4  3fff346b a5  00000000 a6  3fff3613 a7  00000000
a8  00000003 a9  00000090 a10 00000000 a11 00000000
a12 3fff3468 a13 3fff3138 SAR 0000001a

Stack: SP=0x3fff8690
0x3fff8690: 3fff3613 3fff1f84 00000000 00000005
0x3fff86a0: ffffffff 3ffe9200 00000000 40105ee6

0x40105ee6: _free_r at /home/src/esp/newlib-cygwin/build/xtensa-lx106-elf/newlib/libc/stdlib/../../../../../newlib/libc/stdlib/nano-mallocr.c:433

0x3fff86b0: 00000001 3fff3138 3fff52f8 00000000
0x3fff86c0: 00000001 00000004 3fff2ff0 4021e8d1

0x4021e8d1: homekit_run_server at /Volumes/esp8266/esp-homekit-demo/components/common/homekit/src/server.c:3267
 (inlined by) homekit_server_task at /Volumes/esp8266/esp-homekit-demo/components/common/homekit/src/server.c:3425

0x3fff86d0: b4150230 00000000 30303030 30303030
0x3fff86e0: 00000001 00000000 00000000 00000000
0x3fff86f0: 00000000 00000000 000000b6 00000000
0x3fff8700: 3fff2ff0 3fff3008 000000b6 00000000

Free Heap: 30356
_heap_start 0x3fff2520 brk 0x3fffa954 supervisor sp 0x40000000 sp-brk 22188 bytes
arena (total_size) 33844 fordblks (free_size) 8168 uordblocks (used_size) 25676
maccoylton commented 4 years ago

Exception 9 says unassigned address. Check the parameters you are passing too notify are correctly initialised and assigned.

nonameplum commented 4 years ago

It happens rarely but when current_door_state_set is called. (modified garage example https://github.com/nonameplum/esp-homekit-demo/blob/60437fa101ca7eabd1faf6cdd29b360c599ebe3e/examples/garage/garage.c#L265)

I put this part of code for the convenience:

...
homekit_characteristic_t current_door_state = HOMEKIT_CHARACTERISTIC_(
    CURRENT_DOOR_STATE, HOMEKIT_CHARACTERISTIC_CURRENT_DOOR_STATE_CLOSED,
    .getter=gdo_current_door_state_get,
    .setter=NULL
);
...

void gdo_current_door_state_notify_homekit() {
    LOG("");
    homekit_value_t new_value = HOMEKIT_UINT8(_current_door_state);
    LOG("!> Notifying homekit CURRENT DOOR state: '%s' [%s]", state_description(_current_door_state), current_door_state.description);
    homekit_characteristic_notify(&current_door_state, new_value);
}

void gdo_target_door_state_notify_homekit() {
    LOG("");
    homekit_value_t new_value = gdo_target_door_state_get();
    LOG("!> Notifying homekit TARGET DOOR state: '%s' [%s]", state_description(new_value.int_value), target_door_state.description);
    homekit_characteristic_notify(&target_door_state, new_value);
}

void current_door_state_set(uint8_t new_state) {
    LOG("New state: %d, old state: %d", new_state, _current_door_state);
    if (_current_door_state != new_state) {
        _current_door_state = new_state;
        gdo_target_door_state_notify_homekit();
        gdo_current_door_state_notify_homekit();
    }
}

If I go to the part of the homekit's framework server.c line 3159 as the crash log points to it refers to:

void homekit_server_process_notifications(homekit_server_t *server) {
    client_context_t *context = server->clients;
    while (context) {
        characteristic_event_t *event = NULL;
        if (xQueueReceive(context->event_queue, &event, 0)) {
            // Get and coalesce all client events
            client_event_t *events_head = malloc(sizeof(client_event_t));
            events_head->characteristic = event->characteristic; // here the crash potentially happen based on the crash log
            homekit_value_copy(&events_head->value, &event->value);
            events_head->next = NULL;

            homekit_value_destruct(&event->value);
            free(event);
            ...

Seems like event might be NULL or corrupted? Not correctly received from the xQueueReceive? I'm not sure if this is because of the bug on my side or the framework has an issue. @maximkulkin Could you take a look?

maximkulkin commented 4 years ago

Ok, this is a rare topic, but I have already seen that before. Exception 9 is for "unALIGNED" access. That means that you're probably trying to access 32bit value at address that is not 32bit aligned.

@nonameplum to answer your question: this is probably a framework problem, not yours, but since you have a (although rare) repro, I will need your help to debug that. Can you add printf statements into homekit_server_process_notification() function inside if (xQueueReceive()) { } block to print pointer values for event and events_head:

DEBUG("event = %p", event);

client_event_t *events_head = malloc(sizeof(client_event_t));
DEBUG("events_head = %p", events_head);

In meantime, I think that current garage door example is pretty messy and I had a cleaned up version laying around for quite some time, but I didn't merge it because I do not have setup to test it. If you could test it, I think it would be beneficial for everybody including you. I'm filing it in a separate branch called garage-door

nonameplum commented 4 years ago

No problem, I will put the logs and when I will get the crash again I will update this thread.

I will check the garage-door too.

nonameplum commented 4 years ago

@maximkulkin I checked the garage-door. There is an issue that the homekit app presents the opening/closing state without finally resolving into the open/close state. I checked it briefly. It might be that notification via homekit is sent only about ch_current_door_state instead of current and target characteristic as before. But I might be wrong.

Beside that I didn't get the crash yet again but the reason might be that I changed the implementation of contact_sensor to match esp-open-rtos interrupt recommended handling https://github.com/SuperHouse/esp-open-rtos/blob/master/examples/button/button.c via QueueHandle_t. Maybe this is the key here as both (current & target) homekit notifications were sent from the interrupt handler. Implementation is here: https://github.com/nonameplum/esp-homekit-demo/blob/my_garage/helpers/interrupt_gpio.c#L44

d4rkmen commented 4 years ago

Exception 9 is for "unALIGNED" access. That means that you're probably trying to access 32bit value at address that is not 32bit aligned.

IMHO this is the price for unofficial devkit. Some memory regions intended to be an 32-bit aligned instruction cache are used for heap which is not aligned. Need to test this on the latest official sdk

Actually, Im a bit dissapointed with rtos general stability. I have arduino esp projects running smooth for years nonstop. Need to try mongoose-os out.

maximkulkin commented 4 years ago

IMHO this is the price for unofficial devkit.

But can you smell freedom and openness?! :)

Some memory regions intended to be an 32-bit aligned instruction cache are used for heap which is not aligned.

You can align stuff yourself, no need for waiting when big daddy aligns it for you. Also, I think that if you allocate something, it is already aligned. I think it is more complicated than that. Also it was seen only in particular firmware which makes me think that it is something in user code.

Actually, Im a bit dissapointed with rtos general stability. I have arduino esp projects running smooth for years nonstop. Need to try mongoose-os out.

Good luck implementing cryptography on those! I bet you do not have HomeKit accessories running on Arduino Core.

RavenSystem commented 4 years ago

I only want to add that esp-open-rtos is VERY stable, and a single esp8266 chip using esp-homekit framework can support a dozen of different HomeKit accessories (even more) without any problem.

nonameplum commented 4 years ago

As we do offtopic already. Isn't for the future projects ESP-IDF (Apache 2.0 as far as I can see so license is not a problem) a better choice as is actively maintained opposed to esp-open-rtos?

nonameplum commented 4 years ago

Btw since I have made a change with the interrupt implementation my esp8266 is already working very stable and I didn't encounter any crash yet.

d4rkmen commented 4 years ago

@nonameplum, @RavenSystem, any good expirience with remote access? I have posted a new Issue

maximkulkin commented 4 years ago

@nonameplum Thanks for the update. Indeed, most button implementations execute callbacks right inside interrupt handler to avoid making it heavyweight in cases you do not need it (or need it done in a different way), so it is your responsibility to design callbacks to be lightweight and postpone all heavy lifting.

As of supporting official Espressif SDK for ESP8266: I had it done, but never pushed the change because it moved a lot of stuff in demo repository. I plan to revisit that, freshen it and push in near future.

d4rkmen commented 4 years ago

Good luck implementing cryptography on those!

Thanks, actually trying now mongoose-os with esp32-homekit I have used this library on ARM controllers with 50+ devices, and was happy. Will see how it runs on ESP...

maximkulkin commented 4 years ago

Thanks, actually trying now mongoose-os with esp32-homekit

Seems like project is stale

nonameplum commented 4 years ago

I think we can close this issue as I'm not getting the crash anymore. So I assume that the proper handling of the interrupt https://github.com/maximkulkin/esp-homekit-demo/issues/263#issuecomment-569747881 solves the crash.