zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.55k stars 6.46k forks source link

System stack usage analysis code seems to be broken #9587

Closed jhedberg closed 5 years ago

jhedberg commented 6 years ago

It seems CONFIG_INIT_STACKS=y is no longer enough to enable logs. I've somehow missed commit 0fb9ea03043551c66e6f0ec14c4ef9a001255d7d from @ramakrishnapallala which adds dependencies to CONFIG_THREAD_STACK_INFO (why this option and not CONFIG_INIT_STACKS?) to hci_core.c. However now even when I enable both CONFIG_INIT_STACKS=y and CONFIG_THREAD_STACK_INFO=y nothing gets output for the system stacks when a disconnection happens (which is the trigger for this kind of logging). I've tried this with samples/bluetooth/peripheral.

ramakrishnapallala commented 6 years ago

@jhedberg which board you are using?

jhedberg commented 6 years ago

@ramakrishnapallala the one I tried was nrf51_pca10028, however this is likely reproducible on many (possibly all) boards

jhedberg commented 6 years ago

@ramakrishnapallala could you explain the distinction between INIT_STACKS and THREAD_STACK_INFO? Previously (before your patch) the code was only relying on INIT_STACKS.

ramakrishnapallala commented 6 years ago

@jhedberg Macro CONFIG_INIT_STACKS will init the thread stack with 0xaa which is part of kernel/include/kernel_structs.h (line : 218)

So the deprecated API k_call_stacks_analyze is using STACK_ANALYZE macro to dump the stack usage statistics of _main_stack, _idle_stack, _interrupt_stack and sys_work_q_stack. Please refer kernel/init.c (line 138). Note, here we are simply dumpling stack info of only the 4 system level stacks. other thread stacks created from the main thread are not checked.

Macro THREAD_STACK_INFO if enabled will add a member in the k_thread struct to store it's stack starting address and size.

ramakrishnapallala commented 6 years ago

@jhedberg I have tested the samples/bluetooth/peripheral on nrf52_pca10040 and I get the following logs. Let me know if you see any issue?


***** Booting Zephyr OS v1.7.99-10296-g99abc0b *****
[bt] [INF] hci_vs_init: HW Platform: Nordic Semiconductor (0x0002)
[bt] [INF] hci_vs_init: HW Variant: nRF52x (0x0002)
[bt] [INF] hci_vs_init: Firmware: Standard Bluetooth controller (0x00) Version 1.13 Build 0
[bt] [WRN] bt_init: No ID address. Expecting one to come from storage.
Bluetooth initialized
[bt] [WRN] keys_set: Unable to find deleted keys for 73:e8:ed:c9:e6:41 (random)
[bt] [WRN] keys_set: Unable to find deleted keys for 00:00:00:00:00:00 (public)
[bt] [INF] bt_dev_show_info: Identity: d3:67:cf:6d:d8:0c (random)
[bt] [INF] bt_dev_show_info: HCI: version 5.0 (0x09) revision 0x0000, manufacturer 0x05f1
[bt] [INF] bt_dev_show_info: LMP: version 5.0 (0x09) subver 0xffff
Advertising successfully started
ecc stack (real size 1324):     unused 448      usage 876 / 1324 (66 %)
prio recv thread stack (real size 748): unused 632      usage 116 / 748 (15 %)
Connected
recv thread stack (real size 2348):     unused 1992     usage 356 / 2348 (15 %)
Passkey for 73:e8:ed:c9:e6:41 (random): 915744
ecc stack (real size 1324):     unused 448      usage 876 / 1324 (66 %)
prio recv thread stack (real size 748): unused 484      usage 264 / 748 (35 %)
recv thread stack (real size 2348):     unused 1592     usage 756 / 2348 (32 %)
prio recv thread stack (real size 748): unused 448      usage 300 / 748 (40 %)
recv thread stack (real size 2348):     unused 400      usage 1948 / 2348 (82 %)
prio recv thread stack (real size 748): unused 448      usage 300 / 748 (40 %)
HCI (real size 2348):   unused 400      usage 1948 / 2348 (82 %)
HCI (real size 748):    unused 448      usage 300 / 748 (40 %)
HCI (real size 1324):   unused 448      usage 876 / 1324 (66 %)
HCI (real size 940):    unused 480      usage 460 / 940 (48 %)
HCI (real size 2048):   unused 1236     usage 812 / 2048 (39 %)
HCI (real size 320):    unused 272      usage 48 / 320 (15 %)
HCI (real size 1024):   unused 648      usage 376 / 1024 (36 %)
tx stack (real size 940):       unused 480      usage 460 / 940 (48 %)
Disconnected (reason 19)
recv thread stack (real size 2348):     unused 152      usage 2196 / 2348 (93 %)
prio recv thread stack (real size 748): unused 448      usage 300 / 748 (40 %)
Connected
recv thread stack (real size 2348):     unused 152      usage 2196 / 2348 (93 %)
Passkey for 88:79:7e:15:ce:73 (public): 039598
ecc stack (real size 1324):     unused 448      usage 876 / 1324 (66 %)
prio recv thread stack (real size 748): unused 448      usage 300 / 748 (40 %)
recv thread stack (real size 2348):     unused 152      usage 2196 / 2348 (93 %)
[bt] [ERR] bt_id_add: Failed to add IRK to controller
prio recv thread stack (real size 748): unused 448      usage 300 / 748 (40 %)
recv thread stack (real size 2348):     unused 152      usage 2196 / 2348 (93 %)
prio recv thread stack (real size 748): unused 448      usage 300 / 748 (40 %)
HCI (real size 2348):   unused 152      usage 2196 / 2348 (93 %)
HCI (real size 748):    unused 448      usage 300 / 748 (40 %)
HCI (real size 1324):   unused 448      usage 876 / 1324 (66 %)
HCI (real size 940):    unused 480      usage 460 / 940 (48 %)
HCI (real size 2048):   unused 1236     usage 812 / 2048 (39 %)
HCI (real size 320):    unused 272      usage 48 / 320 (15 %)
HCI (real size 1024):   unused 648      usage 376 / 1024 (36 %)
tx stack (real size 940):       unused 480      usage 460 / 940 (48 %)
Disconnected (reason 19)
recv thread stack (real size 2348):     unused 152      usage 2196 / 2348 (93 %)
jhedberg commented 6 years ago

@ramakrishnapallala what changes did you make to the prj.conf to get the above output?

Note that even with the above it's broken since all system stacks (7 of them!) are reported as "HCI" without any way to know what's what. That's only marginally better than no logs at all. Previously there would be proper names for the stacks in the logs.

ramakrishnapallala commented 6 years ago

Make sure you have following enabled: CONFIG_INIT_STACKS=y CONFIG_THREAD_MONITOR=y CONFIG_THREAD_STACK_INFO=y

Can you pint me the actual code where k_thread_for_each() is being called. I can have a look at the code.

jhedberg commented 6 years ago

@ramakrishnapallala it might be CONFIG_THREAD_MONITOR=y that I've been missing then.

The place printing those "HCI" logs is here:

https://github.com/zephyrproject-rtos/zephyr/blob/master/subsys/bluetooth/host/hci_core.c#L616

And that line came actually from your commit 0fb9ea0

If we can get this fixed, then I suppose all existing STACK_ANALYZE() calls in the Bluetooth code should be removed, since k_thread_foreach covers them as well.

ramakrishnapallala commented 6 years ago

@jhedberg at the moment there is no way to differentiate b/w thread names, but this can be done once the PR #9394 submitted by @nashif gets in. This allows us to name a thread and then we can dump thread name before STACK_ANALYZE().

nashif commented 6 years ago

still a bug?

jhedberg commented 6 years ago

still a bug?

Not the one I was originally reporting, but the fact that we can't make much use of the output (due to not being able to differentiate what's what) I'd still consider this as being broken. The referenced PR #9394 sounds like a way to fix it, but that doesn't seem to be planned for 1.13.

nashif commented 5 years ago

9394 merged