Closed chengguizi closed 3 years ago
Tips for debugging:
- Use qconfig instead of menuconfig, as it supports global config name search.
- Use objdump -S on the compiled .elf file to get the disassembly interleaved with the source, then look up the addresses from the IRQ and user stack dumps in it to determine the chain of function calls at the moment the hard fault happens. It has been very helpful.
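The manual lookup against the objdump listing amounts to finding which function's address range contains each value from the stack dump. A minimal sketch of that step, assuming a small hand-copied symbol table: the addresses, sizes, and the lookup_symbol() helper are hypothetical and only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* One function entry, e.g. copied by hand from the objdump listing. */
struct symbol {
    uint32_t    start;  /* first address of the function */
    uint32_t    size;   /* size of the function in bytes */
    const char *name;
};

/* Hypothetical addresses and sizes, for illustration only. */
static const struct symbol g_symbols[] = {
    { 0x08008000u, 0x040u, "stm32_boardinitialize" },
    { 0x08008040u, 0x120u, "nx_start" },
    { 0x08008160u, 0x080u, "nsh_main" },
};

/* Return the name of the function containing addr, or NULL if none does. */
static const char *lookup_symbol(uint32_t addr)
{
    /* Return addresses saved on a Cortex-M stack have bit 0 set (Thumb
     * state), so clear it before comparing against the listing. */
    uint32_t pc = addr & ~(uint32_t)1;
    size_t n = sizeof(g_symbols) / sizeof(g_symbols[0]);

    for (size_t i = 0; i < n; i++) {
        if (pc >= g_symbols[i].start &&
            pc - g_symbols[i].start < g_symbols[i].size) {
            return g_symbols[i].name;
        }
    }
    return NULL;
}
```

Walking the stack dump word by word through such a lookup gives candidate frames; words that do not fall inside any function are simply data.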
I have realised that it has a lot to do with CONFIG_NSH_ARCHINIT=y.
Previously, this config was not enabled, so board_app_initialize() was not run during nsh startup.
The commit can be found at https://github.com/nusrobomaster/PX4-Autopilot/commit/bf6938272227e438e35c596c6c03bbb03dbdde80 .
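To illustrate the effect of this option, here is a simplified model, not the actual NuttX code. In NuttX the check is a compile-time #ifdef on CONFIG_NSH_ARCHINIT; a runtime flag is used here so that both cases can be exercised in one build. The names nsh_initialize_model() and board_app_initialize_stub() are hypothetical.

```c
#include <stdbool.h>

static int g_board_app_init_calls;  /* counts calls to the stub below */

/* Stub standing in for the board-specific board_app_initialize(). */
static int board_app_initialize_stub(void)
{
    g_board_app_init_calls++;
    return 0;
}

/* Simplified model of the relevant step in nsh_initialize(): the board
 * hook only runs when the archinit option is enabled. */
static void nsh_initialize_model(bool nsh_archinit)
{
    if (nsh_archinit) {
        /* Corresponds to (void)boardctl(BOARDIOC_INIT, 0); in NuttX. */
        (void)board_app_initialize_stub();
    }
}
```

With the flag off, the board hook is silently skipped, so nothing that board_app_initialize() would set up exists when nsh starts.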
Brief booting sequence:
1. stm32_boardinitialize() (progress 'F')
2. nx_start() (this is persistent and never returns)
Regarding stm32_boardinitialize(), it contains logic to reset pins to their initial values, as well as to configure the LEDs:
/************************************************************************************
* Name: stm32_boardinitialize
*
* Description:
* All STM32 architectures must provide the following entry point. This entry point
* is called early in the intitialization -- after all memory has been configured
* and mapped but before any devices have been initialized.
*
************************************************************************************/
nx_start() boot sequence
It performs semaphore initialisation, then sets up the POSIX clock, timers, signals, message queues, pthreads, the C library, etc. During this, up_initialize() and board_early_initialize() (provided CONFIG_BOARD_EARLY_INITIALIZE is set) are also called.
Then it runs nx_bringup(), which starts the program determined by CONFIG_USER_ENTRYPOINT. It involves:
- nx_pgworker for page faults
- nx_workqueues for the worker threads
- nx_create_initthread for the initial thread
Lastly, it performs the (late) syslog initialisation, after which the idle loop is entered: up_idle().
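The ordering described above can be sketched as a trace of stub calls. This is a simplified model and not the NuttX source: nx_start_model(), nx_bringup_model(), trace() and the "syslog_late_init" label are hypothetical names, and every step is treated as unconditional except board_early_initialize.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static char g_trace[256];  /* comma-separated list of steps, in call order */

/* Append one step name to the trace. */
static void trace(const char *name)
{
    size_t used = strlen(g_trace);
    snprintf(g_trace + used, sizeof(g_trace) - used, "%s%s",
             used ? "," : "", name);
}

/* Model of nx_bringup(): starts the workers and the init thread. */
static void nx_bringup_model(void)
{
    trace("nx_pgworker");
    trace("nx_workqueues");
    trace("nx_create_initthread");
}

/* Model of the tail of nx_start(); returns the recorded call order. */
static const char *nx_start_model(bool board_early_init_enabled)
{
    g_trace[0] = '\0';
    trace("up_initialize");
    if (board_early_init_enabled) {   /* CONFIG_BOARD_EARLY_INITIALIZE */
        trace("board_early_initialize");
    }
    trace("nx_bringup");
    nx_bringup_model();
    trace("syslog_late_init");        /* late syslog initialisation */
    trace("up_idle");                 /* the real idle loop never returns */
    return g_trace;
}
```

Unlike the real nx_start(), the model returns so that the recorded order can be inspected.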
nx_create_initthread, apart from the idle thread (CONFIG_INIT_ENTRYPOINT)
It runs board_late_initialize() if configured (CONFIG_BOARD_LATE_INITIALIZE); otherwise the entry point starts right after the sinfo("Starting init thread\n"); printout.
Interestingly, if CONFIG_BOARD_LATE_INITIALIZE is configured, board_late_initialize() runs on a separate thread (a more robust, full thread, instead of the idle thread).
The rest of the story should lie in nsh_main(), which runs in a separate thread:
pid = nxtask_create("init", CONFIG_USERMAIN_PRIORITY,
                    CONFIG_USERMAIN_STACKSIZE,
                    (main_t)CONFIG_USER_ENTRYPOINT,
                    (FAR char * const *)NULL);
nsh_main()
It resides in platforms/nuttx/NuttX/apps/system/nsh/nsh_main.c
The boot sequence:
1. up_cxxinitialize(), if configured
2. nsh_initialize()
3. nsh_consolemain(), on the SAME thread, and does not return
nsh_initialize() boot sequence:
1. nsh_romfsetc() to mount the /etc filesystem
2. If CONFIG_NSH_ARCHINIT is configured, (void)boardctl(BOARDIOC_INIT, 0); is executed, which is effectively the board_app_initialize() function, defined by the board source code.
Update on the issue:
There seem to be two separate sources of errors:
1. CONFIG_DEBUG_ASSERTIONS and CONFIG_PRIORITY_INHERITANCE are turned on while CONFIG_SEM_PREALLOCHOLDERS is 0. The fix is simple: turn off CONFIG_DEBUG_ASSERTIONS and leave the other two settings as they are. This assertion seems unnecessary; for details, refer to issue https://github.com/nusrobomaster/PX4-Autopilot/issues/6
2. CONFIG_ARCH_FPU, and perhaps the missing CONFIG_PRIORITY_INHERITANCE config.
I have not tested reproducing the hard fault. However, with proper priority inheritance turned on and debug assertions turned off, everything seems to work as expected.
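The first combination can be written down as a simple predicate, which may help when auditing several board configs. This is only a sketch: struct sem_config and hits_assertion_combo() are hypothetical names, and the fields simply mirror the three Kconfig options named above.

```c
#include <stdbool.h>

/* Hypothetical mirror of the three relevant Kconfig options. */
struct sem_config {
    bool debug_assertions;      /* CONFIG_DEBUG_ASSERTIONS */
    bool priority_inheritance;  /* CONFIG_PRIORITY_INHERITANCE */
    int  sem_preallocholders;   /* CONFIG_SEM_PREALLOCHOLDERS */
};

/* True for the combination that produced the failed debug assertions:
 * assertions on, priority inheritance on, and no preallocated holders. */
static bool hits_assertion_combo(const struct sem_config *c)
{
    return c->debug_assertions &&
           c->priority_inheritance &&
           c->sem_preallocholders == 0;
}
```

The workaround described above corresponds to debug_assertions being false while the other two fields keep their values.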
This can be verified with commit https://github.com/nusrobomaster/PX4-Autopilot/commit/2a5a6339d15e633b93a4563daf5a814dd622af66
Closing this issue for now.
It happens on both the Dev A and Dev C boards.
Tested on commit https://github.com/nusrobomaster/PX4-Autopilot/commit/16c13a87701af4dc60e1e0041414d5545a2cec4f
The program runs into the Hard Fault IRQ and reboots shortly after. It can always be reproduced by doing the following in nsh:
Basically, if status is printed twice, it always triggers a hard fault.
In addition to the hard fault, there are also semaphore-related exceptions, in which a debug assertion fails. The problem seems to be that some logic is running in IRQ context when it should not be. The exact reason is hard to track down.