Open orgua opened 1 year ago
Observation: the crashes became more frequent with the latest changes to the kernel module in the dev branch. Before those changes, the test suite sometimes ran through completely. This is probably related to overly aggressive freeing of objects in the unloading subroutines, but that is just a guess.
Starting commit: 24 Oct 2022 -> 6d87b96f (one commit before kernel work began)
Ending commit: 25 Oct 2022 -> 5c237a72
Change-Comparison
Fixing these crashes has the highest priority. Even with warnings enabled the compiler is happy at the moment, but it may be possible to learn more with sanitizers for memory, addresses and undefined behavior (asan, ubsan, ...), or by debugging and looking at the crash dumps.
Kernel work on the BeagleBone can be tedious. Helpful tricks:
/etc/modules: remove the module entry there to stop crashing at boot
loading and unloading is currently done with these commands:
# reload index and load module
depmod -a
modprobe -a shepherd
# unload module
modprobe -rf shepherd
It also helps to have the main distro on the internal eMMC and an emergency distro on an SD card.
(The SD-card distro can be copied to the eMMC by activating the last line in /boot/uEnv.txt
-> cmdline=init=/usr/sbin/init-beagle-flasher
)
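A minimal sketch of "activating" that line, assuming it is present in the file but commented out with a leading `#` (the exact layout of /boot/uEnv.txt may differ between images). The function name `enable_flasher` is hypothetical. Note that booting from the SD card with this line active overwrites the eMMC, so it is worth testing on a copy of the file first.

```shell
#!/bin/sh
# Hypothetical helper: uncomment the flasher line in a uEnv.txt-style file.
# Assumption: the line exists as "#cmdline=init=/usr/sbin/init-beagle-flasher".
enable_flasher() {
    uenv="$1"
    # Refuse to touch the file if the commented line is not there.
    grep -q '^#cmdline=init=/usr/sbin/init-beagle-flasher' "$uenv" || return 1
    tmp="$(mktemp)" || return 1
    # Strip the leading '#' from that one line, leave everything else as-is.
    sed 's|^#\(cmdline=init=/usr/sbin/init-beagle-flasher\)|\1|' "$uenv" > "$tmp" \
        && cat "$tmp" > "$uenv"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}
```

Usage on the SD-card system would look like `enable_flasher /boot/uEnv.txt` (as root), followed by a reboot.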
pssp 5.9 for kernel <= 5.4
pssp 6.0 for kernel >= 5.10
pssp 6.2 for kernel >= 6.1
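The version pairing above can be turned into a small lookup, for example to pick the matching pssp release in a build script. This is only a sketch: the function name `pssp_for_kernel` is hypothetical, and kernels strictly between 5.4 and 5.10 are reported as unknown because the list does not cover them.

```shell
#!/bin/sh
# Hypothetical helper: print the pssp version for a kernel release string
# such as the output of `uname -r`, following the list above.
pssp_for_kernel() {
    release="$1"
    case "$release" in *.*) ;; *) return 2 ;; esac
    major="${release%%.*}"        # "5.10.168-ti-r71" -> "5"
    rest="${release#*.}"          # -> "10.168-ti-r71"
    minor="${rest%%[!0-9]*}"      # -> "10"
    case "$major" in ''|*[!0-9]*) return 2 ;; esac
    case "$minor" in ''|*[!0-9]*) return 2 ;; esac
    code=$((major * 1000 + minor))
    if [ "$code" -ge 6001 ]; then
        echo "6.2"
    elif [ "$code" -ge 5010 ]; then
        echo "6.0"
    elif [ "$code" -le 5004 ]; then
        echo "5.9"
    else
        # Not covered by the list (assumption: treat as unknown).
        return 1
    fi
}
```

On the target this could be called as `pssp_for_kernel "$(uname -r)"`.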
Linux 5.10 RemoteProc driver expects INTC information to be provided in an INTC map structure. See Linux drivers/remoteproc/pru_rproc.h for details.
This commit adds structures pruss_int_map and pru_irq_rsc to include/pru_types.h.
Linux 5.10 pruss_int_map provides PRU INTC mapping information for system event --> channel --> host interrupt.
Linux 5.4 ch_map only provided system event --> channel mapping. ch_map will be deprecated in a later commit.
The Linux RemoteProc driver loads PRU cores slightly differently in Linux kernel 5.10 than it did in Linux 5.4:
am335x: Port to Linux 5.10 -> https://git.ti.com/cgit/pru-software-support-package/pru-software-support-package/commit/?id=8c961a4def32cb0194325d033c5621a717882b85
We are currently locked in to kernel 4.19. For normal testbed usage this is no problem, but the unit tests remove and reload the module for every test (~100 tests), which eventually crashes the system at a random point in time. I have already removed (some) possible leaks and am starting to suspect the dusty kernel itself. Kernel versions 5.4 and 5.10 are available for the BeagleBone, but some internal interfaces changed with 5.4 and more with 5.10, so the transition is not a simple switch.
At least one kernel change also affects the PRU code regarding the INTC resource:
Another problem is the changes in the API and include system. Part of the problem seems to be documented here
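The reload pattern of the unit tests can be approximated outside the test suite with a plain loop around the modprobe commands from this thread. This is a rough sketch, not the project's test code: `reload_cycles` is a hypothetical name, and the `MODPROBE` / `DEPMOD` variables exist only so the loop can be dry-run without touching the kernel.

```shell
#!/bin/sh
# Hypothetical stress helper: load and unload a module N times,
# stopping at the first failing step.
# MODPROBE and DEPMOD default to the real tools but can be overridden.
MODPROBE="${MODPROBE:-modprobe}"
DEPMOD="${DEPMOD:-depmod}"

reload_cycles() {
    module="$1"
    cycles="$2"
    # Rebuild the module index once before the loop.
    "$DEPMOD" -a || return 1
    i=1
    while [ "$i" -le "$cycles" ]; do
        "$MODPROBE" -a "$module" || { echo "load failed in cycle $i" >&2; return 1; }
        "$MODPROBE" -rf "$module" || { echo "unload failed in cycle $i" >&2; return 1; }
        i=$((i + 1))
    done
    echo "completed $cycles cycles"
}
```

Run as root, `reload_cycles shepherd 100` roughly matches the ~100 reloads of a test run. A hard crash will of course kill the loop as well, so the last cycle number is best taken from a log on persistent storage or a serial console.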
UPDATE: the crashes were caused by the shepherd kernel module and are fixed now. The update to 5.10 or 5.15 needs to happen anyway, as the next Ubuntu release might no longer allow downgrading the kernel to 4.19.