linux-automation / meta-lxatac

The LXA TAC sometimes does not reboot correctly due to kernel hang #127

Open hnez opened 6 months ago

hnez commented 6 months ago

We have seen this issue when trying to reboot after a rauc install, but it is not clear whether the rauc install plays a role or is just a coincidence.

Here is the log output during an attempted reboot, which was kindly recorded by @Bastian-Krause:

root@lxatac-00001:~# dmesg -n 7
root@lxatac-00001:~#
root@lxatac-00001:~#
root@lxatac-00001:~# [593755.976008] EXT4-fs (mmcblk1p2): mounted filesystem 7bd8e28e-fa40-41ea-b1ee-c2e5193ff824 r/w with ordered data mode. Quota mode: disabled.
[593757.532580] EXT4-fs (mmcblk1p2): unmounting filesystem 7bd8e28e-fa40-41ea-b1ee-c2e5193ff824.
[593764.707277] block nbd0: NBD_DISCONNECT
[593764.710349] block nbd0: Disconnected due to user request.
[593764.716062] block nbd0: shutting down sockets
[593764.722040] block nbd0: NBD_DISCONNECT
[593764.725182] block nbd0: Send disconnect failed -32
[593778.142757] watchdog: watchdog0: watchdog did not stop!
[593779.907858] EXT4-fs (mmcblk1p4): re-mounted 8ac00a5a-3255-4613-9119-dc58209566e8 ro. Quota �
[593779.957491] EXT4-fs (mmcblk1p4): re-mounted 8ac00a5a-3255-4613-9119-dc58209566e8 ro. Quota �
[593779.975127] EXT4-fs (mmcblk1p4): re-mounted 8ac00a5a-3255-4613-9119-dc58209566e8 ro. Quota �
[593779.993143] EXT4-fs (mmcblk1p4): re-mounted 8ac00a5a-3255-4613-9119-dc58209566e8 ro. Quota �
[593780.051262] EXT4-fs (mmcblk1p4): unmounting filesystem 8ac00a5a-3255-4613-9119
[593780.135084] EXT4-fs (mmcblk1p3): re-mounted 7bfa8c6d-c4f2-4091-a699-93d542d10ac2 ro. Quota �
[593780.398389] watchdog: watchdog0: nowayout prevents watchdog�
[593780.404425] systemd-shutdown[1]: Failed to disable hardware watchdog, ignoring: Device or�
[593780.413160] watchdog: watchdog0: nowayout prevents watchdog�
[593780.419264] watchdog: watchdog0: watchdo�
[593780.466408] ksz-switch spi0.0 uplink
[593784.562626] ksz-switch spi0.0 uplink: Link is Up - 1Gbps/Full - flo�
[594014.802931] INFO: task kworker/0:1:30966 blocked for more tha
[594014.810866]       Not tainted 6.7.�
[594014.814523] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
[594014.821492] task:kworker/0:1     state:D stack:0     pid:30966 tgid:30966 ppid:2      fl�
[594014.829993] Workqueue: ipv6_addrconf addrc�
[594014.834399]  __schedule from sch�
[594014.838060]  schedule from schedule_preempt_dis�
[594014.842977]  schedule_preempt_disabled from __mutex_lock.constpro�
[594014.849495]  __mutex_lock.constprop.0 from addrconf_verif�
[594014.855328]  addrconf_verify_work from process_one_wo
[594014.860626]  process_one_work from worker_thre�
[594014.865401]  worker_thread from kthre
594014.869290]  kthread from ret_from�
[594014.873065] Exception stack(0xe18f5fb0�
[594014.877034] 5fa0:                                     00000000 00000000 00�
[594014.884311] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00�
[594014.891611] 5fe0: 00000000 00000000 00000000 00000000 00�
[594014.897337] INFO: task kworker/1:1:30972 blocked for more tha
[594014.903480]       Not tainted 6.7.�[594014.907149] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables�
[594014.914114] task:kworker/1:1     state:D stack:0     pid:30972 tgid:30972 ppid:2      fl�
[594014.922609] Workqueue: events �[594014.925895]  __schedule from sch�
[594014.929373]  schedule from schedule_preempt_dis�
[594014.934248]  schedule_preempt_disabled from __mutex_lock.constpro�
[594014.940812]  __mutex_lock.constprop.0 from linkwatch_
[594014.946200]  linkwatch_event from process_one_w�
[594014.951142]  process_one_work from worker_thre�
[594014.955832]  worker_thread from kthre
[594014.959809]  kthread from ret_from�
[594014.963760] Exception stack(0xe1861fb0�
[594014.967946] 1fa0:                                     00000000 00000000 00�
[594014.975336] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00�
[594014.982712] 1fe0: 00000000 00000000 00000000 00000000 00��
NOTICE:  CPU: STM32MP157C?? Rev.Z
NOTICE:  Model: Linux Automation Test Automation Controller (TAC)
WARNING: VDD unknown
INFO:    Reset reason (0x214):
INFO:      IWDG2 Reset (rst_iwdg2)
INFO:    FCONF: Reading TB_FW firmware configuration file from: 0x2ffe2000
INFO:    FCONF: Reading firmware configuration information for: stm32mp_io
INFO:    Using EMMC
INFO:      Instance 2

It would also be interesting to know why log lines printed after the reboot process has started are a bit mangled and truncated.
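
To make it easier to compare captures of this hang, here is a minimal Python sketch that pulls the hung-task reports out of a recorded console log and lists, for each blocked task, the function that called into `__mutex_lock`. The function name `summarize_hung_tasks` and the regular expressions are illustrative assumptions, not part of any existing tooling. The patterns are deliberately loose because the console output above is truncated, so the reported names are only as complete as the log itself.

```python
import re

# Kernel timestamp prefix such as "[594014.802931] ".
TS = r"\[\s*(?P<ts>\d+\.\d+)\][ \t]*"

# Hung-task header. The tail is cut short on purpose, because the
# console output in the report is truncated ("blocked for more tha").
HUNG_RE = re.compile(
    TS + r"INFO: task (?P<task>\S+):(?P<pid>\d+) blocked for more tha"
)

# Backtrace frame of the form "callee from caller" (ARM unwinder style).
FRAME_RE = re.compile(TS + r"(?P<callee>[\w.]+) from (?P<caller>[\w.]*)")


def summarize_hung_tasks(log_text):
    """Return one dict per hung-task report found in log_text."""
    headers = list(HUNG_RE.finditer(log_text))
    reports = []
    for i, header in enumerate(headers):
        # A report extends up to the next hung-task header (or end of log).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(log_text)
        section = log_text[header.end():end]

        # Find the frame whose callee is __mutex_lock*; its caller is the
        # function that was waiting for the mutex (possibly truncated).
        lock_caller = None
        for frame in FRAME_RE.finditer(section):
            if frame.group("callee").startswith("__mutex_lock"):
                lock_caller = frame.group("caller")
                break

        reports.append({
            "ts": float(header.group("ts")),
            "task": header.group("task"),
            "pid": int(header.group("pid")),
            "lock_caller": lock_caller,
        })
    return reports


# A few lines copied from the log above. "\ufffd" stands for the
# replacement character that marks the mangled line endings.
SAMPLE = """\
[594014.802931] INFO: task kworker/0:1:30966 blocked for more tha
[594014.829993] Workqueue: ipv6_addrconf addrc\ufffd
[594014.842977]  schedule_preempt_disabled from __mutex_lock.constpro\ufffd
[594014.849495]  __mutex_lock.constprop.0 from addrconf_verif\ufffd
[594014.897337] INFO: task kworker/1:1:30972 blocked for more tha
[594014.934248]  schedule_preempt_disabled from __mutex_lock.constpro\ufffd
[594014.940812]  __mutex_lock.constprop.0 from linkwatch_
"""

if __name__ == "__main__":
    for r in summarize_hung_tasks(SAMPLE):
        print(f"{r['task']} (pid {r['pid']}) waits on a mutex in {r['lock_caller']}")
```

The sketch scans the whole text with `finditer` rather than going line by line, since some lines in the capture are fused together (a new timestamp starts in the middle of a line).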