CONFIG_NO_HZ=y possibly broken

sorear commented 6 years ago

Adding that to the Fedora defconfig results in a kernel which fails to boot (hangs with high cpu usage immediately after "clocksource: Switched to clocksource riscv_clocksource").

palmer-dabbelt commented 6 years ago

I tried the Fedora Kconfig and it appears to use a dummy console. If I add CONFIG_NO_HZ=y to my regular config then I can boot, but if I select CONFIG_NO_HZ_IDLE=y then I get a hang. Do you have a Kconfig that manifests your error in QEMU simulation (ie, without a PCIe graphics card)? Walking all the way across the lab to check if the machine boots is a headache :).

atishp04 commented 6 years ago

For me, it booted but hangs after "Starting logging: OK Starting mdev... sort: /sys/devices/platform/Fixed: No such file or directory modprobe: can't change directory to '/lib/modules': No such file or directory Initializing random number generator... done. Starting network... udhcpc (v1.24.2) started Sending discover... Sending select for 10.0.2.15... Lease of 10.0.2.15 obtained, lease time 86400 deleting routers adding dns 10.0.2.3 Starting dropbear sshd: OK"

Are you seeing the same hang ? I am seeing same hang for both CONFIG_NO_HZ=y or CONFIG_NO_HZ=y
CONFIG_NO_HZ_IDLE=y
CONFIG_NO_HZ_COMMON=y

However, it hangs during the boot for spike/smp combination (for both configuration). [ 0.020000] clocksource: Switched to clocksource riscv_clocksource [ 0.020000] NET: Registered protocol family 2 [ 0.020000] TCP established hash table entries: 16384 (order: 5, 131072 bytes) [ 0.020000] TCP bind hash table entries: 16384 (order: 6, 262144 bytes) [ 0.020000] TCP: Hash tables configured (established 16384 bind 16384) [ 0.020000] UDP hash table entries: 1024 (order: 3, 32768 bytes) [ 0.020000] UDP-Lite hash table entries: 1024 (order: 3, 32768 bytes) [ 0.030000] NET: Registered protocol family 1 [ 0.050000] Unpacking initramfs... [ 0.080000] Initialise system trusted keyrings [ 0.080000] workingset: timestamp_bits=62 max_order=19 bucket_order=0 [ 21.020000] INFO: rcu_sched self-detected stall on CPU [ 21.020000] INFO: rcu_sched self-detected stall on CPU [ 21.020000] 3-...!: (698636 ticks this GP) idle=2b6/2/0 softirq=1/1 fqs=0 [ 21.020000] (t=2100 jiffies g=-299 c=-300 q=1376) [ 21.020000] rcu_sched kthread starved for 2100 jiffies! g18446744073709551317 c18446744073709551316 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=1 [ 21.020000] rcu_sched I 0 8 2 0x00000000 [ 21.020000] Call Trace: [ 21.020000] [< (ptrval)>] __schedule+0x1c6/0x4ea [ 21.020000] [< (ptrval)>] schedule+0x16/0x58 [ 21.020000] [< (ptrval)>] schedule_timeout+0x12a/0x248 [ 21.020000] [< (ptrval)>] rcu_gp_kthread+0x31e/0x6cc [ 21.020000] [< (ptrval)>] kthread+0xcc/0xde [ 21.020000] [< (ptrval)>] ret_from_syscall+0xa/0xe [ 21.020000] Task dump for CPU 2: [ 21.020000] swapper/2 R running task 0 0 1 0x00000008 [ 21.020000] Call Trace: [ 21.020000] [< (ptrval)>] __schedule+0x1c6/0x4ea [ 21.020000] Task dump for CPU 3: [ 21.020000] swapper/3 R running task 0 0 1 0x00000008 [ 21.020000] Call Trace: [ 21.020000] [< (ptrval)>] walk_stackframe+0x0/0xa2 [ 21.020000] [< (ptrval)>] show_stack+0x26/0x34 [ 21.020000] [< (ptrval)>] sched_show_task+0xa6/0xfc [ 21.020000] [< (ptrval)>] dump_cpu_task+0x3a/0x48 [ 21.020000] [< (ptrval)>] rcu_dump_cpu_stacks+0x7a/0xb8 [ 21.020000] [< (ptrval)>] rcu_check_callbacks+0x4a0/0x660 [ 21.020000] [< (ptrval)>] update_process_times+0x1e/0x48 [ 21.020000] [< (ptrval)>] tick_nohz_handler+0x7c/0xd2 [ 21.020000] [< (ptrval)>] riscv_timer_interrupt+0x26/0x32 [ 21.020000] [< (ptrval)>] riscv_intc_irq+0xaa/0xd2 [ 21.020000] [< (ptrval)>] ret_from_syscall+0xa/0xe [ 21.030000] 2-...!: (698635 ticks this GP) idle=31a/2/0 softirq=1/1 fqs=0 [ 21.040000] (t=2102 jiffies g=-299 c=-300 q=1376) [ 21.040000] rcu_sched kthread starved for 2102 jiffies! g18446744073709551317 c18446744073709551316 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x0 ->cpu=3 [ 21.040000] rcu_sched R running task 0 8 2 0x00000000

I am running with fairly latest linux commit 8fa96b8877861fbcc7340f8f0a975569da254d93 Author: Palmer Dabbelt palmer@dabbelt.com Date: Wed Feb 7 18:23:09 2018 -0800

irq-riscv-intc post-merge cleanups

Should I update anything else to verify?

palmer-dabbelt commented 6 years ago

The difference is probably just my working directory being a mess, but I can reproduce some failures with

$ git diff
diff --git a/Makefile b/Makefile
index 408a4247fe64..6375cab35974 100644
--- a/Makefile
+++ b/Makefile
@@ -130,6 +130,7 @@ endif

 $(vmlinux): $(linux_srcdir) $(linux_wrkdir)/.config $(buildroot_initramfs_sysroot_stamp)
        $(MAKE) -C $< O=$(linux_wrkdir) \
+               CROSS_COMPILE=riscv64-unknown-linux-gnu- \
                CONFIG_INITRAMFS_SOURCE="$(confdir)/initramfs.txt $(buildroot_initramfs_sysroot)" \
                CONFIG_INITRAMFS_ROOT_UID=$(shell id -u) \
                CONFIG_INITRAMFS_ROOT_GID=$(shell id -g) \
diff --git a/conf/linux_defconfig b/conf/linux_defconfig
index eff1c1731aa8..e0aac738869a 100644
--- a/conf/linux_defconfig
+++ b/conf/linux_defconfig
@@ -83,3 +83,6 @@ CONFIG_MTD=y
 CONFIG_MTD_SPI_NOR=y
 CONFIG_MTD_M25P80=y
 CONFIG_MTD_BLOCK_RO=y
+CONFIG_NO_HZ=y
+CONFIG_NO_HZ_IDLE=y
+CONFIG_NO_HZ_COMMON=y
diff --git a/linux b/linux
index 8fa96b887786..632fc702e0ef 160000
--- a/linux
+++ b/linux
@@ -1 +1 @@
-Subproject commit 8fa96b8877861fbcc7340f8f0a975569da254d93
+Subproject commit 632fc702e0efd519bf75f55c7f85ac7300e08dab
diff --git a/riscv-gnu-toolchain b/riscv-gnu-toolchain
--- a/riscv-gnu-toolchain
+++ b/riscv-gnu-toolchain
@@ -1 +1 @@
-Subproject commit b4dae89f85bf882852c6186b1284df11065bfcd9
+Subproject commit b4dae89f85bf882852c6186b1284df11065bfcd9-dirty
diff --git a/riscv-qemu b/riscv-qemu
index cd2c21d70361..6256f8ac6004 160000
--- a/riscv-qemu
+++ b/riscv-qemu
@@ -1 +1 @@
-Subproject commit cd2c21d70361f6887e265860c4f0dbb49ad323ca
+Subproject commit 6256f8ac6004677cc4652b1fcd77c839dfc05015-dirty

so I'm going to go figure out what's going on.

palmer-dabbelt commented 6 years ago

@atishp04 is going to take a look, thanks!

atishp04 commented 6 years ago

Just a quick update. I am able to get repeated stall warnings in qemu setup as well (with smp).

[ 21.020000] INFO: rcu_sched self-detected stall on CPU [ 21.020000] INFO: rcu_sched self-detected stall on CPU [ 21.020000] 3-...!: (698636 ticks this GP) idle=2b6/2/0 softirq=1/1 fqs=0 [ 21.020000] (t=2100 jiffies g=-299 c=-300 q=1376) [ 21.020000] rcu_sched kthread starved for 2100 jiffies! g18446744073709551317 c18446744073709551316 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=1 [ 21.020000] rcu_sched I 0 8 2 0x00000000 [ 21.020000] Call Trace: [ 21.020000] [< (ptrval)>] __schedule+0x1c6/0x4ea [ 21.020000] [< (ptrval)>] schedule+0x16/0x58 [ 21.020000] [< (ptrval)>] schedule_timeout+0x12a/0x248 [ 21.020000] [< (ptrval)>] rcu_gp_kthread+0x31e/0x6cc [ 21.020000] [< (ptrval)>] kthread+0xcc/0xde [ 21.020000] [< (ptrval)>] ret_from_syscall+0xa/0xe [ 21.020000] Task dump for CPU 2: [ 21.020000] swapper/2 R running task 0 0 1 0x00000008 [ 21.020000] Call Trace: [ 21.020000] [< (ptrval)>] schedule+0x1c6/0x4ea [ 21.020000] Task dump for CPU 3: [ 21.020000] swapper/3 R running task 0 0 1 0x00000008 [ 21.020000] Call Trace: [ 21.020000] [< (ptrval)>] walk_stackframe+0x0/0xa2 [ 21.020000] [< (ptrval)>] show_stack+0x26/0x34 [ 21.020000] [< (ptrval)>] sched_show_task+0xa6/0xfc [ 21.020000] [< (ptrval)>] dump_cpu_task+0x3a/0x48 [ 21.020000] [< (ptrval)>] rcu_dump_cpu_stacks+0x7a/0xb8 [ 21.020000] [< (ptrval)>] rcu_check_callbacks+0x4a0/0x660 [ 21.020000] [< (ptrval)>] update_process_times+0x1e/0x48 [ 21.020000] [< (ptrval)>] tick_nohz_handler+0x7c/0xd2 [ 21.020000] [< (ptrval)>] riscv_timer_interrupt+0x26/0x32 [ 21.020000] [< (ptrval)>] riscv_intc_irq+0xaa/0xd2 [ 21.020000] [< (ptrval)>] ret_from_syscall+0xa/0xe [ 21.030000] 2-...!: (698635 ticks this GP) idle=31a/2/0 softirq=1/1 fqs=0 [ 21.040000] (t=2102 jiffies g=-299 c=-300 q=1376) [ 21.040000] rcu_sched kthread starved for 2102 jiffies! g18446744073709551317 c18446744073709551316 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x0 ->cpu=3 [ 21.040000] rcu_sched R running task 0 8 2 0x00000000 [ 21.040000] Call Trace: [ 21.040000] [< (ptrval)>] schedule+0x1c6/0x4ea [ 21.040000] [< (ptrval)>] schedule+0x16/0x58 [ 21.040000] [< (ptrval)>] schedule_timeout+0x12a/0x248 [ 21.040000] [< (ptrval)>] rcu_gp_kthread+0x31e/0x6cc [ 21.040000] [< (ptrval)>] kthread+0xcc/0xde [ 21.040000] [< (ptrval)>] ret_from_syscall+0xa/0xe [ 21.040000] Task dump for CPU 2: [ 21.040000] swapper/2 R running task 0 0 1 0x00000008 [ 21.040000] Call Trace: [ 21.040000] [< (ptrval)>] walk_stackframe+0x0/0xa2 [ 21.040000] [< (ptrval)>] show_stack+0x26/0x34 [ 21.040000] [< (ptrval)>] sched_show_task+0xa6/0xfc [ 21.050000] [< (ptrval)>] dump_cpu_task+0x3a/0x48 [ 21.050000] [< (ptrval)>] rcu_dump_cpu_stacks+0x7a/0xb8 [ 21.050000] [< (ptrval)>] rcu_check_callbacks+0x4a0/0x660 [ 21.050000] [< (ptrval)>] update_process_times+0x1e/0x48 [ 21.050000] [< (ptrval)>] tick_nohz_handler+0x7c/0xd2 [ 21.050000] [< (ptrval)>] riscv_timer_interrupt+0x26/0x32 [ 21.050000] [< (ptrval)>] riscv_intc_irq+0xaa/0xd2 [ 21.050000] [< (ptrval)>] ret_from_syscall+0xa/0xe [ 21.050000] Task dump for CPU 3: [ 21.050000] swapper/3 R running task 0 0 1 0x00000008 [ 21.050000] Call Trace: [ 21.050000] [< (ptrval)>] __schedule+0x1c6/0x4ea

The first one is after 21sec and rest of them keeps appearing after each ~63 secs. Still investigating.

atishp04 commented 6 years ago

Got some time to look at the issue. It looks like it's a self detected stall on one or more cpus where rcu_sched kthread couldn't run for more than 21 seconds. All other cpus seems to run the idle thread (swapper task).

arch_idle() function invokes wfi which puts the cpu to until "idle" mode until the next interrupt happens. In periodic tick, the cpus are woken up by periodic timer tick. However, the no_hz idle case it should be only woken up when there is some task queued i.e. scheduler IPIs.

Here are possible causes: 1.All of the cpus are in idle. Thus nobody is forwarding the jiffies - This is not the case. We can see repeated stalls have correct jiffies.

The concerned cpu never returned from WFI after IPI. I am still investigating this cause. Doesn't seem likely because I have qemu logs which suggest the cpus do return from WFI after IPI.
The concerned cpu immediately returns from WFI thus looping continuously without giving any other thread chance to run.
My initial assumption was something wrong with timer_probe and clock source being initialized for each cpu. But that has been fixed with following patches.

http://lists.infradead.org/pipermail/linux-riscv/2018-April/000510.html

It is difficult to debug as I can't put printk in the arch_idle function as it is called so many times that serial console will hang in the first place. trace_printk is of no help as it is stuck during the boot itself. I am not aware of any method to access the trace buffer unless it's booted completely.

Looking for any ideas or suggestions to debug the problem further.

sorear commented 6 years ago

Add an instruction or CSR to qemu which does printf on a register, then debug by doing csr writes in arch_idle? If you're feeling ambitious implement Liviu's semihosting proposal (already in openocd and should be findable on sw-dev, it's a small change to EBREAK translation)

atishp04 commented 6 years ago

Fixed in https://github.com/atishp04/riscv-linux/commit/fd1e893746c37b58cb276eadd7d9ba9466ea14aa

We can close this issue now.

riscvarchive / riscv-linux

CONFIG_NO_HZ=y possibly broken #127