openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Kernel panic on running rsend_012_pos on sparc64 #12039

Open rincebrain opened 3 years ago

rincebrain commented 3 years ago

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Debian |
| Distribution Version | sid |
| Linux Kernel | 5.10.0-6-sparc64 |
| Architecture | sparc64 |
| ZFS Version | 2babd2004 |

Describe the problem you're observing

While running ZTS for #12022, I found that on vanilla git master (and on my patched branch, for that matter), running the whole series of rsend tests crashes the kernel 100% of the time once it reaches rsend_012_pos. (Unhelpfully, it fails to print a stack trace; the full console output is reproduced below.)

Sometimes it's unhappy enough that the watchdog timer doesn't trigger and sending break twice doesn't get back to the PROM, leaving you to physically power cycle it.

(It seems potentially relevant that this is a Netra T1, so other Linux/SPARC64 hardware might not suffer from this. I don't know what's breaking right now.)

Describe how to reproduce the problem

`scripts/zfs-tests.sh -r rsend`

Include any warning/errors/backtraces from the system logs

crash output to console:

[ 1435.191913] Kernel panic - not syncing: corrupted stack end detected inside scheduler
[ 1435.294939] CPU: 0 PID: 722 Comm: spl_system_task Tainted: P           OE     5.10.0-6-sparc64 #1 Debian 5.10.28-1
[ 1435.431126] Call Trace:
[ 1435.463267] Press Stop-A (L1-A) from sun keyboard or send break
[ 1435.463267] twice on console to return to the boot prom
[ 1435.609777] ---[ end Kernel panic - not syncing: corrupted stack end detected inside scheduler ]---

RED State Exception

TL=0000.0000.0000.0005 TT=0000.0000.0000.0010
   TPC=0000.0000.0042.4200 TnPC=0000.0000.0042.4204 TSTATE=0000.0000.8000.1506
TL=0000.0000.0000.0004 TT=0000.0000.0000.0010
   TPC=0000.0000.0042.4200 TnPC=0000.0000.0042.4204 TSTATE=0000.0000.8000.1506
TL=0000.0000.0000.0003 TT=0000.0000.0000.0010
   TPC=0000.0000.0042.4200 TnPC=0000.0000.0042.4204 TSTATE=0000.0000.8000.1506
TL=0000.0000.0000.0002 TT=0000.0000.0000.0010
   TPC=0000.0000.0040.70d0 TnPC=0000.0000.0040.70d4 TSTATE=0000.0000.8004.1406
TL=0000.0000.0000.0001 TT=0000.0000.0000.0068
   TPC=0000.0000.0048.bba4 TnPC=0000.0000.0048.bba8 TSTATE=0000.0000.8000.1606

Watchdog Reset
Externally Initiated Reset

/proc/cpuinfo

$ cat /proc/cpuinfo
cpu             : TI UltraSparc IIi (Sabre)
fpu             : UltraSparc IIi integrated FPU
pmu             : ultra12
prom            : OBP 3.10.25 2000/01/17 21:26
type            : sun4u
ncpus probed    : 1
ncpus active    : 1
D$ parity tl1   : 0
I$ parity tl1   : 0
Cpu0ClkTck      : 000000001a3a4034
cpucaps         : flush,stbar,swap,muldiv,v9,mul32,div32,v8plus,vis
MMU Type        : Spitfire
MMU PGSZs       : 8K,64K,512K,4MB

rincebrain commented 3 years ago

Oh boy, 4.15.0-2-sparc64 actually gave me a stacktrace...

[ 1004.096214] Kernel panic - not syncing: corrupted stack end detected inside scheduler
[ 1004.096214]
[ 1004.218708] CPU: 0 PID: 23350 Comm: spl_system_task Tainted: P           O     4.15.0-2-sparc64 #1 Debian 4.15.11-1
[ 1004.356022] Call Trace:
[ 1004.388180]  [00000000004668f0] panic+0xd0/0x280
[ 1004.448890]  [00000000009f8ccc] switch_to_pc+0x4f8/0x50c
[ 1004.518762]  [00000000009f8e9c] _cond_resched+0x3c/0x60
[ 1004.587500]  [00000000009fa06c] mutex_lock+0xc/0x40
[ 1004.652797]  [00000000108c968c] zio_wait_for_children+0xc/0xc0 [zfs]
[ 1004.736825]  [00000000108ca304] zio_vdev_io_done+0x24/0x200 [zfs]
[ 1004.817421]  [00000000108cb9b0] zio_execute+0x90/0x100 [zfs]
[ 1004.892274]  [000000001088a160] vdev_mirror_io_start+0x100/0x280 [zfs]
[ 1004.978602]  [00000000108cd008] zio_vdev_io_start+0x2c8/0x320 [zfs]
[ 1005.061473]  [00000000108cf674] zio_nowait+0xb4/0x140 [zfs]
[ 1005.135133]  [00000000107d54b8] arc_read+0xb58/0x1140 [zfs]
[ 1005.208761]  [00000000107e2c04] dbuf_issue_final_prefetch+0x84/0x100 [zfs]
[ 1005.299546]  [00000000107e87d8] dbuf_prefetch_indirect_done+0x1d8/0x200 [zfs]
[ 1005.393751]  [00000000107d5cf8] arc_read_done+0x258/0x440 [zfs]
[ 1005.472020]  [00000000108d16d0] zio_done+0x470/0xe40 [zfs]
[ 1005.544597]  [00000000108cb9b0] zio_execute+0x90/0x100 [zfs]
[ 1005.619062] Press Stop-A (L1-A) from sun keyboard or send break
[ 1005.619062] twice on console to return to the boot prom
[ 1005.765570] ---[ end Kernel panic - not syncing: corrupted stack end detected inside scheduler
[ 1005.765570]
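
For anyone triaging this, here is a rough sketch of a script that summarizes the trace above: it counts the frames per module and flags any function that appears more than once on the same stack. The helper names (`parse_frames`, `repeated_symbols`) are made up for illustration and are not part of ZFS or ZTS; the frame lines are copied from the 4.15.0-2-sparc64 panic.

```python
import re
from collections import Counter

# Frames copied verbatim from the 4.15.0-2-sparc64 panic output.
TRACE = """\
[ 1004.356022] Call Trace:
[ 1004.388180]  [00000000004668f0] panic+0xd0/0x280
[ 1004.448890]  [00000000009f8ccc] switch_to_pc+0x4f8/0x50c
[ 1004.518762]  [00000000009f8e9c] _cond_resched+0x3c/0x60
[ 1004.587500]  [00000000009fa06c] mutex_lock+0xc/0x40
[ 1004.652797]  [00000000108c968c] zio_wait_for_children+0xc/0xc0 [zfs]
[ 1004.736825]  [00000000108ca304] zio_vdev_io_done+0x24/0x200 [zfs]
[ 1004.817421]  [00000000108cb9b0] zio_execute+0x90/0x100 [zfs]
[ 1004.892274]  [000000001088a160] vdev_mirror_io_start+0x100/0x280 [zfs]
[ 1004.978602]  [00000000108cd008] zio_vdev_io_start+0x2c8/0x320 [zfs]
[ 1005.061473]  [00000000108cf674] zio_nowait+0xb4/0x140 [zfs]
[ 1005.135133]  [00000000107d54b8] arc_read+0xb58/0x1140 [zfs]
[ 1005.208761]  [00000000107e2c04] dbuf_issue_final_prefetch+0x84/0x100 [zfs]
[ 1005.299546]  [00000000107e87d8] dbuf_prefetch_indirect_done+0x1d8/0x200 [zfs]
[ 1005.393751]  [00000000107d5cf8] arc_read_done+0x258/0x440 [zfs]
[ 1005.472020]  [00000000108d16d0] zio_done+0x470/0xe40 [zfs]
[ 1005.544597]  [00000000108cb9b0] zio_execute+0x90/0x100 [zfs]
"""

# Matches "[<16 hex digits>] symbol+0xoff/0xsize [module]"; the module is optional.
# The "[ 1004.388180]" timestamps do not match because they contain a space and a dot.
FRAME_RE = re.compile(
    r"\[([0-9a-f]{16})\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def parse_frames(text):
    """Return (address, symbol, module) tuples, innermost frame first."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append((match.group(1), match.group(2), match.group(3) or "kernel"))
    return frames


def repeated_symbols(frames):
    """Return {symbol: count} for symbols that occur more than once in the trace."""
    counts = Counter(symbol for _, symbol, _ in frames)
    return {symbol: n for symbol, n in counts.items() if n > 1}


frames = parse_frames(TRACE)
per_module = Counter(module for _, _, module in frames)
print(len(frames), "frames:", dict(per_module))
print("repeated:", repeated_symbols(frames))
```

On this trace the only repeated symbol is `zio_execute`: the read completion path (`zio_done` -> `arc_read_done` -> `dbuf_prefetch_indirect_done`) issues a new `arc_read`, whose `zio_nowait` runs the next zio inline on the same stack. One plausible reading is that this nesting is what exhausts the kernel stack here, but that is a guess from a single trace, not a confirmed diagnosis.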

stale[bot] commented 2 years ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

stale[bot] commented 1 year ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.