openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

git master panics during ZTS on i686 #12029

Open: rincebrain opened this issue 3 years ago

rincebrain commented 3 years ago

System information

Type                  Version/Name
Distribution Name     Debian
Distribution Version  10 (buster)
Linux Kernel          4.19.0-16-686-pae
Architecture          x86
ZFS Version           2.1.99-160_g38c6d6ced

Describe the problem you're observing

While testing my refinement to #12022, I was concerned to discover that it hung ~forever (I tried waiting an hour) on my i686 VM.

...so I tried with vanilla git master, and it oopsed (a kernel NULL pointer dereference) instead!

Describe how to reproduce the problem

Run ZTS with the sanity runfile. The last test it logged before this happened was zfs_destroy/zfs_destroy_dev_removal, which FAILed; the process that is now hung forever is zfs_destroy/zfs_destroy_dev_removal_condense.

Include any warning/errors/backtraces from the system logs

[  277.834720] BUG: unable to handle kernel NULL pointer dereference at 0000002d
[  277.835437] *pdpt = 0000000000000000 *pde = f000ff53f000ff53 
[  277.836155] Oops: 0000 [#1] SMP PTI
[  277.836838] CPU: 0 PID: 614 Comm: z_vdev_file Tainted: P           OE     4.19.0-16-686-pae #1 Debian 4.19.181-1
[  277.838288] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  277.839018] EIP: spl_kmem_cache_alloc+0x1e/0x710 [spl]
[  277.839692] Code: ff ff e8 a5 8d 29 ce 8d 74 26 00 90 3e 8d 74 26 00 55 89 e5 57 56 89 c6 53 83 ec 38 89 55 cc 65 a1 14 00 00 00 89 45 f0 31 c0 <f6> 46 2d 01 0f 84 80 00 00 00 8b 46 28 89 d3 83 e3 04 89 45 d8 89
[  277.841707] EAX: 00000000 EBX: 00006d04 ECX: f2c2cb68 EDX: 00000004
[  277.842389] ESI: 00000000 EDI: 00da0a00 EBP: ec74feac ESP: ec74fe68
[  277.843049] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010246
[  277.843704] CR0: 80050033 CR2: 0000002d CR3: 06a54000 CR4: 000406f0
[  277.844506] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[  277.845128] DR6: fffe0ff0 DR7: 00000400
[  277.845742] Call Trace:
[  277.846395]  ? __raw_callee_save___pv_queued_spin_unlock+0x9/0x10
[  277.846987]  ? __raw_callee_save___pv_queued_spin_unlock+0x9/0x10
[  277.847584]  ? abd_verify_scatter+0x28/0x30 [zfs]
[  277.848157]  zio_buf_alloc+0x28/0x60 [zfs]
[  277.848722]  abd_borrow_buf+0x37/0x40 [zfs]
[  277.849296]  vdev_file_io_strategy+0xb5/0x110 [zfs]
[  277.849837]  taskq_thread+0x295/0x4f0 [spl]
[  277.850389]  ? wake_up_q+0x70/0x70
[  277.850903]  kthread+0xf0/0x110
[  277.851411]  ? taskq_thread_spawn+0x50/0x50 [spl]
[  277.851903]  ? kthread_bind+0x30/0x30
[  277.852390]  ret_from_fork+0x2e/0x38
[  277.852854] Modules linked in: loop zfs(POE) icp(POE) zzstd(OE) zlua(OE) zcommon(POE) zunicode(POE) znvpair(POE) zavl(POE) spl(OE) crc32_pclmul vmwgfx ttm drm_kms_helper joydev evdev drm intel_rapl_perf pcspkr serio_raw sg vboxguest ac video binfmt_misc button sunrpc ip_tables x_tables autofs4 hid_generic ext4 crc16 mbcache jbd2 usbhid crc32c_generic hid fscrypto ecb sr_mod cdrom sd_mod ata_generic crc32c_intel ohci_pci ahci ata_piix libahci ohci_hcd psmouse aesni_intel ehci_pci aes_i586 ehci_hcd libata crypto_simd cryptd usbcore i2c_piix4 scsi_mod e1000 usb_common
[  277.855945] CR2: 000000000000002d
[  277.856408] ---[ end trace 456a4f8cfc007104 ]---
[  277.856858] EIP: spl_kmem_cache_alloc+0x1e/0x710 [spl]
[  277.857313] Code: ff ff e8 a5 8d 29 ce 8d 74 26 00 90 3e 8d 74 26 00 55 89 e5 57 56 89 c6 53 83 ec 38 89 55 cc 65 a1 14 00 00 00 89 45 f0 31 c0 <f6> 46 2d 01 0f 84 80 00 00 00 8b 46 28 89 d3 83 e3 04 89 45 d8 89
[  277.858704] EAX: 00000000 EBX: 00006d04 ECX: f2c2cb68 EDX: 00000004
[  277.859167] ESI: 00000000 EDI: 00da0a00 EBP: ec74feac ESP: c6a5cdfc
[  277.859615] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010246
[  277.860068] CR0: 80050033 CR2: 0000002d CR3: 06a54000 CR4: 000406f0
[  277.860508] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[  277.860935] DR6: fffe0ff0 DR7: 00000400

Edit: I ran it a second time to see whether it was a fluke, and on the same test it spat out:

[  263.635837] BUG: unable to handle kernel NULL pointer dereference at 0000002d
[  263.636594] *pdpt = 0000000000000000 *pde = f000ff53f000ff53 
[  263.637328] Oops: 0000 [#1] SMP PTI
[  263.638043] CPU: 4 PID: 615 Comm: z_vdev_file Tainted: P           OE     4.19.0-16-686-pae #1 Debian 4.19.181-1
[  263.639476] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  263.640181] EIP: spl_kmem_cache_alloc+0x1e/0x710 [spl]
[  263.640869] Code: ff ff e8 a5 dd 27 e6 8d 74 26 00 90 3e 8d 74 26 00 55 89 e5 57 56 89 c6 53 83 ec 38 89 55 cc 65 a1 14 00 00 00 89 45 f0 31 c0 <f6> 46 2d 01 0f 84 80 00 00 00 8b 46 28 89 d3 83 e3 04 89 45 d8 89
[  263.642938] EAX: 00000000 EBX: 00005604 ECX: f2f53d28 EDX: 00000004
[  263.643619] ESI: 00000000 EDI: 00ac0a00 EBP: ec79beac ESP: ec79be68
[  263.644280] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010246
[  263.644944] CR0: 80050033 CR2: 0000002d CR3: 1ea54000 CR4: 000406f0
[  263.645604] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[  263.646256] DR6: fffe0ff0 DR7: 00000400
[  263.646886] Call Trace:
[  263.647499]  ? __switch_to_asm+0x28/0x50
[  263.648095]  ? __switch_to_asm+0x34/0x50
[  263.648684]  ? __raw_callee_save___pv_queued_spin_unlock+0x9/0x10
[  263.649258]  ? finish_task_switch+0x65/0x250
[  263.649843]  ? abd_verify_scatter+0x28/0x30 [zfs]
[  263.650437]  zio_buf_alloc+0x28/0x60 [zfs]
[  263.650982]  abd_borrow_buf+0x37/0x40 [zfs]
[  263.651528]  vdev_file_io_strategy+0xb5/0x110 [zfs]
[  263.652047]  taskq_thread+0x295/0x4f0 [spl]
[  263.652565]  ? wake_up_q+0x70/0x70
[  263.653055]  kthread+0xf0/0x110
[  263.653541]  ? taskq_thread_spawn+0x50/0x50 [spl]
[  263.654009]  ? kthread_bind+0x30/0x30
[  263.654503]  ret_from_fork+0x2e/0x38
[  263.654944] Modules linked in: loop zfs(POE) icp(POE) zzstd(OE) zlua(OE) zcommon(POE) zunicode(POE) znvpair(POE) zavl(POE) spl(OE) crc32_pclmul binfmt_misc vmwgfx ttm joydev drm_kms_helper evdev video sg intel_rapl_perf drm serio_raw vboxguest pcspkr ac button sunrpc ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic fscrypto ecb hid_generic usbhid hid sr_mod sd_mod cdrom ata_generic crc32c_intel ohci_pci ata_piix ahci ohci_hcd libahci aesni_intel ehci_pci ehci_hcd aes_i586 libata crypto_simd e1000 usbcore cryptd psmouse scsi_mod i2c_piix4 usb_common
[  263.657729] CR2: 000000000000002d
[  263.658224] ---[ end trace 25da9fdd08fa4f26 ]---
[  263.658678] EIP: spl_kmem_cache_alloc+0x1e/0x710 [spl]
[  263.659134] Code: ff ff e8 a5 dd 27 e6 8d 74 26 00 90 3e 8d 74 26 00 55 89 e5 57 56 89 c6 53 83 ec 38 89 55 cc 65 a1 14 00 00 00 89 45 f0 31 c0 <f6> 46 2d 01 0f 84 80 00 00 00 8b 46 28 89 d3 83 e3 04 89 45 d8 89
[  263.660494] EAX: 00000000 EBX: 00005604 ECX: f2f53d28 EDX: 00000004
[  263.660940] ESI: 00000000 EDI: 00ac0a00 EBP: ec79beac ESP: dea5cdfc
[  263.661387] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010246
[  263.661820] CR0: 80050033 CR2: 0000002d CR3: 1ea54000 CR4: 000406f0
[  263.662277] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[  263.662706] DR6: fffe0ff0 DR7: 00000400

Bonus: an oops with the same signature on a KVM (QEMU) VM, this time while running zfs_destroy_dev_removal. (It looked similar enough to me to be relevant; let me know if I'm mistaken and should file a separate bug.)

[10048.540460] BUG: unable to handle kernel NULL pointer dereference at 0000002d
[10048.543684] *pdpt = 0000000000000000 *pde = f000ff53f000ff53 
[10048.546481] Oops: 0000 [#1] SMP PTI
[10048.548552] CPU: 3 PID: 2984 Comm: z_vdev_file Tainted: P           OE     4.19.0-16-686-pae #1 Debian 4.19.181-1
[10048.550357] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[10048.552177] EIP: spl_kmem_cache_alloc+0x1e/0x710 [spl]
[10048.554293] Code: ff ff e8 a5 ad 2d dc 8d 74 26 00 90 3e 8d 74 26 00 55 89 e5 57 56 89 c6 53 83 ec 38 89 55 cc 65 a1 14 00 00 00 89 45 f0 31 c0 <f6> 46 2d 01 0f 84 80 00 00 00 8b 46 28 89 d3 83 e3 04 89 45 d8 89
[10048.558806] EAX: 00000000 EBX: 0000106a ECX: f282dcd8 EDX: 00000004
[10048.561472] ESI: 00000000 EDI: 0020d600 EBP: d8b49eac ESP: d8b49e68
[10048.563762] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010246
[10048.566061] CR0: 80050033 CR2: 0000002d CR3: 14a54000 CR4: 000006b0
[10048.568586] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[10048.571184] DR6: fffe0ff0 DR7: 00000400
[10048.574269] Call Trace:
[10048.576786]  ? __switch_to_asm+0x28/0x50
[10048.579011]  ? __switch_to_asm+0x34/0x50
[10048.581314]  ? __switch_to+0x4e/0x310
[10048.583284]  ? __switch_to_asm+0x28/0x50
[10048.585234]  ? __switch_to_asm+0x34/0x50
[10048.586774]  ? abd_verify_scatter+0x28/0x30 [zfs]
[10048.588445]  zio_buf_alloc+0x28/0x60 [zfs]
[10048.590244]  abd_borrow_buf+0x37/0x40 [zfs]
[10048.592367]  vdev_file_io_strategy+0xb5/0x110 [zfs]
[10048.594023]  taskq_thread+0x295/0x4f0 [spl]
[10048.595853]  ? wake_up_q+0x70/0x70
[10048.597752]  kthread+0xf0/0x110
[10048.602267]  ? taskq_thread_spawn+0x50/0x50 [spl]
[10048.604285]  ? kthread_bind+0x30/0x30
[10048.606275]  ret_from_fork+0x2e/0x38
[10048.608304] Modules linked in: loop zfs(POE) icp(POE) zzstd(OE) zlua(OE) zcommon(POE) zunicode(POE) znvpair(POE) zavl(POE) spl(OE) binfmt_misc bochs_drm ttm ppdev drm_kms_helper evdev joydev drm qemu_fw_cfg pcspkr serio_raw sg parport_pc parport button ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic fscrypto ecb crypto_simd cryptd aes_i586 sr_mod cdrom sd_mod ata_generic ata_piix libata scsi_mod e1000 psmouse i2c_piix4 floppy
[10048.617012] CR2: 000000000000002d
[10048.619047] ---[ end trace a30776074a579ad0 ]---
[10048.621073] EIP: spl_kmem_cache_alloc+0x1e/0x710 [spl]
[10048.623064] Code: ff ff e8 a5 ad 2d dc 8d 74 26 00 90 3e 8d 74 26 00 55 89 e5 57 56 89 c6 53 83 ec 38 89 55 cc 65 a1 14 00 00 00 89 45 f0 31 c0 <f6> 46 2d 01 0f 84 80 00 00 00 8b 46 28 89 d3 83 e3 04 89 45 d8 89
[10048.627133] EAX: 00000000 EBX: 0000106a ECX: f282dcd8 EDX: 00000004
[10048.629157] ESI: 00000000 EDI: 0020d600 EBP: d8b49eac ESP: d4a5cdfc
[10048.631136] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010246
[10048.633053] CR0: 80050033 CR2: 0000002d CR3: 14a54000 CR4: 000006b0
[10048.634991] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[10048.636848] DR6: fffe0ff0 DR7: 00000400

stale[bot] commented 2 years ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

stale[bot] commented 1 year ago

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.