void-linux / void-packages

The Void source packages collection
https://voidlinux.org

Segmentation fault by insert zfs module #41613

Closed ckyoog closed 4 months ago

ckyoog commented 1 year ago

Is this a new report?

Yes

System Info

Void 6.1.4_1 aarch64 Unknown uptodate F

Package(s) Affected

zfs-2.1.7_1

Does a report exist for this bug with the project's home (upstream) and/or another distro?

No response

Expected behaviour

modprobe zfs inserts the zfs module with no problem.

Actual behaviour

The following error is displayed when running modprobe zfs:

# modprobe zfs
[  588.001039] Unable to handle kernel paging request at virtual address 97fff71697ffe5b6
[  588.001669] Mem abort info:
[  588.001872]   ESR = 0x0000000096000004
[  588.002142]   EC = 0x25: DABT (current EL), IL = 32 bits
[  588.002528]   SET = 0, FnV = 0
[  588.002748]   EA = 0, S1PTW = 0
[  588.002979]   FSC = 0x04: level 0 translation fault
[  588.003348] Data abort info:
[  588.003554]   ISV = 0, ISS = 0x00000004
[  588.003861]   CM = 0, WnR = 0
[  588.004082] [97fff71697ffe5b6] address between user and kernel address ranges
[  588.004603] Internal error: Oops: 0000000096000004 [#1] SMP
[  588.005152] Modules linked in: spl(O+) algif_hash af_alg af_packet cfg80211 8021q garp mrp stp llc deflate efi_pstore snd_seq snd_seq_device snd_timer snd soundcore vhost_vsock vmw_vsock_virtio_transport_common vsock vhost_net vhost vhost_iotlb tap hci_vhci bluetooth ecdh_generic rfkill ecc vfio_iommu_type1 vfio uhid dm_mod uinput userio ppp_generic slhc tun loop cuse fuse efivarfs crct10dif_ce polyval_ce polyval_generic ghash_ce sha3_ce sha3_generic sha512_ce sha512_arm64 sha2_ce sha256_arm64 sha1_ce virtio_blk virtio_net net_failover failover qemu_fw_cfg virtio_mmio btrfs blake2b_generic xor xor_neon raid6_pq libcrc32c aes_neon_bs aes_neon_blk aes_ce_blk aes_ce_cipher
[  588.010435] CPU: 1 PID: 4369 Comm: modprobe Tainted: G           O       6.1.4_1 #1
[  588.011083] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
[  588.011667] pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  588.012250] pc : mod_sysfs_setup+0x1ac/0x2a0
[  588.012767] lr : mod_sysfs_setup+0x17c/0x2a0
[  588.013080] sp : ffff80000ceb3ae0
[  588.013324] x29: ffff80000ceb3ae0 x28: ffff80000ceb3c90 x27: ffff80000a322dc0
[  588.013897] x26: ffff80007cdfee70 x25: ffff80000ceb3c90 x24: ffff80007cdf8d30
[  588.014418] x23: ffff80007cdfc360 x22: ffff80007cdfeb58 x21: ffff80007cdfeb90
[  588.014936] x20: 0000000000000000 x19: ffff80007cdfeb40 x18: 0000000000000000
[  588.015453] x17: ffff00000635fdf0 x16: ffff00000635fdb0 x15: ffff00000635fd70
[  588.015969] x14: ffff00000635fd30 x13: 00646e69625f6461 x12: 657268745f716b73
[  588.016484] x11: ffff00000635ff70 x10: ffff00000635ff30 x9 : ffff800008e61dc0
[  588.017020] x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : 736877645e727872
[  588.017576] x5 : ffff00000fa79d19 x4 : ffff00000fa79500 x3 : 0000000000000000
[  588.018141] x2 : ffff80007cdfeb58 x1 : ffff80007cdfeb90 x0 : 97fff71697ffe4ee
[  588.018663] Call trace:
[  588.018861]  mod_sysfs_setup+0x1ac/0x2a0
[  588.019191]  load_module+0x960/0xae0
[  588.019519]  __do_sys_finit_module+0xac/0x130
[  588.019892]  __arm64_sys_finit_module+0x28/0x34
[  588.020270]  invoke_syscall+0x78/0x100
[  588.020581]  el0_svc_common.constprop.0+0x58/0x190
[  588.020939]  do_el0_svc+0x34/0x44
[  588.021185]  el0_svc+0x34/0x140
[  588.021468]  el0t_64_sync_handler+0xf4/0x120
[  588.021835]  el0t_64_sync+0x19c/0x1a0
[  588.022079] Code: 540004a0 f9401700 aa1603e2 aa1503e1 (f9406400)
[  588.022475] ---[ end trace 0000000000000000 ]---
Segmentation fault
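
For readers working through the trace above, the decoded fields can be reproduced from the raw values. This is a minimal sketch, not part of the original report: the function names decode_esr and decode_ldr are made up for illustration, and the bit layouts assumed are the standard AArch64 ones (EC in ESR bits 31:26, IL in bit 25, ISS in bits 24:0, and the 64-bit "LDR Xt, [Xn, #imm]" unsigned-offset encoding).

```shell
#!/bin/sh
# Hypothetical helpers for reading the oops above; names are illustrative only.

# Split an ESR value into the fields the kernel prints under "Mem abort info".
decode_esr() {
    ec=$(( ($1 >> 26) & 0x3f ))    # exception class, bits 31:26
    il=$(( ($1 >> 25) & 0x1 ))     # instruction length bit, bit 25
    iss=$(( $1 & 0x1ffffff ))      # instruction specific syndrome, bits 24:0
    fsc=$(( iss & 0x3f ))          # fault status code, low 6 bits of ISS
    printf 'EC=0x%02x IL=%d ISS=0x%08x FSC=0x%02x\n' "$ec" "$il" "$iss" "$fsc"
}

# Decode a 64-bit LDR (immediate, unsigned offset) instruction word,
# such as the faulting word shown in parentheses on the "Code:" line.
decode_ldr() {
    # The top 10 bits identify this particular encoding.
    if [ $(( ($1 >> 22) & 0x3ff )) -ne $(( 0x3e5 )) ]; then
        echo "not a 64-bit LDR (unsigned offset)" >&2
        return 1
    fi
    imm=$(( (($1 >> 10) & 0xfff) * 8 ))   # imm12 is scaled by the 8-byte access size
    rn=$(( ($1 >> 5) & 0x1f ))            # base register
    rt=$(( $1 & 0x1f ))                   # destination register
    printf 'ldr x%d, [x%d, #%d]\n' "$rt" "$rn" "$imm"
}

decode_esr 0x96000004
decode_ldr 0xf9406400
```

The second call decodes to a load from x0 + 200. With x0 ending in e4ee, adding 200 (0xc8) gives an address ending in e5b6, which agrees with the faulting address in the first line of the oops, so the bad pointer appears to be the value in x0.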

Steps to reproduce

  1. xbps-install zfs
  2. modprobe zfs

Additional info

$ uname -a
Linux void-live 6.1.4_1 #1 SMP PREEMPT_DYNAMIC Wed Jan 11 00:28:48 UTC 2023 aarch64 GNU/Linux

$ file /lib/modules/6.1.4_1/extra/zfs/zfs/zfs.ko
/lib/modules/6.1.4_1/extra/zfs/zfs/zfs.ko: ELF 64-bit LSB relocatable, ARM aarch64, version 1 (SYSV), BuildID[sha1]=6831b16ecce2e5dd58eed90adf7d790e6e5c5765, not stripped

There was no problem when I was using linux-lts-5.10.

(I just tried kernel 5.15 and found the same issue:

# uname -a
Linux void-live 5.15.85_1 #1 SMP Thu Dec 29 14:59:00 UTC 2022 aarch64 GNU/Linux

# xuname
Void 5.15.85_1 aarch64 Unknown uptodate F

# file /lib/modules/5.15.85_1/extra/zfs/zfs/zfs.ko.gz
/lib/modules/5.15.85_1/extra/zfs/zfs/zfs.ko.gz: gzip compressed data, was "zfs.ko", last modified: Fri Jan 13 09:24:45 2023, max compression, from Unix, original size modulo 2^32 4331736

# xbps-query -s linux-lts
[*] linux-lts-5.15_1         Linux LTS (Long Term Support) kernel meta package
[*] linux-lts-headers-5.15_1 Linux longterm support kernel headers meta package

)
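
On the 5.15 kernel the module is gzip-compressed, so file only reports the gzip wrapper rather than the ELF details shown for 6.1.4. A minimal sketch for looking inside it follows; module_cat is a hypothetical helper name, and the .xz and .zst branches are assumptions covering other common module compression formats, not something taken from this report.

```shell
#!/bin/sh
# Hypothetical helper: write a kernel module to stdout, decompressing it
# first if its suffix indicates a compressed file.
module_cat() {
    case $1 in
        *.gz)  gzip -dc -- "$1" ;;
        *.xz)  xz -dc -- "$1" ;;
        *.zst) zstd -dcq -- "$1" ;;
        *)     cat -- "$1" ;;
    esac
}

# Example usage (path from the report above); "file -" reads from stdin:
#   module_cat /lib/modules/5.15.85_1/extra/zfs/zfs/zfs.ko.gz | file -
```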

BikyAlex commented 1 year ago

I might start looking into this, no promises though. I might just use 5.10 with zfs anyway. I had issues building zfs-dkms on aarch64 in the first place, until I found some tools on GitHub that were required to build it. Even then, modprobe segfaulted on my system with 5.15, 5.16, 5.18, 5.19 and 6.0.

I don't think I tried 5.10; it's worth a shot if I can get the odroid hc4 to work. If not, I might just use the armbian kernel, build zfs on that, and then boot void using armbian's kernel. (I did this before to get void bootstrapped, but I never got zfs-dkms on armbian to work either, also because of the missing aarch64-tools script.)

I might be able to strace what the module is doing when modprobe'd, but I'm not sure I can actually fix it; I'm only a shell script kiddie.

classabbyamp commented 4 months ago

does this still occur?

BikyAlex commented 4 months ago

Let me fire up an ARM SBC and test a fresh Void install. I had to switch distros for ZFS on the HC4. I've got a Pi 4 lying around, so I should be back later today with what I find.

BikyAlex commented 4 months ago

RPi4 - rpi-kernel-6.6.31_1 (+headers) with zfs-2.2.4 works just fine after modprobe zfs.

[ 1242.347959] spl: loading out-of-tree module taints kernel.
[ 1242.372950] zfs: module license 'CDDL' taints kernel.
[ 1242.372964] Disabling lock debugging due to kernel taint
[ 1242.373028] zfs: module license taints kernel.
[ 1243.705772] ZFS: Loaded module v2.2.4-1, ZFS pool version 5000, ZFS filesystem version 5

I'll try testing on the odroid n2+ next, brb.

BikyAlex commented 4 months ago

Odroid N2+ - linux-kernel-6.6.33_1 (+headers) with zfs-2.2.4_2 after modprobe zfs

[ 1988.029339] module spl: .gnu.linkonce.this_module section size must match the kernel's built struct module size at run time

Reboot doesn't work at all: neither u-boot nor petitboot can load the system. I can't see what's wrong with it, but I'm pretty sure it breaks after the initramfs is loaded, init is launched and the runit core-service 02-kmods.sh is run. I don't get to a tty to troubleshoot. I'd say it's still broken.

BikyAlex commented 4 months ago

Odroid N2+ (aarch64-musl)

uname -a 
Linux n2p 6.6.33_1 #1 SMP PREEMPT_DYNAMIC Sun Jun 16 00:15:32 UTC 2024 aarch64 GNU/Linux

modprobe zfs
modprobe: ERROR: could not insert 'zfs': Exec format error

dmesg
[  753.427461] module spl: .gnu.linkonce.this_module section size must match the kernel's built struct module size at run time
[  787.174996] module spl: .gnu.linkonce.this_module section size must match the kernel's built struct module size at run time

Again, if I reboot, it fails to boot (well, it boots, but no tty or any error messages show up on the screen). If I xchroot on another aarch64 system and uninstall zfs, without modifying anything, it boots right back up.
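
One quick thing to check with an error like this is whether the installed module was built for the running kernel. The sketch below compares the module's vermagic against uname -r. The function names are hypothetical, and a matching vermagic does not rule out a struct module size mismatch (the kernel config or headers used for the build could still differ), so treat it only as a first check.

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative only.

# Succeeds if the first field of a vermagic string equals the given release.
# A vermagic string looks like: "6.6.33_1 SMP preempt mod_unload aarch64"
vermagic_matches() {
    release=$1
    vermagic=$2
    [ "${vermagic%% *}" = "$release" ]
}

# Report whether an installed module was built for the running kernel.
check_module() {
    mod=$1
    vm=$(modinfo -F vermagic "$mod") || return 2
    if vermagic_matches "$(uname -r)" "$vm"; then
        echo "$mod: vermagic matches the running kernel"
    else
        echo "$mod: built for '${vm%% *}', running '$(uname -r)'"
        return 1
    fi
}

# Example usage:
#   check_module spl
#   check_module zfs
```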

classabbyamp commented 4 months ago

ok seems the segfault is gone