Open flippers2652 opened 2 years ago
This indicates that the in-memory checksum field is corrupt. ZFS responded by panicking to protect your data integrity.
Do you always receive the same error?
Would you be willing to recompile the ZFS kernel modules with some debug flags? In specific, ./configure --enable-debug --enable-debuginfo --enable-asan --enable-ubsan.
Honestly, I would prefer to recompile the kernel with ASAN and UBSAN here, but I realize that might be a bit too much.
UBSAN and ASAN should hopefully provide more detailed information about what is happening.
I am willing to compile it and have reached the stage I type ./configure --enable-debug --enable-debuginfo --enable-asan --enable-ubsan
however I don't know what packages are required to build on arch and also how to actually install it once built.
It looks like Arch Linux uses DKMS. There should be a file in /etc/default/zfs that has the following variables:
You want to change them to yes from no.
After doing that, you can rebuild the modules:
https://wiki.archlinux.org/title/Dynamic_Kernel_Module_Support#Rebuild_modules
I was mistaken. --enable-asan --enable-ubsan
does not apply to the kernel. Getting that in the kernel requires rebuilding the kernel.
I simplified my instructions to omit the ASAN and UBSAN parts.
I have now done that.
I have now done that.
Okay. Assuming that you have regenerated your initramfs (I forgot to mention this earlier) and rebooted, now we wait.
Where would I get any new infomation?
dmesg, although if the system locks up, that might be difficult. I suggest setting up pstore:
https://www.kernel.org/doc/html/latest/admin-guide/pstore-blk.html
I don't see anything new in dmesg.
@flippers2652 Something has to go wrong first, or did something already go wrong?
It still just only outputs the error from earlier.
Well, that tells me something, but not as much as I wanted to know.
Are you using ZFS encryption?
I am using aes-256-gcm
encryption.
a1d477c24c7badc89c60955995fd84d311938486 is what added the check that is becoming upset. I am not sure how the checksum is becoming corrupted. The only code path that zeroes this is in the encryption code, but that is only for readonly pools with an early version of encryption support. Are there any other third party kernel modules on this system?
Also, there is an issue with the backtrace from your system. It says that dsl_dataset_block_kill
called zio_free()
, but that is not possible. In fact, the stack is nonsensical. That suggests an issue in how the back traces were generated. I wonder if these modules are somehow still being stripped, or if the debuginfo is not being generated. The instructions I gave you should have fixed that. :/
When this happens, is it possible to retrieve the contains of /proc/spl/kstat/zfs/dbgmsg
via SSH?
Actually, I see another problem. 00000000ef675ad8
is a userspace pointer. This should not be possible. :/
Whenever I see something in arc_write
with wild, impossible contents, I think of #11679, which essentially (to my current understanding) involves an incorrect refcount resulting in a buffer being discoverable when the code assumes it can't be and can be manipulated freely, and then woe betides anyone who loses that race and peers inside at the wrong time.
Could be an orthogonal failure, of course, but that's where I become suspicious whenever I see that and encryption together.
a1d477c is what added the check that is becoming upset. I am not sure how the checksum is becoming corrupted. The only code path that zeroes this is in the encryption code, but that is only for readonly pools with an early version of encryption support. Are there any other third party kernel modules on this system?
I don't think so·
snd_seq_dummy 16384 0
snd_hrtimer 16384 1
snd_seq 94208 7 snd_seq_dummy
snd_seq_device 16384 1 snd_seq
rfcomm 90112 16
ccm 20480 6
cmac 16384 3
algif_hash 16384 1
algif_skcipher 16384 1
af_alg 36864 6 algif_hash,algif_skcipher
bnep 32768 2
intel_rapl_msr 20480 0
intel_rapl_common 32768 1 intel_rapl_msr
edac_mce_amd 53248 0
iwlmvm 495616 0
ccp 126976 0
rng_core 16384 1 ccp
kvm 1085440 0
snd_hda_codec_realtek 167936 1
snd_hda_codec_generic 98304 1 snd_hda_codec_realtek
irqbypass 16384 1 kvm
ledtrig_audio 16384 1 snd_hda_codec_generic
crct10dif_pclmul 16384 1
snd_hda_codec_hdmi 81920 1
crc32_pclmul 16384 0
crc32c_intel 24576 0
ghash_clmulni_intel 16384 0
snd_hda_intel 61440 2
aesni_intel 380928 8
snd_intel_dspcfg 32768 1 snd_hda_intel
wmi_bmof 16384 0
mac80211 1216512 1 iwlmvm
snd_intel_sdw_acpi 20480 1 snd_intel_dspcfg
crypto_simd 16384 1 aesni_intel
snd_hda_codec 184320 4 snd_hda_codec_generic,snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec_realtek
cryptd 28672 3 crypto_simd,ghash_clmulni_intel
vfat 24576 1
fat 90112 1 vfat
amdgpu 8548352 63
libarc4 16384 1 mac80211
rapl 16384 0
snd_hda_core 118784 5 snd_hda_codec_generic,snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec,snd_hda_codec_realtek
pcspkr 16384 0
snd_hwdep 16384 1 snd_hda_codec
k10temp 16384 0
snd_pcm 159744 4 snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec,snd_hda_core
iwlwifi 454656 1 iwlmvm
btusb 65536 0
snd_timer 49152 3 snd_seq,snd_hrtimer,snd_pcm
btrtl 28672 1 btusb
btbcm 20480 1 btusb
snd 131072 16 snd_hda_codec_generic,snd_seq,snd_seq_device,snd_hda_codec_hdmi,snd_hwdep,snd_hda_intel,snd_hda_codec,snd_hda_codec_realtek,snd_timer,snd_pcm
btintel 45056 1 btusb
gpu_sched 49152 1 amdgpu
drm_ttm_helper 16384 1 amdgpu
soundcore 16384 1 snd
joydev 28672 0
bluetooth 745472 45 btrtl,btintel,btbcm,bnep,btusb,rfcomm
mousedev 24576 0
sp5100_tco 20480 0
ttm 86016 2 amdgpu,drm_ttm_helper
cfg80211 1056768 3 iwlmvm,iwlwifi,mac80211
i2c_piix4 36864 0
ecdh_generic 16384 2 bluetooth
igb 290816 0
crc16 16384 1 bluetooth
dca 16384 1 igb
rfkill 36864 9 bluetooth,cfg80211
wmi 45056 1 wmi_bmof
mac_hid 16384 0
pinctrl_amd 32768 0
acpi_cpufreq 32768 0
crypto_user 24576 0
fuse 172032 1
bpf_preload 16384 0
ip_tables 36864 0
x_tables 57344 1 ip_tables
usbhid 73728 0
zfs 4288512 13
zunicode 335872 1 zfs
zzstd 577536 1 zfs
zlua 188416 1 zfs
zavl 16384 1 zfs
icp 335872 1 zfs
zcommon 102400 2 zfs,icp
znvpair 110592 2 zfs,zcommon
spl 122880 6 zfs,icp,zzstd,znvpair,zcommon,zavl
xhci_pci 20480 0
xhci_pci_renesas 24576 1 xhci_pci
When this happens, is it possible to retrieve the contains of
/proc/spl/kstat/zfs/dbgmsg
via SSH?timestamp message 1663679572 spa.c:6247:spa_tryimport(): spa_tryimport: importing rpool 1663679572 spa.c:6252:spa_tryimport(): spa_tryimport: using cachefile '/etc/zfs/zpool.cache.org' 1663679572 spa_misc.c:418:spa_load_note(): spa_load($import, config trusted): LOADING 1663679572 vdev.c:152:vdev_dbgmsg(): disk vdev '/dev/nvme1n1p1': best uberblock found for spa $import. txg 8540642 1663679572 spa_misc.c:418:spa_load_note(): spa_load($import, config untrusted): using uberblock with txg=8540642 1663679572 spa.c:8370:spa_async_request(): spa=$import async request task=2048 1663679572 spa_misc.c:418:spa_load_note(): spa_load($import, config trusted): LOADED 1663679572 spa_misc.c:418:spa_load_note(): spa_load($import, config trusted): UNLOADING 1663679572 spa.c:6247:spa_tryimport(): spa_tryimport: importing rpool 1663679572 spa.c:6252:spa_tryimport(): spa_tryimport: using cachefile '/etc/zfs/zpool.cache.org' 1663679572 spa_misc.c:418:spa_load_note(): spa_load($import, config trusted): LOADING 1663679572 vdev.c:152:vdev_dbgmsg(): disk vdev '/dev/nvme1n1p1': best uberblock found for spa $import. txg 8540642 1663679572 spa_misc.c:418:spa_load_note(): spa_load($import, config untrusted): using uberblock with txg=8540642 1663679572 spa.c:8370:spa_async_request(): spa=$import async request task=2048 1663679572 spa_misc.c:418:spa_load_note(): spa_load($import, config trusted): LOADED 1663679572 spa_misc.c:418:spa_load_note(): spa_load($import, config trusted): UNLOADING 1663679572 spa.c:6099:spa_import(): spa_import: importing rpool 1663679572 spa_misc.c:418:spa_load_note(): spa_load(rpool, config trusted): LOADING 1663679572 vdev.c:152:vdev_dbgmsg(): disk vdev '/dev/nvme1n1p1': best uberblock found for spa rpool. txg 8540642 1663679572 spa_misc.c:418:spa_load_note(): spa_load(rpool, config untrusted): using uberblock with txg=8540642 1663679572 spa_misc.c:418:spa_load_note(): spa_load(rpool, config trusted): read 119 log space maps (119 total blocks - blksz = 131072 bytes) in 13 ms 1663679572 metaslab.c:2436:metaslab_load_impl(): metaslab_load: txg 0, spa rpool, vdev_id 0, ms_id 97, smp_length 114776, unflushed_allocs 167936, unflushed_frees 167936, freed 0, defer 0 + 0, unloaded time 2025 ms, loading_time 1 ms, ms_max_size 17072435200, max size error 17072304128, old_weight 840000000000001, new_weight 840000000000001 1663679572 metaslab.c:2436:metaslab_load_impl(): metaslab_load: txg 0, spa rpool, vdev_id 0, ms_id 66, smp_length 463200, unflushed_allocs 8974336, unflushed_frees 7659520, freed 0, defer 0 + 0, unloaded time 2029 ms, loading_time 16 ms, ms_max_size 1163558912, max size error 1162223616, old_weight 780000000000001, new_weight 780000000000001 1663679572 mmp.c:240:mmp_thread_start(): MMP thread started pool 'rpool' gethrtime 2705286387 1663679572 metaslab.c:2436:metaslab_load_impl(): metaslab_load: txg 8540643, spa rpool, vdev_id 0, ms_id 38, smp_length 317784, unflushed_allocs 106496, unflushed_frees 745472, freed 0, defer 0 + 0, unloaded time 2705 ms, loading_time 10 ms, ms_max_size 2553995264, max size error 2553864192, old_weight 7c0000000000001, new_weight 7c0000000000001 1663679572 metaslab.c:2436:metaslab_load_impl(): metaslab_load: txg 8540643, spa rpool, vdev_id 0, ms_id 115, smp_length 437272, unflushed_allocs 1769472, unflushed_frees 1208320, freed 0, defer 0 + 0, unloaded time 2716 ms, loading_time 14 ms, ms_max_size 2494697472, max size error 2494554112, old_weight 7c0000000000001, new_weight 7c0000000000001 1663679572 metaslab.c:2436:metaslab_load_impl(): metaslab_load: txg 8540643, spa rpool, vdev_id 0, ms_id 47, smp_length 388664, unflushed_allocs 368640, unflushed_frees 208896, freed 0, defer 0 + 0, unloaded time 2730 ms, loading_time 12 ms, ms_max_size 1541840896, max size error 1541804032, old_weight 780000000000002, new_weight 780000000000002 1663679572 metaslab.c:2436:metaslab_load_impl(): metaslab_load: txg 8540643, spa rpool, vdev_id 0, ms_id 54, smp_length 96472, unflushed_allocs 2150400, unflushed_frees 1695744, freed 0, defer 0 + 0, unloaded time 2743 ms, loading_time 3 ms, ms_max_size 2144243712, max size error 2143928320, old_weight 780000000000002, new_weight 780000000000002 1663679572 metaslab.c:2436:metaslab_load_impl(): metaslab_load: txg 8540643, spa rpool, vdev_id 0, ms_id 64, smp_length 189064, unflushed_allocs 2203648, unflushed_frees 2736128, freed 0, defer 0 + 0, unloaded time 2747 ms, loading_time 6 ms, ms_max_size 1402626048, max size error 1402396672, old_weight 780000000000002, new_weight 780000000000002 1663679572 metaslab.c:2436:metaslab_load_impl(): metaslab_load: txg 8540643, spa rpool, vdev_id 0, ms_id 46, smp_length 61368, unflushed_allocs 950272, unflushed_frees 651264, freed 0, defer 0 + 0, unloaded time 2753 ms, loading_time 1 ms, ms_max_size 2106228736, max size error 2106138624, old_weight 780000000000001, new_weight 780000000000001 1663679572 metaslab.c:2436:metaslab_load_impl(): metaslab_load: txg 8540643, spa rpool, vdev_id 0, ms_id 50, smp_length 176232, unflushed_allocs 106496, unflushed_frees 122880, freed 0, defer 0 + 0, unloaded time 2755 ms, loading_time 4 ms, ms_max_size 1947385856, max size error 1947365376, old_weight 780000000000001, new_weight 780000000000001 1663679572 metaslab.c:2436:metaslab_load_impl(): metaslab_load: txg 8540643, spa rpool, vdev_id 0, ms_id 112, smp_length 205840, unflushed_allocs 569344, unflushed_frees 1355776, freed 0, defer 0 + 0, unloaded time 2760 ms, loading_time 7 ms, ms_max_size 1717813248, max size error 1717731328, old_weight 780000000000001, new_weight 780000000000001 1663679572 spa.c:8370:spa_async_request(): spa=rpool async request task=1 1663679572 spa.c:8370:spa_async_request(): spa=rpool async request task=2048 1663679572 spa_misc.c:418:spa_load_note(): spa_load(rpool, config trusted): LOADED 1663679572 spa_history.c:307:spa_history_log_sync(): txg 8540646 open pool version 5000; software version zfs-2.1.5-0-g6c3c5fcfb-dist; uts archlinux 5.15.68-1-lts #1 SMP Thu, 15 Sep 2022 09:53:50 +0000 x86_64 1663679572 metaslab.c:2436:metaslab_load_impl(): metaslab_load: txg 8540646, spa rpool, vdev_id 0, ms_id 113, smp_length 305712, unflushed_allocs 577536, unflushed_frees 696320, freed 0, defer 0 + 0, unloaded time 2772 ms, loading_time 11 ms, ms_max_size 1385553920, max size error 1385422848, old_weight 780000000000001, new_weight 780000000000001 1663679572 spa.c:8370:spa_async_request(): spa=rpool async request task=32 1663679572 spa_history.c:307:spa_history_log_sync(): txg 8540651 import pool version 5000; software version zfs-2.1.5-0-g6c3c5fcfb-dist; uts archlinux 5.15.68-1-lts #1 SMP Thu, 15 Sep 2022 09:53:50 +0000 x86_64 1663679572 metaslab.c:2436:metaslab_load_impl(): metaslab_load: txg 8540651, spa rpool, vdev_id 0, ms_id 117, smp_length 109568, unflushed_allocs 0, unflushed_frees 0, freed 0, defer 0 + 0, unloaded time 2789 ms, loading_time 5 ms, ms_max_size 2146680832, max size error 2146680832, old_weight 780000000000001, new_weight 780000000000001 1663679572 metaslab.c:2436:metaslab_load_impl(): metaslab_load: txg 8540655, spa rpool, vdev_id 0, ms_id 35, smp_length 206736, unflushed_allocs 0, unflushed_frees 0, freed 0, defer 0 + 0, unloaded time 2789 ms, loading_time 12 ms, ms_max_size 1047310336, max size error 1047310336, old_weight 740000000000005, new_weight 740000000000005 1663679572 metaslab.c:2436:metaslab_load_impl(): metaslab_load: txg 8540655, spa rpool, vdev_id 0, ms_id 62, smp_length 253328, unflushed_allocs 0, unflushed_frees 0, freed 0, defer 0 + 0, unloaded time 2789 ms, loading_time 13 ms, ms_max_size 979312640, max size error 979312640, old_weight 740000000000009, new_weight 740000000000009 1663679572 metaslab.c:2436:metaslab_load_impl(): metaslab_load: txg 8540655, spa rpool, vdev_id 0, ms_id 118, smp_length 362848, unflushed_allocs 0, unflushed_frees 0, freed 0, defer 0 + 0, unloaded time 2789 ms, loading_time 15 ms, ms_max_size 2145361920, max size error 2145361920, old_weight 780000000000001, new_weight 780000000000001 1663679572 metaslab.c:2436:metaslab_load_impl(): metaslab_load: txg 8540655, spa rpool, vdev_id 0, ms_id 116, smp_length 314176, unflushed_allocs 0, unflushed_frees 0, freed 0, defer 0 + 0, unloaded time 2789 ms, loading_time 16 ms, ms_max_size 2142957568, max size error 2142957568, old_weight 780000000000001, new_weight 780000000000001 1663679572 metaslab.c:2436:metaslab_load_impl(): metaslab_load: txg 8540655, spa rpool, vdev_id 0, ms_id 55, smp_length 249808, unflushed_allocs 0, unflushed_frees 0, freed 0, defer 0 + 0, unloaded time 2795 ms, loading_time 12 ms, ms_max_size 1000140800, max size error 1000140800, old_weight 740000000000005, new_weight 740000000000005 1663679572 metaslab.c:2436:metaslab_load_impl(): metaslab_load: txg 8540655, spa rpool, vdev_id 0, ms_id 114, smp_length 389632, unflushed_allocs 0, unflushed_frees 0, freed 0, defer 0 + 0, unloaded time 2789 ms, loading_time 19 ms, ms_max_size 2017890304, max size error 2017890304, old_weight 780000000000001, new_weight 780000000000001 1663679573 metaslab.c:2436:metaslab_load_impl(): metaslab_load: txg 8540655, spa rpool, vdev_id 0, ms_id 67, smp_length 496592, unflushed_allocs 0, unflushed_frees 0, freed 0, defer 0 + 0, unloaded time 2802 ms, loading_time 19 ms, ms_max_size 1070657536, max size error 1070657536, old_weight 740000000000003, new_weight 740000000000003 1663679578 spa_history.c:294:spa_history_log_sync(): command: zpool import -c /etc/zfs/zpool.cache.org -N rpool 1663679588 spa_history.c:330:spa_history_log_sync(): ioctl load-key 1663679588 spa_history.c:294:spa_history_log_sync(): command: zfs load-key rpool
1663681615 is when the error in journalctl happens.
1663681918 zio.c:2088:zio_deadman(): zio_wait waiting for hung I/O to pool 'rpool' 1663681979 zio.c:2088:zio_deadman(): zio_wait waiting for hung I/O to pool 'rpool'
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
is this still a problem in more recent releases / master branch?
System information
Describe the problem you're observing
I am currently running my root on ZFS. It imports fine and the computer usually operates normally however after a while I receive an error through journalctl (See Below) and the computer starts to freeze with the desktop environment usually going first.
I have scrubbed but it says there are no errors:
scan: scrub repaired 0B in 00:39:01 with 0 errors on Sun Sep 18 17:54:00 2022
Similarly,zpool status -x
returnsall pools are healthy
When I search the error it seems only to have the message coming up when the pool is imported, however, my pool imports perfectly fine. I tried asking for help on Reddit but no one knew what it was so one person suggested submitting a bug report.
Thank you for your help.
Describe how to reproduce the problem
Unknown.
Include any warning/errors/backtraces from the system logs