openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs
Other
10.42k stars 1.73k forks source link

[sparc64] Encrypted datasets erroring with "Unable to handle kernel NULL pointer dereference errors" #13922

Open koachan opened 1 year ago

koachan commented 1 year ago

System information

Type Version/Name
Distribution Name Gentoo
Distribution Version N/A
Kernel Version 5.15.10-gentoo-sparc64
Architecture sparc64
OpenZFS Version 2.0.7

Describe the problem you're observing

I'm getting frequent "Unable to handle kernel NULL pointer dereference" errors when using ZFS, sometimes as early as just after mounting it. After that error happens, all disk I/O stalls so I practically can't do anything in the system, only hard reset seems to fix it.

Might be related to #10010 since the symptoms are the same. However, this only happens when using encrypted datasets, never when using unencrypted ones.

Describe how to reproduce the problem

(Possibly would need to do this on a sparc64 machine)

  1. Create an encrypted dataset (I used aes-256-gcm).
  2. Import it and use it, preferrably with high I/O loads.

Include any warning/errors/backtraces from the system logs

[  171.033700] Unable to handle kernel NULL pointer dereference
[  171.033713] tsk->{mm,active_mm}->context = 000000000000154c
[  171.033721] tsk->{mm,active_mm}->pgd = fff800013236e000
[  171.033726]               \|/ ____ \|/
                             "@'/ .. \`@"
                             /_| \__/ |_\
                                \__U_/
[  171.033730] z_rd_int(2722): Oops [#1]
[  171.033740] CPU: 59 PID: 2722 Comm: z_rd_int Tainted: P           O      5.15.10-gentoo-sparc64 #1
[  171.033751] TSTATE: 0000004480001600 TPC: 00000000004cc7c8 TNPC: 00000000004cc7cc Y: 00000000    Tainted: P           O     
[  171.033759] TPC: <ktime_get_raw_ts64+0x28/0xe0>
[  171.033776] g0: 0000000000000000 g1: 00000000000000a5 g2: 0000000000000000 g3: fff8000181e55040
[  171.033783] g4: fff800007dadf800 g5: fff8007ffdda2000 g6: fff800007dea0000 g7: 0000000000000000
[  171.033789] o0: 0000000000000000 o1: 000000000000000e o2: 0000000000000000 o3: 0000000000000000
[  171.033794] o4: 0000000000000000 o5: fff800007dea3968 sp: fff800007dea30d1 ret_pc: 000000000049fb6c
[  171.033800] RPC: <__wake_up_common_lock+0x4c/0x80>
[  171.033811] l0: fff800017a0c7ca0 l1: fff8007ffebb37e0 l2: 00000026837162b5 l3: 0000000000000001
[  171.033817] l4: 00000000ffffffff l5: 0000000000dcac00 l6: 0000000000b85f98 l7: 0000000000000001
[  171.033822] i0: fff800007dea3a30 i1: fff80000a5e4af30 i2: 000000001021a374 i3: 000000000004d3d8
[  171.033827] i4: 0000000000ebe080 i5: 0000000000000000 i6: fff800007dea3181 i7: 000000001062f604
[  171.033832] I7: <spa_taskq_dispatch_ent+0x44/0xa0 [zfs]>
[  171.033931] Call Trace:
[  171.033935] [<000000001062f604>] spa_taskq_dispatch_ent+0x44/0xa0 [zfs]
[  171.033977] [<000000001069c540>] zio_data_buf_free+0x460/0x15e0 [zfs]
[  171.034018] [<00000000106a4118>] zio_suspend+0x778/0x1160 [zfs]
[  171.034059] [<000000001069d74c>] zio_execute+0x8c/0x10a0 [zfs]
[  171.034100] [<000000001021ec1c>] taskq_dispatch+0x49c/0x700 [spl]
[  171.034123] [<0000000000483008>] kthread+0x108/0x140
[  171.034133] [<00000000004060c8>] ret_from_fork+0x1c/0x2c
[  171.034142] [<0000000000000000>] 0x0
[  171.034148] Caller[000000001062f604]: spa_taskq_dispatch_ent+0x44/0xa0 [zfs]
[  171.034189] Caller[000000001069c540]: zio_data_buf_free+0x460/0x15e0 [zfs]
[  171.034231] Caller[00000000106a4118]: zio_suspend+0x778/0x1160 [zfs]
[  171.034284] Caller[000000001069d74c]: zio_execute+0x8c/0x10a0 [zfs]
[  171.034337] Caller[000000001021ec1c]: taskq_dispatch+0x49c/0x700 [spl]
[  171.034348] Caller[0000000000483008]: kthread+0x108/0x140
[  171.034357] Caller[00000000004060c8]: ret_from_fork+0x1c/0x2c
[  171.034365] Caller[0000000000000000]: 0x0
[  171.034370] Instruction DUMP:
[  171.034373]  c25f20c8 
[  171.034377]  c2760000 
[  171.034381]  d05f2040 
[  171.034385] <c25a0000>
[  171.034388]  9fc04000 
[  171.034391]  01000000 
[  171.034395]  c65f2050 
[  171.034398]  f25f2048 
[  171.034401]  c4072058
ryao commented 1 year ago

Would you rebuild the ZFS kernel modules with env EXTRA_ECONFIG=--enable-debuginfo FEATURES=nostrip emerge -1 sys-fs/zfs-kmod and get a new backtrace from dmesg?

Actually, I am curious if it also happens with USE=debug, but since there is a non-zero chance that would affect the reproducibility of this, I am not including it in that command.

ryao commented 1 year ago

If it is not too much trouble, could you recompile your kernel with KASAN and UBSAN and then reproduce this with a rebuilt sys-fs/zfs-kmod?

UBSAN is under Kernel hacking -> Generic Kernel Debugging Instruments -> Undefined behaviour sanity checker.

KASAN is under Kernel hacking -> Memory Debugging -> KASAN: runtime memory debugger. For performance, you probably will want to do Instrumentation type (Inline instrumentation).

Additionally, you will probably want to set CONFIG_DEBUG_INFO=y, since that helps produce the most accurate stack traces. That is under Kernel hacking -> Compile-time checks and compiler options -> Compile the kernel with debug info.

If you use genkernel to build your kernel, you will want to pass --no-strip to keep it from stripping the kernel. After rebuilding your kernel, you should rebuild ZFS against it according to the earlier instructions.

With all of this in place, we should hopefully get more accurate data from dmesg about the problem.

koachan commented 1 year ago

Mhm, I'll try rebuilding the kernel and ZFS module with those options.

rincebrain commented 1 year ago

I would suggest treating the encryption code as unmaintained going forward, as that's what it's been the past 3 years.

ryao commented 1 year ago

I would suggest treating the encryption code as unmaintained going forward, as that's what it's been the past 3 years.

It is in the repository, so we must maintain it. At the very least, I am willing to work on bugs involving it.

rincebrain commented 1 year ago

If you're volunteering, I would suggest #11679 and #12614 as the ones that get the most duplicates.

I would warn you, though, that the list of unfixed difficulties with it is...not trivial.

Less tangentially, #13897 probably means there are going to be a number of other failure modes on sparc64 in addition to this one.

ryao commented 1 year ago

@rincebrain I have at least 50 coverity reports of possible null pointer dereferences to process. They are not easy to analyze. I also want to take a look at the buildbot issues and I have a somewhat large PR in development. Those other issues would have to wait until afterward.

rincebrain commented 1 year ago

I would not presume to interfere with your triaging, especially with as thankless a backlog as that, but I would remark that if your concern is triaging possible NULL pointer dereferences, #11679 is an unambiguously certain NULL pointer dereference.

ryao commented 1 year ago

I was referring to coverity's null pointer deference reports. That one is sadly not among them. :/

rincebrain commented 1 year ago

Ah, okay, your concern is specifically the reports, not NULL pointer dereferences with that as a means to an end.

ryao commented 1 year ago

@rincebrain I am prioritizing those and the issues in the buildbot. I need to set some kind order to make it manageable.

rincebrain commented 1 year ago

Sure, of course, otherwise your list grows faster than you consume it and ordering of tasks becomes completely random. No criticism was intended with my last remark, just a slight sadness at understanding your completely reasonable rationale for not letting me nerdsnipe you. :)

e: actually ontopic, congratulations, I think this is an encryption-related panic I haven't seen before, and I don't say that very often these days.

Even better that it's one I haven't seen on SPARC despite just reporting a bunch on SPARC a day or two ago.

I'll give it a go on my T2.

koachan commented 1 year ago

Okay, recompiled the kernel following @ryao's instructions, except for KASAN which I can't seem to enable.

KASAN is under Kernel hacking -> Memory Debugging -> KASAN: runtime memory debugger. For performance, you probably will want to do Instrumentation type (Inline instrumentation).

No option for it in the Memory Debugging menu on my system, the only options there are:

[ ] Extend memmap on extra space for more information on page
[ ] Poison pages after freeing
[ ] SLUB debugging on by default (NEW)
[ ] Enable SLUB performance statistics (NEW)

Also, unfortunately emerge decided to bump the ZFS version to 2.1.5 during the rebuild and I haven't found a way to go back to 2.0.7. So far I haven't experienced any further errors after rebuilding, I'll give an update if I get a new error.

ryao commented 1 year ago

KASAN might not have been ported to SPARC yet. I forgot to check that possibility. KASAN was actually more important than UBSAN, but if UBSAN is available, hopefully it can catch something.

ryao commented 1 year ago

https://www.kernel.org/doc/html/v5.15/dev-tools/kasan.html

Currently, generic KASAN is supported for the x86_64, arm, arm64, xtensa, s390, and riscv architectures, and tag-based KASAN modes are supported only for arm64.

The kernel documentation confirms it. No SPARC support for KASAN yet.

Even 5.19 does not have it, although it adds PPC support:

https://www.kernel.org/doc/html/v5.19/dev-tools/kasan.html

rincebrain commented 1 year ago

SPARC is missing a lot of useful kernel debugging tools...like kdump. :/

rincebrain commented 1 year ago

Do you have any suggested IO loads you used? I tried just lobbing a bunch of files around in multiple threads while scrubbing and didn't let the magic smoke out.

koachan commented 1 year ago

It's weird. After rebuilding the kernel with the debug options and sanitizers enabled I've not been able to trigger the issue anymore...

koachan commented 1 year ago

Do you have any suggested IO loads you used?

I mostly use it as a builder for LLVM so that's the main I/O load it has, but on a bad day, I could trigger it simply by booting the machine (I'm also using an encrypted dataset as the root FS).

ryao commented 1 year ago

Is it possible that the issue was fixed in 2.1.x?

koachan commented 1 year ago

So, finally I managed to get this with version 2.1.5, sorry for the long delay.

[  747.885131] Unable to handle kernel NULL pointer dereference
[  747.885152] tsk->{mm,active_mm}->context = 0000000000000ecf
[  747.885164] tsk->{mm,active_mm}->pgd = fff8000610810000
[  747.885173]               \|/ ____ \|/
                             "@'/ .. \`@"
                             /_| \__/ |_\
                                \__U_/
[  747.885180] z_wr_int_1(2801): Oops [#1]
[  747.885197] CPU: 48 PID: 2801 Comm: z_wr_int_1 Tainted: P        W  O      5.15.10-gentoo-sparc64 #1
[  747.885258] TSTATE: 0000004480001606 TPC: 00000000004dee28 TNPC: 00000000004dee2c Y: 00000000    Tainted: P        W  O     
[  747.885273] TPC: <ktime_get_raw_ts64+0x28/0xe0>
[  747.885300] g0: 0000000000000000 g1: 00000000000002e4 g2: 0000000000000000 g3: 0000000000000000
[  747.885312] g4: fff8000057af5780 g5: fff8007ffd750000 g6: fff800004d0e8000 g7: fff800009caca5c0
[  747.885322] o0: 0000000000000000 o1: 0000000000000000 o2: 0000000000000004 o3: 0000000000000000
[  747.885333] o4: 000000000000b602 o5: 00000000fa83b2da sp: fff800004d0eb251 ret_pc: 000000001070329c
[  747.885346] RPC: <zio_wait_for_children+0x5c/0xc0 [zfs]>
[  747.885716] l0: 00000000003d0900 l1: 0000000000000000 l2: 0000000000000000 l3: 0000000000000000
[  747.885730] l4: 0000000000000000 l5: 0000000000000000 l6: 0000000000000000 l7: 0000000000000000
[  747.885741] i0: fff800004d0ebbb0 i1: 0000000000000001 i2: 0000000000000001 i3: 0000000000122c4a
[  747.885752] i4: 0000000000f60240 i5: 0000000000000000 i6: fff800004d0eb301 i7: 0000000010704520
[  747.885764] I7: <zio_vdev_io_done+0xc0/0x220 [zfs]>
[  747.886039] Call Trace:
[  747.886047] [<0000000010704520>] zio_vdev_io_done+0xc0/0x220 [zfs]
[  747.886356] [<000000001070476c>] zio_execute+0x8c/0x160 [zfs]
[  747.886623] [<0000000010230a84>] taskq_thread+0x244/0x480 [spl]
[  747.886679] [<000000000048aba8>] kthread+0x108/0x140
[  747.886711] [<00000000004060c8>] ret_from_fork+0x1c/0x2c
[  747.886734] [<0000000000000000>] 0x0
[  747.886747] Caller[0000000010704520]: zio_vdev_io_done+0xc0/0x220 [zfs]
[  747.887026] Caller[000000001070476c]: zio_execute+0x8c/0x160 [zfs]
[  747.887254] Caller[0000000010230a84]: taskq_thread+0x244/0x480 [spl]
[  747.887298] Caller[000000000048aba8]: kthread+0x108/0x140
[  747.887330] Caller[00000000004060c8]: ret_from_fork+0x1c/0x2c
[  747.887395] Caller[0000000000000000]: 0x0
[  747.887406] Instruction DUMP:
[  747.887412]  c25f20c8 
[  747.887420]  c2760000 
[  747.887426]  d05f2040 
[  747.887434] <c25a0000>
[  747.887443]  9fc04000 
[  747.887471]  01000000 
[  747.887479]  c65f2050 
[  747.887487]  f25f2048 
[  747.887494]  c4072058

The stack trace is slightly different from the original 2.0.7 one, though?

koachan commented 1 year ago

Anything else I should provide for this?

ryao commented 1 year ago

I should have more time to spend on this in about a week or two, but until then, you could try seeing if this patch makes a difference:

https://github.com/openzfs/zfs/commit/e949d36040e5e79fe0dfda6a33451111cc5a0476

I do not know offhand how the memory ordering works on your SPARC processor, but if it is not a TSO memory ordering, that patch fixes an issue where the reference counting was not being done correctly.

ryao commented 1 year ago

I am not sure which address in the backtrace is the instruction pointer, but if you find it and your zfs module is built with debug info, you can in theory do:

eu-addr2line -f -e $(modinfo -n zfs) $(printf "0x%X\n" $((<IP> - $(sudo zgrep ^zfs /proc/modules | awk ' {print $6 }'))))

That should tell you the line number of the instruction pointer that caused the NULL pointer dereference, assuming that the IP is in the ZFS kernel module.

koachan commented 1 year ago

Sorry for the very late reply - I've mostly moved to an unencrypted setup and only have the time to check it now :bow:

I do not know offhand how the memory ordering works on your SPARC processor, but if it is not a TSO memory ordering, that patch fixes an issue where the reference counting was not being done correctly.

Mine's a T4, so it runs in TSO. Besides, I think Linux always sets the processor into TSO mode? Anyway, I'll try applying the patch and see if it fixes the issue.

eu-addr2line -f -e $(modinfo -n zfs) $(printf "0x%X\n" $((<IP> - $(sudo zgrep ^zfs /proc/modules | awk ' {print $6 }'))))

Plugging the address in the topmost stack trace (0x0000000010704520) to that command points me to zio.c:3906:47.