openzfs / zfs

Run CI tests with KASAN #12226

Open aerusso opened 3 years ago

aerusso commented 3 years ago

Describe the feature you would like to see added to OpenZFS

Can we run the ZTS/ztest CI with the kernel address sanitizer, KASAN?

How will this feature improve OpenZFS?

We are more likely to identify kernel memory corruption.

Additional context

#12216 does this for userland.

Can I do this myself by just compiling a kernel with KASAN enabled, and building ZFS as usual? Is there any documentation I should look into for this?

bghira commented 3 years ago

you can. it's straightforward, but slow like molasses in January.

bghira commented 3 years ago

https://github.com/zfsonlinux/zfs/pull/4465 this was discovered and fixed using KASAN.

behlendorf commented 3 years ago

You can absolutely do this locally. All you need to do is build a KASAN enabled kernel, then build ZFS as usual and run the test suite. The kernel documentation you linked to shows which CONFIG options need to be enabled. While you're at it I'd also suggest enabling the kernel kmemleak checker.
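
For reference, the core of it is roughly this (a minimal sketch; CONFIG_KASAN_INLINE trades .text size for speed, and CONFIG_DEBUG_KMEMLEAK is the kmemleak switch):

CONFIG_KASAN=y
CONFIG_KASAN_GENERIC=y
CONFIG_KASAN_INLINE=y
CONFIG_DEBUG_KMEMLEAK=y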

This is something I'd love to enable in the CI but the last time we investigated it the performance impact made it impractical. From what I've read the performance is better with the latest kernels, but I don't know if that means its fast enough to use in the CI environment.

rincebrain commented 3 years ago

Well, you could do that, but (presumably starting with the zstd merge) it will fail to compile unless you add dummy functions for __asan_poison_memory_region and __asan_unpoison_memory_region, because calls to them are behind #if defined (ADDRESS_SANITIZER) in the lib/zstd.c code, and KASAN apparently defines that too.

(I ran into this with 4.19.194 and ffdf019cb, just for reference.)

The exact patch I used is:

diff --git a/module/zstd/zfs_zstd.c b/module/zstd/zfs_zstd.c
index fc1b0359a..fc51a2c50 100644
--- a/module/zstd/zfs_zstd.c
+++ b/module/zstd/zfs_zstd.c
@@ -202,6 +202,11 @@ static struct zstd_fallback_mem zstd_dctx_fallback;
 static struct zstd_pool *zstd_mempool_cctx;
 static struct zstd_pool *zstd_mempool_dctx;

+void __asan_unpoison_memory_region(void const volatile *addr, size_t size);
+void __asan_poison_memory_region(void const volatile *addr, size_t size);
+void __asan_poison_memory_region(void const volatile *addr, size_t size) {};
+void __asan_unpoison_memory_region(void const volatile *addr, size_t size) {};
+

 static void
 zstd_mempool_reap(struct zstd_pool *zstd_mempool)

I'll probably eventually try getting a refined version of this merged, at a minimum with some #ifdef guards around it.
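
Roughly this shape, I mean (a sketch only; CONFIG_KASAN is the kernel's own config symbol, which seems like the natural guard, and the casts are just to make the no-op bodies explicit):

#if defined(_KERNEL) && defined(CONFIG_KASAN)
/*
 * KASAN builds define ADDRESS_SANITIZER, so the zstd code emits calls
 * to the userland ASan poisoning hooks; the kernel exports no such
 * symbols to modules, so stub them out.
 */
void __asan_poison_memory_region(void const volatile *addr, size_t size);
void __asan_unpoison_memory_region(void const volatile *addr, size_t size);

void
__asan_poison_memory_region(void const volatile *addr, size_t size)
{
	(void) addr;
	(void) size;
}

void
__asan_unpoison_memory_region(void const volatile *addr, size_t size)
{
	(void) addr;
	(void) size;
}
#endif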

edit to add: Interactively (...over SSH), with CONFIG_KASAN_INLINE=y, it's seemed fine for me. (Though my poor low-memory 4GB VM does keep OOMing...) Maybe give it another try with that?

behlendorf commented 3 years ago

That's interesting, clearly I haven't tried this since we incorporated zstd! Thanks for posting the patch, it sounds like we'll want to incorporate some version of your change to sort the build out. It's also encouraging to hear your performance wasn't terrible. My recollection is that interactively it felt fine, but it at least doubled the total run time for the test suite.

aerusso commented 3 years ago

@rincebrain Are you saying you ran the ZTS (with ZFS version ffdf019) on a KASAN kernel, and had no memory corruption issues?

rincebrain commented 3 years ago

@rincebrain Are you saying you ran the ZTS (with ZFS version ffdf019) on a KASAN kernel, and had no memory corruption issues?

Oh, no, I would definitely not say that...

I was just looking for a specific problem when I tried building KASAN in (...yesterday), and hadn't tried running through ZTS at the time.

I have gotten through a ZTS run, though indeed, with at least one KASAN complaint in syslog. I just haven't filed it yet.

aerusso commented 3 years ago

Could you give me that info? (Either email me directly, or just open the bug.) At a minimum, I'd like to sanity-check that I am able to reproduce it.

(My ulterior motive here is that I believe that there is a memory corruption issue causing a bug I'm experiencing. That you're finding a memory corruption bug is a "good" sign that I can at least fix some bug of that type.)

rincebrain commented 3 years ago

Sure, let me just identify which test(s) were involved and reproduce it on reboot...

(I, too, started down this rabbit hole for such suspicions...)

dioni21 commented 3 years ago

@bghira:

you can. it's straightforward, but slow like molasses in January.

@behlendorf:

the last time we investigated it the performance impact made it impractical

What about a ready-to-run automated test environment, instead of a continuous one? The same tests as the existing CI, but with KASAN enabled, run manually or weekly.

Not sure if the same applies to ASAN (#12216), or if that could live in continuous CI. As a likely evolution, the ready-to-run test would set env vars to get more details out of ASAN.
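
For the ASAN half, that might be as small as wrapping the existing runner, something like (a sketch; the option names are standard ASan runtime flags, the log path is made up):

ASAN_OPTIONS="halt_on_error=0:detect_leaks=1:log_path=/var/tmp/asan" \
    zfs-tests.sh -T functional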

rincebrain commented 3 years ago

I still think the overhead for KASAN when configured inline is probably low enough to permit CI usage, assuming A) enough runners for the rate of PR updates and B) increased runtime allowance for ZTS in it (because the overhead is, indeed, not zero, though I got sidetracked by non-KASAN tests before I measured a complete run with and without KASAN on the same commit).

Though, I don't know what the thresholds for "too much" are, here - 1.5x runtime? Doubled? Tripled? Similar numbers for RAM on the runners? (In my limited experience, IIRC, using 4GB RAM with and without KASAN ended with the OOM killer murdering every process in the former case before finishing a ZTS run, though I would have expected ARC to be smaller and life to move on...)

Though, since AFAICT none of {CentOS,Fedora,Debian,Ubuntu} ship a premade KASAN kernel package, this would also mean rebuilding a custom kernel ourselves from time to time... though Linux makes custom kernel packages pretty simple, at least.
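
(For the record, the rebuild is roughly this from an upstream tree - a sketch, and the tree path is made up; scripts/config and the bindeb-pkg target are stock kernel tooling:

cd linux-5.15.12
make olddefconfig
./scripts/config -e KASAN -e KASAN_GENERIC -e KASAN_INLINE
make olddefconfig                # re-resolve dependencies for the new symbols
make -j"$(nproc)" bindeb-pkg     # spits out installable linux-image .debs

)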

bghira commented 2 years ago

apparently arm64 has a few features that make kasan run better there.

rincebrain commented 2 years ago

(Gonna move discussion from #12928 to stop flooding the poor PR.)

So, I ran zfs-tests -T functional to completion on an Ubuntu 18.04 VM with a handbuilt 5.15 kernel with kASAN.

It took 05:06:03 and came back with:

Tests with results other than PASS that are expected:
    FAIL casenorm/mixed_formd_delete (https://github.com/openzfs/zfs/issues/7633)
    FAIL casenorm/mixed_formd_lookup (https://github.com/openzfs/zfs/issues/7633)
    FAIL casenorm/mixed_formd_lookup_ci (https://github.com/openzfs/zfs/issues/7633)
    FAIL casenorm/mixed_none_lookup_ci (https://github.com/openzfs/zfs/issues/7633)
    FAIL casenorm/sensitive_formd_delete (https://github.com/openzfs/zfs/issues/7633)
    FAIL casenorm/sensitive_formd_lookup (https://github.com/openzfs/zfs/issues/7633)
    FAIL cli_root/zpool_import/import_rewind_device_replaced (Arbitrary pool rewind is not guaranteed)
    SKIP cli_root/zpool_import/zpool_import_missing_003_pos (https://github.com/openzfs/zfs/issues/6839)
    SKIP crtime/crtime_001_pos (Kernel statx(2) system call required on Linux)
    FAIL history/history_006_neg (https://github.com/openzfs/zfs/issues/5657)
    FAIL history/history_008_pos (Known issue)
    SKIP io/io_uring (io_uring support required)
    FAIL mmp/mmp_exported_import (Known issue)
    FAIL mmp/mmp_inactive_import (Known issue)
    FAIL no_space/enospc_002_pos (Exact free space reporting is not guaranteed)
    SKIP pam/setup (pamtester might be not available)
    FAIL refreserv/refreserv_004_pos (Known issue)
    SKIP removal/removal_with_zdb (Known issue)
    FAIL rsend/rsend_007_pos (Known issue)
    SKIP rsend/rsend_008_pos (https://github.com/openzfs/zfs/issues/6066)
    FAIL rsend/rsend_010_pos (Known issue)
    FAIL rsend/rsend_011_pos (Known issue)
    FAIL snapshot/rollback_003_pos (Known issue)
    SKIP user_namespace/setup (Kernel user namespace support required)
    FAIL vdev_zaps/vdev_zaps_007_pos (Known issue)
    FAIL zvol/zvol_misc/zvol_misc_snapdev (https://github.com/openzfs/zfs/issues/12621)
    FAIL zvol/zvol_misc/zvol_misc_volmode (Known issue)

Tests with result of PASS that are unexpected:

Tests with results other than PASS that are unexpected:
    FAIL cli_root/zfs_load-key/zfs_load-key_all (expected PASS)
    FAIL cli_root/zfs_load-key/zfs_load-key_https (expected PASS)
    FAIL cli_root/zfs_load-key/zfs_load-key_location (expected PASS)
    FAIL cli_root/zfs_load-key/zfs_load-key_recursive (expected PASS)
    FAIL cli_root/zpool_create/zpool_create_features_007_pos (expected PASS)
    FAIL cli_root/zpool_create/zpool_create_features_008_pos (expected PASS)
    SKIP cli_root/zpool_expand/zpool_expand_001_pos (expected PASS)
    SKIP cli_root/zpool_expand/zpool_expand_003_neg (expected PASS)
    SKIP cli_root/zpool_expand/zpool_expand_005_pos (expected PASS)
    FAIL cli_root/zpool_import/zpool_import_errata4 (expected PASS)
    FAIL cli_root/zpool_initialize/zpool_initialize_suspend_resume (expected PASS)
    SKIP cli_root/zpool_reopen/setup (expected PASS)
    SKIP cli_root/zpool_reopen/zpool_reopen_001_pos (expected PASS)
    SKIP cli_root/zpool_reopen/zpool_reopen_002_pos (expected PASS)
    SKIP cli_root/zpool_reopen/zpool_reopen_003_pos (expected PASS)
    SKIP cli_root/zpool_reopen/zpool_reopen_004_pos (expected PASS)
    SKIP cli_root/zpool_reopen/zpool_reopen_005_pos (expected PASS)
    SKIP cli_root/zpool_reopen/zpool_reopen_006_neg (expected PASS)
    SKIP cli_root/zpool_reopen/zpool_reopen_007_pos (expected PASS)
    SKIP cli_root/zpool_split/zpool_split_wholedisk (expected PASS)
    FAIL cli_root/zpool_status/zpool_status_features_001_pos (expected PASS)
    FAIL cli_root/zpool_upgrade/zpool_upgrade_features_001_pos (expected PASS)
    FAIL events/zed_fd_spill (expected PASS)
    SKIP fault/auto_offline_001_pos (expected PASS)
    SKIP fault/auto_online_001_pos (expected PASS)
    SKIP fault/auto_online_002_pos (expected PASS)
    SKIP fault/auto_replace_001_pos (expected PASS)
    SKIP fault/auto_spare_ashift (expected PASS)
    SKIP fault/auto_spare_shared (expected PASS)
    SKIP procfs/pool_state (expected PASS)
    FAIL redacted_send/redacted_mounts (expected PASS)

and logged three fun things in dmesg - one was #12230, the second was:

[ 4230.618699] ------------[ cut here ]------------
[ 4230.618704] Stack depot reached limit capacity
[ 4230.618723] WARNING: CPU: 1 PID: 2588 at lib/stackdepot.c:115 stack_depot_save+0x3e1/0x460
[ 4230.618732] Modules linked in: zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) virtio_net net_failover failover virtio_pci virtio_pci_modern_dev virtio virtio_ring
[ 4230.618765] CPU: 1 PID: 2588 Comm: zpool Tainted: P    B      O      5.15.12kasan1 #1
[ 4230.618768] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 4230.618771] RIP: 0010:stack_depot_save+0x3e1/0x460
[ 4230.618774] Code: 24 08 e9 98 fd ff ff 0f 0b e9 09 fe ff ff 80 3d a0 9b d9 02 00 75 15 48 c7 c7 e8 bd 9b ac c6 05 90 9b d9 02 01 e8 cf b1 85 01 <0f> 0b 48 c7 c7 6c 9a cd ad 4c 89 fe e8 0e ad 92 01 48 8b 7c 24 08
[ 4230.618777] RSP: 0018:ffff88810d2ad040 EFLAGS: 00010082
[ 4230.618781] RAX: 0000000000000000 RBX: 00000000323c01a9 RCX: 0000000000000000
[ 4230.618783] RDX: 0000000000000027 RSI: 0000000000000004 RDI: ffffed1021a559fa
[ 4230.618785] RBP: 000000000000002f R08: 0000000000000001 R09: ffffed10a5a8ce90
[ 4230.618787] R10: ffff88852d46747b R11: ffffed10a5a8ce8f R12: ffff88810d2ad090
[ 4230.618789] R13: 0000000000000000 R14: ffff888529e00d48 R15: 0000000000000246
[ 4230.618790] FS:  00007f82acd7c7c0(0000) GS:ffff88852d440000(0000) knlGS:0000000000000000
[ 4230.618800] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4230.618803] CR2: 0000560b78e8f7b8 CR3: 000000015a790000 CR4: 00000000000506e0
[ 4230.618805] Call Trace:
[ 4230.618808]  <TASK>
[ 4230.618810]  ? arc_hdr_destroy+0x426/0xbc0 [zfs]
[ 4230.618811]  ? spl_kmem_cache_free+0x260/0x7c0 [spl]
[ 4230.618811]  kasan_save_stack+0x32/0x40
[ 4230.618811]  ? kasan_save_stack+0x1b/0x40
[ 4230.618811]  ? kasan_set_track+0x1c/0x30
[ 4230.618811]  ? kasan_set_free_info+0x20/0x30
[ 4230.618811]  ? __kasan_slab_free+0xea/0x120
[ 4230.618811]  ? kmem_cache_free+0x74/0x270
[ 4230.618811]  ? spl_kmem_cache_free+0x260/0x7c0 [spl]
[ 4230.618811]  ? arc_hdr_destroy+0x4fe/0xbc0 [zfs]
[ 4230.618811]  ? dbuf_destroy+0xd4/0x15d0 [zfs]
[ 4230.618811]  ? dbuf_rele_and_unlock+0x5c1/0x12a0 [zfs]
[ 4230.618811]  ? zap_lookup_norm+0xe3/0x120 [zfs]
[ 4230.618811]  ? zap_lookup+0xd/0x20 [zfs]
[ 4230.618811]  ? dsl_prop_get_dd+0x236/0x4c0 [zfs]
[ 4230.618811]  ? dsl_prop_get_ds+0x371/0x530 [zfs]
[ 4230.618811]  ? dsl_prop_register+0xe2/0xcc0 [zfs]
[ 4230.618811]  ? dmu_objset_open_impl+0x778/0x23b0 [zfs]
[ 4230.618811]  ? dmu_objset_from_ds+0x272/0x620 [zfs]
[ 4230.618811]  ? dmu_objset_hold_flags+0xfb/0x130 [zfs]
[ 4230.618811]  ? dsl_prop_get+0x7c/0xf0 [zfs]
[ 4230.618811]  ? zvol_create_minors_cb+0xaa/0x3d0 [zfs]
[ 4230.618811]  ? dmu_objset_find_impl+0x1e4/0x820 [zfs]
[ 4230.618811]  ? dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? dmu_objset_find+0x91/0xe0 [zfs]
[ 4230.618811]  ? zvol_create_minors_recursive+0x3dc/0x600 [zfs]
[ 4230.618811]  ? spa_import+0xbc3/0xfe0 [zfs]
[ 4230.618811]  ? zfs_ioc_pool_import+0x30e/0x3b0 [zfs]
[ 4230.618811]  ? zfsdev_ioctl_common+0xa71/0x1710 [zfs]
[ 4230.618811]  ? zfsdev_ioctl+0x4a/0xd0 [zfs]
[ 4230.618811]  ? __x64_sys_ioctl+0x122/0x190
[ 4230.618811]  ? do_syscall_64+0x3b/0x90
[ 4230.618811]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 4230.618811]  ? mutex_unlock+0x7b/0xd0
[ 4230.618811]  ? mutex_unlock+0x7b/0xd0
[ 4230.618811]  ? __mutex_unlock_slowpath.isra.0+0x2f0/0x2f0
[ 4230.618811]  kasan_set_track+0x1c/0x30
[ 4230.618811]  kasan_set_free_info+0x20/0x30
[ 4230.618811]  __kasan_slab_free+0xea/0x120
[ 4230.618811]  ? spl_kmem_cache_free+0x260/0x7c0 [spl]
[ 4230.618811]  kmem_cache_free+0x74/0x270
[ 4230.618811]  ? arc_write+0x1930/0x1930 [zfs]
[ 4230.618811]  spl_kmem_cache_free+0x260/0x7c0 [spl]
[ 4230.618811]  arc_hdr_destroy+0x4fe/0xbc0 [zfs]
[ 4230.618811]  ? dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  dbuf_destroy+0xd4/0x15d0 [zfs]
[ 4230.618811]  dbuf_rele_and_unlock+0x5c1/0x12a0 [zfs]
[ 4230.618811]  ? kasan_unpoison+0x23/0x50
[ 4230.618811]  ? zap_match+0x1b0/0x1b0 [zfs]
[ 4230.618811]  ? dbuf_create_bonus+0x160/0x160 [zfs]
[ 4230.618811]  ? __kasan_kmalloc+0x7c/0x90
[ 4230.618811]  ? mutex_lock+0x89/0xd0
[ 4230.618811]  ? __mutex_lock_slowpath+0x10/0x10
[ 4230.618811]  ? kfree+0x8b/0x220
[ 4230.618811]  zap_lookup_norm+0xe3/0x120 [zfs]
[ 4230.618811]  ? zap_count+0x1a0/0x1a0 [zfs]
[ 4230.618811]  ? zprop_name_to_prop+0x82/0xd0 [zcommon]
[ 4230.618811]  zap_lookup+0xd/0x20 [zfs]
[ 4230.618811]  dsl_prop_get_dd+0x236/0x4c0 [zfs]
[ 4230.618811]  dsl_prop_get_ds+0x371/0x530 [zfs]
[ 4230.618811]  ? rrw_held+0xcc/0x1c0 [zfs]
[ 4230.618811]  dsl_prop_register+0xe2/0xcc0 [zfs]
[ 4230.618811]  ? secondary_cache_changed_cb+0x80/0x80 [zfs]
[ 4230.618811]  ? kasan_unpoison+0x23/0x50
[ 4230.618811]  ? dsl_prop_get_int_ds+0x20/0x20 [zfs]
[ 4230.618811]  ? spa_feature_decr+0x10/0x10 [zfs]
[ 4230.618811]  dmu_objset_open_impl+0x778/0x23b0 [zfs]
[ 4230.618811]  ? dmu_objset_sync_done+0x4f0/0x4f0 [zfs]
[ 4230.618811]  ? mutex_unlock+0x7b/0xd0
[ 4230.618811]  ? __mutex_unlock_slowpath.isra.0+0x2f0/0x2f0
[ 4230.618811]  ? rrw_enter_read_impl+0x290/0x460 [zfs]
[ 4230.618811]  dmu_objset_from_ds+0x272/0x620 [zfs]
[ 4230.618811]  ? dsl_pool_hold+0xcb/0xf0 [zfs]
[ 4230.618811]  ? dmu_objset_open_impl+0x23b0/0x23b0 [zfs]
[ 4230.618811]  ? dsl_pool_user_release+0x10/0x10 [zfs]
[ 4230.618811]  ? dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  dmu_objset_hold_flags+0xfb/0x130 [zfs]
[ 4230.618811]  ? dmu_objset_from_ds+0x620/0x620 [zfs]
[ 4230.618811]  ? zvol_create_minors_recursive+0x3dc/0x600 [zfs]
[ 4230.618811]  ? zfs_ioc_pool_import+0x30e/0x3b0 [zfs]
[ 4230.618811]  ? zfsdev_ioctl_common+0xa71/0x1710 [zfs]
[ 4230.618811]  ? zfsdev_ioctl+0x4a/0xd0 [zfs]
[ 4230.618811]  ? __x64_sys_ioctl+0x122/0x190
[ 4230.618811]  ? do_syscall_64+0x3b/0x90
[ 4230.618811]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 4230.618811]  ? kfree+0x8b/0x220
[ 4230.618811]  ? tsd_hash_dtor+0x14a/0x220 [spl]
[ 4230.618811]  dsl_prop_get+0x7c/0xf0 [zfs]
[ 4230.618811]  ? dsl_prop_register+0xcc0/0xcc0 [zfs]
[ 4230.618811]  ? _raw_read_lock_irq+0x30/0x30
[ 4230.618811]  ? dbuf_create_bonus+0x160/0x160 [zfs]
[ 4230.618811]  zvol_create_minors_cb+0xaa/0x3d0 [zfs]
[ 4230.618811]  ? zvol_add_clones+0x690/0x690 [zfs]
[ 4230.618811]  ? rrw_exit+0x155/0x510 [zfs]
[ 4230.618811]  dmu_objset_find_impl+0x1e4/0x820 [zfs]
[ 4230.618811]  ? zvol_add_clones+0x690/0x690 [zfs]
[ 4230.618811]  ? dmu_objset_stats+0x240/0x240 [zfs]
[ 4230.618811]  ? _raw_read_lock_irq+0x30/0x30
[ 4230.618811]  ? mutex_unlock+0x7b/0xd0
[ 4230.618811]  ? __mutex_unlock_slowpath.isra.0+0x2f0/0x2f0
[ 4230.618811]  ? zfs_refcount_remove_many+0x5ad/0x940 [zfs]
[ 4230.618811]  ? rrw_exit+0x155/0x510 [zfs]
[ 4230.618811]  dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? zvol_add_clones+0x690/0x690 [zfs]
[ 4230.618811]  ? dmu_objset_stats+0x240/0x240 [zfs]
[ 4230.618811]  ? _raw_read_lock_irq+0x30/0x30
[ 4230.618811]  ? mutex_unlock+0x7b/0xd0
[ 4230.618811]  ? __mutex_unlock_slowpath.isra.0+0x2f0/0x2f0
[ 4230.618811]  ? zfs_refcount_remove_many+0x5ad/0x940 [zfs]
[ 4230.618811]  ? rrw_exit+0x155/0x510 [zfs]
[ 4230.618811]  dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? zvol_add_clones+0x690/0x690 [zfs]
[ 4230.618811]  ? dmu_objset_stats+0x240/0x240 [zfs]
[ 4230.618811]  ? _raw_read_lock_irq+0x30/0x30
[ 4230.618811]  ? mutex_unlock+0x7b/0xd0
[ 4230.618811]  ? __mutex_unlock_slowpath.isra.0+0x2f0/0x2f0
[ 4230.618811]  ? zfs_refcount_remove_many+0x5ad/0x940 [zfs]
[ 4230.618811]  ? rrw_exit+0x155/0x510 [zfs]
[ 4230.618811]  dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? zvol_add_clones+0x690/0x690 [zfs]
[ 4230.618811]  ? dmu_objset_stats+0x240/0x240 [zfs]
[ 4230.618811]  ? _raw_read_lock_irq+0x30/0x30
[ 4230.618811]  ? mutex_unlock+0x7b/0xd0
[ 4230.618811]  ? __mutex_unlock_slowpath.isra.0+0x2f0/0x2f0
[ 4230.618811]  ? zfs_refcount_remove_many+0x5ad/0x940 [zfs]
[ 4230.618811]  ? rrw_exit+0x155/0x510 [zfs]
[ 4230.618811]  dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? zvol_add_clones+0x690/0x690 [zfs]
[ 4230.618811]  ? dmu_objset_stats+0x240/0x240 [zfs]
[ 4230.618811]  ? _raw_read_lock_irq+0x30/0x30
[ 4230.618811]  ? mutex_unlock+0x7b/0xd0
[ 4230.618811]  ? __mutex_unlock_slowpath.isra.0+0x2f0/0x2f0
[ 4230.618811]  ? zfs_refcount_remove_many+0x5ad/0x940 [zfs]
[ 4230.618811]  ? rrw_exit+0x155/0x510 [zfs]
[ 4230.618811]  dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? zvol_add_clones+0x690/0x690 [zfs]
[ 4230.618811]  ? dmu_objset_stats+0x240/0x240 [zfs]
[ 4230.618811]  ? _raw_read_lock_irq+0x30/0x30
[ 4230.618811]  ? mutex_unlock+0x7b/0xd0
[ 4230.618811]  ? __mutex_unlock_slowpath.isra.0+0x2f0/0x2f0
[ 4230.618811]  ? zfs_refcount_remove_many+0x5ad/0x940 [zfs]
[ 4230.618811]  ? rrw_exit+0x155/0x510 [zfs]
[ 4230.618811]  dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? zvol_add_clones+0x690/0x690 [zfs]
[ 4230.618811]  ? dmu_objset_stats+0x240/0x240 [zfs]
[ 4230.618811]  ? _raw_read_lock_irq+0x30/0x30
[ 4230.618811]  ? mutex_unlock+0x7b/0xd0
[ 4230.618811]  ? __mutex_unlock_slowpath.isra.0+0x2f0/0x2f0
[ 4230.618811]  ? zfs_refcount_remove_many+0x5ad/0x940 [zfs]
[ 4230.618811]  ? rrw_exit+0x155/0x510 [zfs]
[ 4230.618811]  dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? zvol_add_clones+0x690/0x690 [zfs]
[ 4230.618811]  ? dmu_objset_stats+0x240/0x240 [zfs]
[ 4230.618811]  ? _raw_read_lock_irq+0x30/0x30
[ 4230.618811]  ? mutex_unlock+0x7b/0xd0
[ 4230.618811]  ? __mutex_unlock_slowpath.isra.0+0x2f0/0x2f0
[ 4230.618811]  ? zfs_refcount_remove_many+0x5ad/0x940 [zfs]
[ 4230.618811]  ? rrw_exit+0x155/0x510 [zfs]
[ 4230.618811]  dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? zvol_add_clones+0x690/0x690 [zfs]
[ 4230.618811]  ? dmu_objset_stats+0x240/0x240 [zfs]
[ 4230.618811]  ? mutex_unlock+0x7b/0xd0
[ 4230.618811]  ? __mutex_unlock_slowpath.isra.0+0x2f0/0x2f0
[ 4230.618811]  ? rrw_exit+0x155/0x510 [zfs]
[ 4230.618811]  dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? zvol_add_clones+0x690/0x690 [zfs]
[ 4230.618811]  ? dmu_objset_stats+0x240/0x240 [zfs]
[ 4230.618811]  ? _raw_read_lock_irq+0x30/0x30
[ 4230.618811]  ? mutex_unlock+0x7b/0xd0
[ 4230.618811]  ? __mutex_unlock_slowpath.isra.0+0x2f0/0x2f0
[ 4230.618811]  ? zfs_refcount_remove_many+0x5ad/0x940 [zfs]
[ 4230.618811]  ? rrw_exit+0x155/0x510 [zfs]
[ 4230.618811]  dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? zvol_add_clones+0x690/0x690 [zfs]
[ 4230.618811]  ? dmu_objset_stats+0x240/0x240 [zfs]
[ 4230.618811]  ? _raw_read_lock_irq+0x30/0x30
[ 4230.618811]  ? mutex_unlock+0x7b/0xd0
[ 4230.618811]  ? __mutex_unlock_slowpath.isra.0+0x2f0/0x2f0
[ 4230.618811]  ? zfs_refcount_remove_many+0x5ad/0x940 [zfs]
[ 4230.618811]  ? rrw_exit+0x155/0x510 [zfs]
[ 4230.618811]  dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? zvol_add_clones+0x690/0x690 [zfs]
[ 4230.618811]  ? dmu_objset_stats+0x240/0x240 [zfs]
[ 4230.618811]  ? _raw_read_lock_irq+0x30/0x30
[ 4230.618811]  ? mutex_unlock+0x7b/0xd0
[ 4230.618811]  ? __mutex_unlock_slowpath.isra.0+0x2f0/0x2f0
[ 4230.618811]  ? zfs_refcount_remove_many+0x5ad/0x940 [zfs]
[ 4230.618811]  ? rrw_exit+0x155/0x510 [zfs]
[ 4230.618811]  dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? zvol_add_clones+0x690/0x690 [zfs]
[ 4230.618811]  ? dmu_objset_stats+0x240/0x240 [zfs]
[ 4230.618811]  ? _raw_read_lock_irq+0x30/0x30
[ 4230.618811]  ? mutex_unlock+0x7b/0xd0
[ 4230.618811]  ? __mutex_unlock_slowpath.isra.0+0x2f0/0x2f0
[ 4230.618811]  ? zfs_refcount_remove_many+0x5ad/0x940 [zfs]
[ 4230.618811]  ? rrw_exit+0x155/0x510 [zfs]
[ 4230.618811]  dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? zvol_add_clones+0x690/0x690 [zfs]
[ 4230.618811]  ? dmu_objset_stats+0x240/0x240 [zfs]
[ 4230.618811]  ? _raw_read_lock_irq+0x30/0x30
[ 4230.618811]  ? mutex_unlock+0x7b/0xd0
[ 4230.618811]  ? __mutex_unlock_slowpath.isra.0+0x2f0/0x2f0
[ 4230.618811]  ? zfs_refcount_remove_many+0x5ad/0x940 [zfs]
[ 4230.618811]  ? rrw_exit+0x155/0x510 [zfs]
[ 4230.618811]  dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? zvol_add_clones+0x690/0x690 [zfs]
[ 4230.618811]  ? dmu_objset_stats+0x240/0x240 [zfs]
[ 4230.618811]  ? _raw_read_lock_irq+0x30/0x30
[ 4230.618811]  ? mutex_unlock+0x7b/0xd0
[ 4230.618811]  ? __mutex_unlock_slowpath.isra.0+0x2f0/0x2f0
[ 4230.618811]  ? zfs_refcount_remove_many+0x5ad/0x940 [zfs]
[ 4230.618811]  ? rrw_exit+0x155/0x510 [zfs]
[ 4230.618811]  dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? zvol_add_clones+0x690/0x690 [zfs]
[ 4230.618811]  ? dmu_objset_stats+0x240/0x240 [zfs]
[ 4230.618811]  ? _raw_read_lock_irq+0x30/0x30
[ 4230.618811]  ? mutex_unlock+0x7b/0xd0
[ 4230.618811]  ? __mutex_unlock_slowpath.isra.0+0x2f0/0x2f0
[ 4230.618811]  ? zfs_refcount_remove_many+0x5ad/0x940 [zfs]
[ 4230.618811]  ? rrw_exit+0x155/0x510 [zfs]
[ 4230.618811]  dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? zvol_add_clones+0x690/0x690 [zfs]
[ 4230.618811]  ? dmu_objset_stats+0x240/0x240 [zfs]
[ 4230.618811]  ? _raw_read_lock_irq+0x30/0x30
[ 4230.618811]  ? mutex_unlock+0x7b/0xd0
[ 4230.618811]  ? __mutex_unlock_slowpath.isra.0+0x2f0/0x2f0
[ 4230.618811]  ? zfs_refcount_remove_many+0x5ad/0x940 [zfs]
[ 4230.618811]  ? rrw_exit+0x155/0x510 [zfs]
[ 4230.618811]  dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? zvol_add_clones+0x690/0x690 [zfs]
[ 4230.618811]  ? dmu_objset_stats+0x240/0x240 [zfs]
[ 4230.618811]  ? _raw_read_lock_irq+0x30/0x30
[ 4230.618811]  ? mutex_unlock+0x7b/0xd0
[ 4230.618811]  ? __mutex_unlock_slowpath.isra.0+0x2f0/0x2f0
[ 4230.618811]  ? zfs_refcount_remove_many+0x5ad/0x940 [zfs]
[ 4230.618811]  ? rrw_exit+0x155/0x510 [zfs]
[ 4230.618811]  dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811]  ? zvol_add_clones+0x690/0x690 [zfs]
[ 4230.618811]  ? dmu_objset_stats+0x240/0x240 [zfs]
[ 4230.618811]  ? zfs_refcount_add_many+0x4d/0x350 [zfs]
[ 4230.618811]  ? spa_open_common+0x5f5/0xa60 [zfs]
[ 4230.618811]  ? spa_load_best+0x850/0x850 [zfs]
[ 4230.618811]  ? zvol_add_clones+0x690/0x690 [zfs]
[ 4230.618811]  dmu_objset_find+0x91/0xe0 [zfs]
[ 4230.618811]  ? wake_up_q+0xa0/0x110
[ 4230.618811]  ? dmu_objset_find_dp_cb+0x60/0x60 [zfs]
[ 4230.618811]  ? __mutex_unlock_slowpath.isra.0+0x1b0/0x2f0
[ 4230.618811]  zvol_create_minors_recursive+0x3dc/0x600 [zfs]
[ 4230.618811]  ? zvol_last_close+0x190/0x190 [zfs]
[ 4230.618811]  ? kasan_unpoison+0x23/0x50
[ 4230.618811]  ? __mutex_unlock_slowpath.isra.0+0x2f0/0x2f0
[ 4230.618811]  spa_import+0xbc3/0xfe0 [zfs]
[ 4230.618811]  ? nvlist_common.part.106+0x149/0x570 [znvpair]
[ 4230.618811]  ? spa_create+0x1b30/0x1b30 [zfs]
[ 4230.618811]  ? nvlist_exists+0xd0/0xd0 [znvpair]
[ 4230.618811]  ? free_unref_page_commit.isra.0+0x233/0x540
[ 4230.618811]  ? drain_pages+0x80/0x80
[ 4230.618811]  ? free_pcp_prepare+0x8a/0x450
[ 4230.618811]  ? free_unref_page+0xa2/0xe0
[ 4230.618811]  ? get_nvlist+0xd8/0x1b0 [zfs]
[ 4230.618811]  ? memmove+0x39/0x60
[ 4230.618811]  zfs_ioc_pool_import+0x30e/0x3b0 [zfs]
[ 4230.618811]  ? zfs_ioc_clear+0x690/0x690 [zfs]
[ 4230.618811]  ? kasan_unpoison+0x23/0x50
[ 4230.618811]  ? __kasan_slab_alloc+0x2c/0x80
[ 4230.618811]  ? memcpy+0x39/0x60
[ 4230.618811]  ? strlcpy+0x8f/0x110
[ 4230.618811]  zfsdev_ioctl_common+0xa71/0x1710 [zfs]
[ 4230.618811]  ? __alloc_pages_slowpath.constprop.0+0x1e40/0x1e40
[ 4230.618811]  ? mmu_notifier_range_update_to_read_only+0x4a/0xa0
[ 4230.618811]  ? zfsdev_state_destroy+0x1b0/0x1b0 [zfs]
[ 4230.618811]  ? __kasan_kmalloc_large+0x81/0xa0
[ 4230.618811]  ? __kmalloc_node+0x206/0x2b0
[ 4230.618811]  ? kvmalloc_node+0x4d/0x90
[ 4230.618811]  zfsdev_ioctl+0x4a/0xd0 [zfs]
[ 4230.618811]  __x64_sys_ioctl+0x122/0x190
[ 4230.618811]  do_syscall_64+0x3b/0x90
[ 4230.618811]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 4230.618811] RIP: 0033:0x7f82ab3ac317
[ 4230.618811] Code: b3 66 90 48 8b 05 71 4b 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 41 4b 2d 00 f7 d8 64 89 01 48
[ 4230.618811] RSP: 002b:00007ffcc9c34c88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 4230.618811] RAX: ffffffffffffffda RBX: 00007ffcc9c34d00 RCX: 00007f82ab3ac317
[ 4230.618811] RDX: 00007ffcc9c34d00 RSI: 0000000000005a02 RDI: 0000000000000003
[ 4230.618811] RBP: 00007ffcc9c38bf0 R08: 00005573f7ea5130 R09: 0000000000000000
[ 4230.618811] R10: 00005573f7e7d010 R11: 0000000000000246 R12: 00005573f7e7d2e0
[ 4230.618811] R13: 00005573f7e8e548 R14: 0000000000000000 R15: 0000000000000000
[ 4230.618811]  </TASK>
[ 4230.618811] ---[ end trace 25880a7254006869 ]---

(whew, that was long, and I might have repeated a line or two that occurred 5+ times in a row)

And the final one:

[12458.481675] ------------[ cut here ]------------
[12458.481679] WARNING: CPU: 0 PID: 27863 at fs/read_write.c:525 __kernel_write+0x765/0x9e0
[12458.481688] Modules linked in: zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) virtio_net net_failover failover virtio_pci virtio_pci_modern_dev virtio virtio_ring
[12458.481718] CPU: 0 PID: 27863 Comm: python3.6 Tainted: P    B   W  O      5.15.12kasan1 #1
[12458.481721] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[12458.481724] RIP: 0010:__kernel_write+0x765/0x9e0
[12458.481728] Code: fe ff ff 48 c7 c6 60 4e 4b ac 48 c7 c7 40 80 e5 ac e8 2f 07 7c 00 85 c0 0f 85 4b 35 01 02 49 c7 c6 ea ff ff ff e9 ee fe ff ff <0f> 0b 49 c7 c6 f7 ff ff ff e9 e0 fe ff ff 48 b8 00 00 00 00 00 fc
[12458.481730] RSP: 0018:ffff88835b2ef000 EFLAGS: 00010246
[12458.481735] RAX: 00000000480a801d RBX: ffff88810ee13000 RCX: dffffc0000000000
[12458.481737] RDX: 0000000000000000 RSI: ffff88838eb6d800 RDI: ffff88823e666cc4
[12458.481739] RBP: 1ffff1106b65de03 R08: 0000000000000138 R09: ffffffffad986048
[12458.481741] R10: dffffc0000000000 R11: ffffed106b65ddc7 R12: ffff88823e666c80
[12458.481743] R13: ffff88835b2ef1c0 R14: ffff88835b2ef1c0 R15: 0000000000000138
[12458.481746] FS:  00007f91e8b5a740(0000) GS:ffff88852d400000(0000) knlGS:0000000000000000
[12458.481750] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12458.481752] CR2: 00000000017fe908 CR3: 000000014aba6000 CR4: 00000000000506f0
[12458.481754] Call Trace:
[12458.481756]  <TASK>
[12458.481758]  ? kasan_save_stack+0x32/0x40
[12458.481763]  ? do_iter_readv_writev+0x6f0/0x6f0
[12458.481766]  ? __kasan_slab_free+0xea/0x120
[12458.481769]  ? dmu_send+0x618/0xbb0 [zfs]
[12458.481906]  ? zfs_ioc_send_new+0x22c/0x2c0 [zfs]
[12458.481965]  ? zfsdev_ioctl_common+0xebe/0x1710 [zfs]
[12458.481996]  ? zfsdev_ioctl+0x4a/0xd0 [zfs]
[12458.482027]  ? __x64_sys_ioctl+0x122/0x190
[12458.482032]  ? do_syscall_64+0x3b/0x90
[12458.482036]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[12458.482040]  ? __cond_resched+0x10/0x20
[12458.482043]  ? __inode_security_revalidate+0x98/0xc0
[12458.482048]  ? selinux_file_permission+0x32d/0x410
[12458.482052]  ? security_file_permission+0x4e/0x580
[12458.482056]  kernel_write+0x9f/0x2f0
[12458.482061]  zfs_file_write+0x94/0x170 [zfs]
[12458.482092]  ? zfs_file_close+0x10/0x10 [zfs]
[12458.482119]  dump_record+0x1ff/0x8f0 [zfs]
[12458.482152]  dmu_send_impl+0x12bd/0x3ca0 [zfs]
[12458.482183]  ? zfs_refcount_remove_many+0x5ad/0x940 [zfs]
[12458.482220]  ? do_dump+0x28e0/0x28e0 [zfs]
[12458.482251]  ? dbuf_rele_and_unlock+0x6c9/0x12a0 [zfs]
[12458.482282]  ? dbuf_create_bonus+0x160/0x160 [zfs]
[12458.482312]  ? __mutex_lock_slowpath+0x10/0x10
[12458.482315]  ? zfs_refcount_count+0x16/0x40 [zfs]
[12458.482348]  ? dsl_dataset_hold_flags+0x2e5/0x630 [zfs]
[12458.482382]  ? dsl_dataset_hold_obj_flags+0x120/0x120 [zfs]
[12458.482420]  ? mutex_unlock+0x7b/0xd0
[12458.482424]  ? __mutex_unlock_slowpath.isra.0+0x2f0/0x2f0
[12458.482427]  ? __kasan_kmalloc+0x7c/0x90
[12458.482430]  ? zfs_refcount_add_many+0x4d/0x350 [zfs]
[12458.482463]  ? create_prof_cpu_mask+0x20/0x20
[12458.482467]  ? arch_stack_walk+0x99/0xf0
[12458.482471]  dmu_send+0x618/0xbb0 [zfs]
[12458.482503]  ? dmu_send_obj+0x570/0x570 [zfs]
[12458.482533]  ? stack_trace_consume_entry+0x160/0x160
[12458.482537]  ? unwind_next_frame+0x11a1/0x17e0
[12458.482543]  ? stack_trace_consume_entry+0x160/0x160
[12458.482546]  ? stack_trace_save+0x8c/0xc0
[12458.482549]  ? kasan_save_stack+0x32/0x40
[12458.482552]  ? kasan_save_stack+0x1b/0x40
[12458.482556]  ? __kasan_kmalloc+0x7c/0x90
[12458.482559]  ? spl_kmem_alloc_impl+0x11f/0x160 [spl]
[12458.482564]  ? nv_mem_zalloc.isra.12+0x4e/0x80 [znvpair]
[12458.482570]  ? nvlist_xalloc.part.13+0xd8/0x340 [znvpair]
[12458.482574]  ? fnvlist_alloc+0x61/0xc0 [znvpair]
[12458.482579]  ? zfsdev_ioctl_common+0xddd/0x1710 [zfs]
[12458.482613]  ? zfsdev_ioctl+0x4a/0xd0 [zfs]
[12458.482644]  ? nvt_lookup_name_type.isra.54+0x15b/0x420 [znvpair]
[12458.482649]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[12458.482652]  ? memmove+0x39/0x60
[12458.482655]  ? nvpair_value_common.part.20+0x235/0x3b0 [znvpair]
[12458.482660]  zfs_ioc_send_new+0x22c/0x2c0 [zfs]
[12458.482692]  ? zfs_ioc_send_space+0x770/0x770 [zfs]
[12458.482722]  ? __mutex_unlock_slowpath.isra.0+0x2f0/0x2f0
[12458.482726]  ? kasan_unpoison+0x23/0x50
[12458.482729]  ? __kasan_slab_alloc+0x2c/0x80
[12458.482732]  ? __kasan_kmalloc+0x7c/0x90
[12458.482735]  ? memset+0x20/0x40
[12458.482737]  ? nv_mem_zalloc.isra.12+0x63/0x80 [znvpair]
[12458.482741]  ? nvlist_xalloc.part.13+0xd8/0x340 [znvpair]
[12458.482746]  ? zfs_ioc_send+0x6a0/0x6a0 [zfs]
[12458.482776]  ? nvlist_lookup_nvpair_embedded_index+0x20/0x20 [znvpair]
[12458.482781]  ? memcpy+0x39/0x60
[12458.482784]  zfsdev_ioctl_common+0xebe/0x1710 [zfs]
[12458.482882]  ? zfsdev_state_destroy+0x1b0/0x1b0 [zfs]
[12458.482913]  ? __kasan_kmalloc_large+0x81/0xa0
[12458.482917]  ? __kmalloc_node+0x206/0x2b0
[12458.482921]  ? kvmalloc_node+0x4d/0x90
[12458.482925]  zfsdev_ioctl+0x4a/0xd0 [zfs]
[12458.482956]  __x64_sys_ioctl+0x122/0x190
[12458.482959]  do_syscall_64+0x3b/0x90
[12458.482963]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[12458.482967] RIP: 0033:0x7f91e8668317
[12458.482971] Code: b3 66 90 48 8b 05 71 4b 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 41 4b 2d 00 f7 d8 64 89 01 48
[12458.482974] RSP: 002b:00007ffeecd13348 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[12458.482979] RAX: ffffffffffffffda RBX: 0000000000005a40 RCX: 00007f91e8668317
[12458.482982] RDX: 00007ffeecd13370 RSI: 0000000000005a40 RDI: 0000000000000004
[12458.482985] RBP: 00007ffeecd16960 R08: 0000000000000020 R09: 00000000017f8590
[12458.482987] R10: 0000000500000001 R11: 0000000000000246 R12: 00007ffeecd13370
[12458.482989] R13: 0000000000000000 R14: 0000000000005a40 R15: 00000000017f8590
[12458.482992]  </TASK>
[12458.482995] ---[ end trace 25880a725400686a ]---

I can go find out which tests the latter two happened during if they're hard to repro for anyone.

Some of the tests failed because I forgot to build scsi-debug into the kernel config. Whoops.
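
(If anyone's matching them up: I believe the zpool_reopen/fault SKIPs above are the scsi_debug consumers, i.e. the stock CONFIG_SCSI_DEBUG=m symbol in the kernel config.)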

nabijaczleweli commented 2 years ago

It seems that I spoke too soon in https://github.com/openzfs/zfs/pull/12928#issuecomment-1007496550, because it got to Test: /usr/local/share/zfs/zfs-tests/tests/functional/channel_program/lua_core/tst.return_recursive_table] and panicked because it smashed its stack(?):

[ 1323.717046] Kernel panic - not syncing: corrupted stack end detected inside scheduler
[ 1323.719230] CPU: 2 PID: 94177 Comm: txg_sync Tainted: P    B      OE     5.15.0-2-amd64 #1  Debian 5.11
[ 1323.721843] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[ 1323.724027] Call Trace:
[ 1323.724703]  <TASK>
[ 1323.725308]  dump_stack_lvl+0x46/0x5a
[ 1323.726325]  panic+0x18b/0x389
[ 1323.727146]  ? __warn_printk+0xf3/0xf3
[ 1323.728141]  ? kasan_save_stack+0x32/0x40
[ 1323.729247]  ? kasan_save_stack+0x1b/0x40
[ 1323.730344]  ? __schedule+0xca/0xf90
[ 1323.731312]  ? schedule+0x30/0x120
[ 1323.732275]  __schedule+0xf8b/0xf90
[ 1323.734046]  ? trace_event_raw_event_hrtimer_start+0x1b0/0x1b0
[ 1323.735838]  ? io_schedule_timeout+0xb0/0xb0
[ 1323.737164]  ? llist_add_batch+0x33/0x50
[ 1323.738928]  schedule+0x6d/0x120
[ 1323.739864]  schedule_timeout+0xe4/0x1f0
[ 1323.740958]  ? usleep_range+0xe0/0xe0
[ 1323.742761]  ? try_to_wake_up+0x392/0x910
[ 1323.743880]  ? __bpf_trace_tick_stop+0xe0/0xe0
[ 1323.745165]  ? __mutex_unlock_slowpath.constprop.0+0x210/0x210
[ 1323.747169]  ? __native_queued_spin_unlock+0x9/0x10
[ 1323.748482]  ? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
[ 1323.750805]  __cv_timedwait_common+0x19e/0x2b0 [spl]
[ 1323.752477]  ? __cv_wait_idle+0xd0/0xd0 [spl]
[ 1323.754185]  ? recalc_sigpending+0x5a/0x70
[ 1323.755540]  ? finish_wait+0x100/0x100
[ 1323.756554]  ? mutex_unlock+0x80/0xd0
[ 1323.757855]  ? bpobj_space+0x10c/0x120 [zfs]
[ 1323.761056]  __cv_timedwait_idle+0x9a/0xe0 [spl]
[ 1323.762792]  ? __cv_timedwait_sig+0x70/0x70 [spl]
[ 1323.764102]  ? __bitmap_weight+0x71/0x90
Test: /usr/local/share/zfs/zfs-tests/[te st1s/3fu2nc3t.765322]  txg_sync_thread+0x24f/0x760 [zfs]
[ 1323.768519]  ? kasan_set_track+0x1c/0x30
ional/channel_program/lua_core/tst.stack_gs[ub  (1r323.770070]  ? txg_fini+0x300/0x300 [zfs]
un a[s  ro1ot3) 2[030:.772767]  thread_generic_wrapper+0xa8/0xc0 [spl]
30] [[PA SS1]
23.774855]  ? __thread_exit+0x20/0x20 [spl]
[ 1323.776636]  kthread+0x1d2/0x200
[ 1323.777992]  ? set_kthread_struct+0x80/0x80
[ 1323.779343]  ret_from_fork+0x22/0x30
[ 1323.780339]  </TASK>
[ 1323.781259] Kernel Offset: 0x33200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0x)
[ 1323.784663] ---[ end Kernel panic - not syncing: corrupted stack end detected inside scheduler ]---

(The output mingling is as original from the console.) This seems to point to lua, which is as expected (#12230), but reading through that issue it doesn't look like the kernel outright panicked in that run?

Here's the results (though, well, it panicked, so): zts-results.ecYPqF.gz

rincebrain commented 2 years ago

I did have it panic once and report that the stack was destroyed (though I didn't get a trace of why) when I gave it much less RAM than I thought I had; increasing the RAM made it just complain instead.

nabijaczleweli commented 2 years ago

That's with -m 48g (half host memory) and -device virtio-balloon (the efficacy of which I don't know how to ascertain; QEMU has 12.3G RES and started up near-instantly, so I think it's working? but dunno for sure), which, well, should be enough, right?

rincebrain commented 2 years ago

Yeah, no kidding - I was using n=4 and 24 GB.

nabijaczleweli commented 2 years ago

Happened again (I filtered by -T functional like you said you had in hopes of avoiding this, but no luck):

Test: /usr/local/share/zfs/zfs-tests/tests/functional/channel_program/lua_core/tst.return_recursive_table]
[ 1412.446099] Kernel panic - not syncing: corrupted stack end detected inside scheduler
[ 1412.449285] CPU: 3 PID: 92945 Comm: txg_sync Tainted: P    B      OE     5.15.0-2-amd64 #1  Debian 5.11
[ 1412.453096] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[ 1412.456292] Call Trace:
[ 1412.457286]  <TASK>
[ 1412.458131]  dump_stack_lvl+0x46/0x5a
[ 1412.459583]  panic+0x18b/0x389
[ 1412.460794]  ? __warn_printk+0xf3/0xf3
[ 1412.462401]  ? __schedule+0xca/0xf90
[ 1412.463906]  ? schedule+0x30/0x120
[ 1412.465427]  __schedule+0xf8b/0xf90
[ 1412.466923]  ? trace_event_raw_event_hrtimer_start+0x1b0/0x1b0
[ 1412.469600]  ? io_schedule_timeout+0xb0/0xb0
[ 1412.471564]  ? x2apic_send_IPI+0x60/0x70
[ 1412.473266]  schedule+0x6d/0x120
[ 1412.474715]  schedule_timeout+0xe4/0x1f0
[ 1412.476451]  ? usleep_range+0xe0/0xe0
[ 1412.478161]  ? try_to_wake_up+0x392/0x910
[ 1412.479850]  ? __bpf_trace_tick_stop+0xe0/0xe0
[ 1412.481714]  ? __mutex_unlock_slowpath.constprop.0+0x210/0x210
[ 1412.484069]  ? __native_queued_spin_unlock+0x9/0x10
[ 1412.486105]  ? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
[ 1412.489007]  __cv_timedwait_common+0x19e/0x2b0 [spl]
[ 1412.491329]  ? __cv_wait_idle+0xd0/0xd0 [spl]
[ 1412.493425]  ? recalc_sigpending+0x5a/0x70
[ 1412.495169]  ? finish_wait+0x100/0x100
[ 1412.496692]  ? mutex_unlock+0x80/0xd0
[ 1412.498196]  ? bpobj_space+0x10c/0x120 [zfs]
[ 1412.501370]  __cv_timedwait_idle+0x9a/0xe0 [spl]
Test[:  /u1sr4/l1oc2al.503351]  ? __cv_timedwait_sig+0x70/0x70 [spl]
[ 1412.505670]  ? __bitmap_weight+0x71/0x90
/share/zfs/zfs-tests/tests/function[a l/141c2ha.507n2ne7l_6pr]  txg_sync_thread+0x24f/0x760 [zfs]
og[ra m/1lu4a_1co2re./t510602]  ? kasan_set_track+0x1c/0x30
st.stack_gsub (run as root) [00:00] [PASS]
[ 1412.512569]  ? txg_fini+0x300/0x300 [zfs]
[ 1412.515557]  thread_generic_wrapper+0xa8/0xc0 [spl]
[ 1412.517615]  ? __thread_exit+0x20/0x20 [spl]
[ 1412.519540]  kthread+0x1d2/0x200
[ 1412.521014]  ? set_kthread_struct+0x80/0x80
[ 1412.522881]  ret_from_fork+0x22/0x30
[ 1412.524383]  </TASK>
[ 1412.525803] Kernel Offset: 0x6800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xf)
[ 1412.530304] ---[ end Kernel panic - not syncing: corrupted stack end detected inside scheduler ]---

Results: zts-results.8XEhF3.gz (again, from the guest, which panicked; I guess I could network mount this, but that sounds like an amazing way to triple the run-time)

This is my qemu cmdline (line-broken for your viewing pleasure; the smp configuration mimics the host, except the host is (a) NUMA, obviously, and (b) has twice as many cores/socket):

qemu-system-x86_64 -enable-kvm -smp sockets=2,cores=3,threads=2 -m 48g -nographic -vga none \
  -nic user,model=virtio,hostfwd=tcp::2222-:22 \
  -drive file=/dev/zvol/filling/store/nabijaczleweli/vm-kasan-test-root,if=none,id=root,format=raw,cache=none -device virtio-blk-pci,drive=root \
  -drive file=/dev/zvol/filling/store/nabijaczleweli/vm-kasan-test-scratch,if=none,id=scratch,format=raw,cache=none -device virtio-blk-pci,drive=scratch \
  -device virtio-balloon \
  -kernel kasan-test/vmlinuz-5.15.0-2-amd64 -initrd kasan-test/initrd.img-5.15.0-2-amd64 -append 'console=ttyS0 root=/dev/vda'

A pickle indeed. Maybe unballooning will help? (I doubt it from the trace, but it'd be fun. Otherwise I have no clue, since, well.)

rincebrain commented 2 years ago

Novel.

Last time I used the ballooning driver, it was with Xen 3, so I have no constructive input there.

Here is the .config I used with my kASAN kernel, if you'd like to compare it to yours.

nabijaczleweli commented 2 years ago

Disabling the balloon seems to have no effect:

Test: /usr/local/share/zfs/zfs-tests/tests/functional/channel_program/lua_core/tst.return_recursive_table]
[ 1255.851278] Kernel panic - not syncing: corrupted stack end detected inside scheduler
[ 1255.853384] CPU: 2 PID: 95047 Comm: txg_sync Tainted: P    B      OE     5.15.0-2-amd64 #1  Debian 5.11
[ 1255.855991] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[ 1255.858192] Call Trace:
[ 1255.858865]  <TASK>
[ 1255.859455]  dump_stack_lvl+0x46/0x5a
[ 1255.860457]  panic+0x18b/0x389
[ 1255.861282]  ? __warn_printk+0xf3/0xf3
[ 1255.862837]  ? __schedule+0xca/0xf90
[ 1255.864394]  ? schedule+0x30/0x120
[ 1255.865771]  __schedule+0xf8b/0xf90
[ 1255.867219]  ? trace_event_raw_event_hrtimer_start+0x1b0/0x1b0
[ 1255.869614]  ? io_schedule_timeout+0xb0/0xb0
[ 1255.871356]  ? x2apic_send_IPI+0x60/0x70
[ 1255.873033]  schedule+0x6d/0x120
[ 1255.874413]  schedule_timeout+0xe4/0x1f0
[ 1255.876030]  ? usleep_range+0xe0/0xe0
[ 1255.877508]  ? try_to_wake_up+0x392/0x910
[ 1255.879227]  ? __bpf_trace_tick_stop+0xe0/0xe0
[ 1255.881019]  ? __mutex_unlock_slowpath.constprop.0+0x210/0x210
[ 1255.883442]  ? __native_queued_spin_unlock+0x9/0x10
[ 1255.885475]  ? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
[ 1255.888179]  __cv_timedwait_common+0x19e/0x2b0 [spl]
[ 1255.890329]  ? __cv_wait_idle+0xd0/0xd0 [spl]
[ 1255.892265]  ? recalc_sigpending+0x5a/0x70
[ 1255.893919]  ? finish_wait+0x100/0x100
[ 1255.895497]  ? mutex_unlock+0x80/0xd0
[ 1255.896864]  ? bpobj_space+0x10c/0x120 [zfs]
[ 1255.900311]  __cv_timedwait_idle+0x9a/0xe0 [spl]
[ 1255.902165]  ? __cv_timedwait_sig+0x70/0x70 [spl]
[ 1255.903998]  ? __bitmap_weight+0x71/0x90
[ 1255.905528]  txg_sync_thread+0x24f/0x760 [zfs]
[ 1255.908229]  ? kasan_set_track+0x1c/0x30
[ 1255.910077]  ? txg_fini+0x300/0x300 [zfs]
[ 1255.913039]  thread_generic_wrapper+0xa8/0xc0 [spl]
[ 1255.914773]  ? __thread_exit+0x20/0x20 [spl]
[ 1255.916410]  kthread+0x1d2/0x200
[ 1255.917541]  ? set_kthread_struct+0x80/0x80
[ 1255.919008]  ret_from_fork+0x22/0x30
[ 1255.920208]  </TASK>
[ 1255.921244] Kernel Offset: 0x2e000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0x)
[ 1255.924842] ---[ end Kernel panic - not syncing: corrupted stack end detected inside scheduler ]---

(It also hasn't changed QEMU's memory usage, so.); zts-results.lD6EIq.gz

In what is an ultimate basic bitch move, I just built the debian kernel packages but added CONFIG_KASAN=y where the original had "CONFIG_KASAN is unset", and installed them on a fresh sid strap: config-5.15.0-2-amd64.gz; I can upload the send of the image later, if there's interest.

Rudimentary analysis (git diff) reveals that they're almost entirely unrelated; grepping for KASAN shows this (-your kasan, +my debian):

 CONFIG_KASAN=y
 CONFIG_KASAN_GENERIC=y
-# CONFIG_KASAN_OUTLINE is not set
-CONFIG_KASAN_INLINE=y
+CONFIG_KASAN_OUTLINE=y
+# CONFIG_KASAN_INLINE is not set
 CONFIG_KASAN_STACK=y
 # CONFIG_KASAN_VMALLOC is not set
 # CONFIG_KASAN_MODULE_TEST is not set

I assume OUTLINE is the default, since I changed no other lines in the seed config. (It also seems prudent to note that I know jack squat about how these things would interact => not a clue what this realistically means.)

rincebrain commented 2 years ago

Yeah, mine was an edited make defconfig, so it's unsurprising it didn't have much in common.

INLINE means, AIUI, what it says on the tin for KASAN - is it making actual calls for the kasan shims around everything, or is it inlining them and laughing at the bloat that ensues?

I could imagine actual calls everywhere would make a significant difference...
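
My loose mental model, for a one-byte load v = *p; (a sketch of generic KASAN, not the actual codegen):

/* OUTLINE: the compiler emits a function call per memory access */
__asan_load1((unsigned long)p);
v = *p;

/* INLINE: the shadow lookup is open-coded at every access site
 * (eliding the partial-granule check for brevity) */
s8 shadow = *(s8 *)(((unsigned long)p >> KASAN_SHADOW_SCALE_SHIFT) +
    KASAN_SHADOW_OFFSET);
if (unlikely(shadow))
	__asan_report_load1_noabort((unsigned long)p);
v = *p;

So outline pays a call (plus register shuffling) on every single load and store, while inline pays in .text size; I could see either one dominating depending on the workload and icache.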

nabijaczleweli commented 2 years ago

Hm; quoth lib/Kconfig.kasan:

choice
    prompt "Instrumentation type"
    depends on KASAN_GENERIC || KASAN_SW_TAGS
    default KASAN_OUTLINE

config KASAN_OUTLINE
    bool "Outline instrumentation"
    help
      Before every memory access compiler insert function call
      __asan_load*/__asan_store*. These functions performs check
      of shadow memory. This is slower than inline instrumentation,
      however it doesn't bloat size of kernel's .text section so
      much as inline does.

config KASAN_INLINE
    bool "Inline instrumentation"
    depends on !ARCH_DISABLE_KASAN_INLINE
    help
      Compiler directly inserts code checking shadow memory before
      memory accesses. This is faster than outline (in some workloads
      it gives about x2 boost over outline instrumentation), but
      make kernel's .text size much bigger.

endchoice

So, yes, OUTLINE is actual calls, and INLINE doubles .text. Although I wouldn't expect that to make a difference?

nabijaczleweli commented 2 years ago

Changed -smp sockets=2,cores=3,threads=2 to -smp 12 (i.e. the same number of CPUs but a different topology), and I got this kasan warning:

Test: /usr/local/share/zfs/zfs-tests/tests/functional/channel_program/lua_core/tst.args_to_lua (run as root) [00:00] [PASS]
[ 1256.320566] ==================================================================
[ 1256.322746] BUG: KASAN: stack-out-of-bounds in stack_trace_consume_entry+0x58/0x80
[ 1256.324973] Write of size 8 at addr ffff88830196f770 by task zfs/90929
[ 1256.326822]
[ 1256.327297] CPU: 1 PID: 90929 Comm: zfs Tainted: P           OE     5.15.0-2-amd64 #1  Debian 5.15.5-2.1
[ 1256.329806] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[ 1256.332029] Call Trace:
[ 1256.332723]  <TASK>
[ 1256.333310]  dump_stack_lvl+0x46/0x5a
[ 1256.334335]  print_address_description.constprop.0+0x1f/0x140
[ 1256.335895]  ? stack_trace_consume_entry+0x58/0x80
[ 1256.337188]  kasan_report.cold+0x83/0xdf
[ 1256.338256]  ? stack_trace_consume_entry+0x58/0x80
[ 1256.339547]  ? kasan_save_stack+0x1b/0x40
[ 1256.340583]  stack_trace_consume_entry+0x58/0x80
[ 1256.341757]  ? create_prof_cpu_mask+0x20/0x20
[ 1256.342881]  arch_stack_walk+0x78/0xf0
[ 1256.343913]  ? kfree+0xc5/0x280
[ 1256.344726]  ? kasan_save_stack+0x1b/0x40
[ 1256.345761]  ? kfree+0xc5/0x280
[ 1256.346573]  stack_trace_save+0x91/0xc0
[ 1256.347581]  ? stack_trace_consume_entry+0x80/0x80
[ 1256.348810]  ? luaD_call+0x11f/0x200 [zlua]
[ 1256.349992]  ? resume_cb+0x190/0x190 [zlua]
[ 1256.351122]  kasan_save_stack+0x1b/0x40
[ 1256.352148]  ? lua_setfield+0xb0/0xb0 [zlua]
[ 1256.353293]  ? luaD_rawrunprotected+0x10a/0x160 [zlua]
[ 1256.354656]  ? lua_setfield+0xb0/0xb0 [zlua]
[ 1256.355816]  ? f_parser+0x190/0x190 [zlua]
[ 1256.356911]  ? lua_setfield+0xb0/0xb0 [zlua]
[ 1256.358055]  ? lua_setfield+0xb0/0xb0 [zlua]
[ 1256.359200]  ? luaD_rawrunprotected+0xd2/0x160 [zlua]
[ 1256.360569]  ? lua_setfield+0xb0/0xb0 [zlua]
[ 1256.361711]  ? luaD_rawrunprotected+0xd2/0x160 [zlua]
[ 1256.363058]  ? luaF_close+0x33/0x1b0 [zlua]
[ 1256.364193]  ? luaD_pcall+0xa0/0x130 [zlua]
[ 1256.365321]  ? lua_pcallk+0x10a/0x290 [zlua]
[ 1256.366461]  kasan_set_track+0x1c/0x30
[ 1256.367445]  kasan_set_free_info+0x20/0x30
[ 1256.368501]  __kasan_slab_free+0xec/0x120
[ 1256.369529]  slab_free_freelist_hook+0x66/0x130
[ 1256.370692]  ? zcp_eval+0x4b4/0x9c0 [zfs]
[ 1256.373034]  kfree+0xc5/0x280
[ 1256.373812]  zcp_eval+0x4b4/0x9c0 [zfs]
[ 1256.375553]  ? zcp_dataset_hold+0x150/0x150 [zfs]
[ 1256.377487]  ? spl_kmem_alloc_impl+0xf6/0x110 [spl]
[ 1256.378841]  ? nv_mem_zalloc.isra.0+0x33/0x60 [znvpair]
[ 1256.380294]  ? nvlist_xalloc.part.0+0x86/0x140 [znvpair]
[ 1256.381699]  ? zfsdev_ioctl_common+0x635/0xbd0 [zfs]
[ 1256.383710]  ? zfsdev_ioctl+0x53/0xe0 [zfs]
[ 1256.385518]  ? __x64_sys_ioctl+0xb9/0xf0
[ 1256.386540]  ? do_syscall_64+0x3b/0xc0
[ 1256.387526]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 1256.388867]  ? nvt_remove_nvpair+0xde/0x1e0 [znvpair]
[ 1256.390211]  ? nvpair_type_is_array+0x50/0x50 [znvpair]
[ 1256.391612]  ? nvt_remove_nvpair+0x13f/0x1e0 [znvpair]
[ 1256.392984]  ? nvt_lookup_name_type.isra.0+0xc8/0x110 [znvpair]
[ 1256.394552]  ? fnvlist_lookup_nvpair+0x5f/0xc0 [znvpair]
[ 1256.395982]  ? fnvlist_remove_nvpair+0x40/0x40 [znvpair]
[ 1256.397398]  zfs_ioc_channel_program+0x169/0x200 [zfs]
[ 1256.399460]  ? zfs_ioc_redact+0x180/0x180 [zfs]
[ 1256.401350]  ? nvlist_xalloc.part.0+0xde/0x140 [znvpair]
[ 1256.402764]  ? nvlist_lookup_nvpair_embedded_index+0x20/0x20 [znvpair]
[ 1256.404500]  zfsdev_ioctl_common+0x69a/0xbd0 [zfs]
[ 1256.406461]  ? zfsdev_state_destroy+0x70/0x70 [zfs]
[ 1256.408451]  ? __kmalloc_node+0x435/0x4e0
[ 1256.409482]  ? __virt_addr_valid+0xbe/0x130
[ 1256.410555]  ? _copy_from_user+0x3a/0x70
[ 1256.411602]  zfsdev_ioctl+0x53/0xe0 [zfs]
[ 1256.413371]  __x64_sys_ioctl+0xb9/0xf0
[ 1256.414339]  do_syscall_64+0x3b/0xc0
[ 1256.415294]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 1256.416587] RIP: 0033:0x7fe099b92a97
[ 1256.417522] Code: 3c 1c e8 1c ff ff ff 85 c0 79 87 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 08
[ 1256.422212] RSP: 002b:00007ffec362a168 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 1256.424131] RAX: ffffffffffffffda RBX: 00007fe096336700 RCX: 00007fe099b92a97
[ 1256.425937] RDX: 00007fe096333050 RSI: 0000000000005a48 RDI: 0000000000000004
[ 1256.427756] RBP: 00007ffec362a220 R08: 00007fe096637000 R09: 0000000000000000
[ 1256.429654] R10: 00007fe09b6a2710 R11: 0000000000000246 R12: 0000000000005a48
[ 1256.431560] R13: 00007fe096333050 R14: 00007fe096333030 R15: 0000000000000004
[ 1256.433438]  </TASK>
[ 1256.434043]
[ 1256.434465] The buggy address belongs to the page:
[ 1256.435753] page:000000005b97f116 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x30196f
[ 1256.438234] flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
[ 1256.439987] raw: 0017ffffc0000000 0000000000000000 ffffea000c065bc8 0000000000000000
[ 1256.442049] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 1256.444121] page dumped because: kasan: bad access detected
[ 1256.445613] KASAN internal error: frame info validation failed; invalid marker: 18446612690182375432
[ 1256.448039]
[ 1256.448449] Memory state around the buggy address:
[ 1256.449735]  ffff88830196f600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1256.451678]  ffff88830196f680: 00 00 00 00 00 00 f1 f1 f1 f1 00 f3 f1 f1 f1 f1
[ 1256.453597] >ffff88830196f700: 00 00 00 f3 f3 f3 f3 f3 00 00 00 00 00 00 f1 00
[ 1256.455537]                                                              ^
[ 1256.457361]  ffff88830196f780: f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1256.459291]  ffff88830196f800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1256.461197] ==================================================================
Test: /usr/local/share/zfs/zfs-tests/tests/functional/channel_program/lua_core/tst.divide_by_zero (run as root) [00:00] [PASS]

And then this panic:

Test: /usr/local/share/zfs/zfs-tests/tests/functional/channel_program/lua_core/tst.return_recursive_table (run as root) [00:00] [PASS]
[ 1307.478695] Kernel panic - not syncing: corrupted stack end detected inside scheduler
[ 1307.481210] CPU: 0 PID: 90683 Comm: txg_sync Tainted: P    B      OE     5.15.0-2-amd64 #1  Debian 5.15.5-2.1
[ 1307.484141] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[ 1307.486629] Call Trace:
[ 1307.487405]  <TASK>
[ 1307.488057]  dump_stack_lvl+0x46/0x5a
[ 1307.489247]  panic+0x18b/0x389
[ 1307.490185]  ? __warn_printk+0xf3/0xf3
[ 1307.491323]  ? kasan_save_stack+0x32/0x40
[ 1307.492614]  ? kasan_save_stack+0x1b/0x40
[ 1307.493844]  ? __schedule+0xca/0xf90
[ 1307.494946]  ? schedule+0x30/0x120
[ 1307.496155]  __schedule+0xf8b/0xf90
[ 1307.497450]  ? trace_event_raw_event_hrtimer_start+0x1b0/0x1b0
[ 1307.499518]  ? io_schedule_timeout+0xb0/0xb0
[ 1307.501204]  ? llist_add_batch+0x33/0x50
[ 1307.502411]  schedule+0x6d/0x120
[ 1307.503395]  schedule_timeout+0xe4/0x1f0
[ 1307.504578]  ? usleep_range+0xe0/0xe0
[ 1307.505696]  ? try_to_wake_up+0x392/0x910
[ 1307.507131]  ? __bpf_trace_tick_stop+0xe0/0xe0
[ 1307.508886]  ? __mutex_unlock_slowpath.constprop.0+0x210/0x210
[ 1307.510848]  ? __native_queued_spin_unlock+0x9/0x10
[ 1307.512782]  ? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
[ 1307.515329]  __cv_timedwait_common+0x19e/0x2b0 [spl]
[ 1307.517378]  ? __cv_wait_idle+0xd0/0xd0 [spl]
[ 1307.519165]  ? recalc_sigpending+0x5a/0x70
[ 1307.520781]  ? finish_wait+0x100/0x100
[ 1307.522190]  ? mutex_unlock+0x80/0xd0
[ 1307.523541]  ? bpobj_space+0x10c/0x120 [zfs]
[ 1307.526085]  __cv_timedwait_idle+0x9a/0xe0 [spl]
[ 1307.527845]  ? __cv_timedwait_sig+0x70/0x70 [spl]
[ 1307.529535]  ? __bitmap_weight+0x71/0x90
[ 1307.530987]  txg_sync_thread+0x24f/0x760 [zfs]
[ 1307.533594]  ? kasan_set_track+0x1c/0x30
[ 1307.534955]  ? txg_fini+0x300/0x300 [zfs]
[ 1307.537273]  thread_generic_wrapper+0xa8/0xc0 [spl]
[ 1307.539001]  ? __thread_exit+0x20/0x20 [spl]
[ 1307.540517]  kthread+0x1d2/0x200
[ 1307.541640]  ? set_kthread_struct+0x80/0x80
[ 1307.543074]  ret_from_fork+0x22/0x30
[ 1307.544305]  </TASK>
[ 1307.545446] Kernel Offset: 0xee00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 1307.548672] ---[ end Kernel panic - not syncing: corrupted stack end detected inside scheduler ]---

I'm running with 64G now, but bumping it to that made it run like absolute shit; nevertheless:

Test: /usr/local/share/zfs/zfs-tests/tests/functional/channel_program/lua_core/tst.args_to_lua (run as root) [00:00] [PASS]
[ 1243.231345] ==================================================================
[ 1243.233300] BUG: KASAN: stack-out-of-bounds in stack_trace_consume_entry+0x58/0x80
[ 1243.235300] Write of size 8 at addr ffff88811a437770 by task zfs/94732
[ 1243.237007]
[ 1243.237420] CPU: 10 PID: 94732 Comm: zfs Tainted: P           OE     5.15.0-2-amd64 #1  Debian 5.15.5-2.1
[ 1243.239855] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[ 1243.241980] Call Trace:
[ 1243.242645]  <TASK>
[ 1243.243207]  dump_stack_lvl+0x46/0x5a
[ 1243.244220]  print_address_description.constprop.0+0x1f/0x140
[ 1243.245709]  ? stack_trace_consume_entry+0x58/0x80
[ 1243.246942]  kasan_report.cold+0x83/0xdf
[ 1243.247984]  ? stack_trace_consume_entry+0x58/0x80
[ 1243.249224]  ? kasan_save_stack+0x1b/0x40
[ 1243.250269]  stack_trace_consume_entry+0x58/0x80
[ 1243.251482]  ? create_prof_cpu_mask+0x20/0x20
[ 1243.252615]  arch_stack_walk+0x78/0xf0
[ 1243.253605]  ? kfree+0xc5/0x280
[ 1243.254426]  ? kasan_save_stack+0x1b/0x40
[ 1243.255493]  ? kfree+0xc5/0x280
[ 1243.256315]  stack_trace_save+0x91/0xc0
[ 1243.257318]  ? stack_trace_consume_entry+0x80/0x80
[ 1243.258564]  ? luaD_call+0x11f/0x200 [zlua]
[ 1243.259791]  ? resume_cb+0x190/0x190 [zlua]
[ 1243.260931]  kasan_save_stack+0x1b/0x40
[ 1243.261924]  ? lua_setfield+0xb0/0xb0 [zlua]
[ 1243.263066]  ? luaD_rawrunprotected+0x10a/0x160 [zlua]
[ 1243.264457]  ? lua_setfield+0xb0/0xb0 [zlua]
[ 1243.265604]  ? f_parser+0x190/0x190 [zlua]
[ 1243.266720]  ? lua_setfield+0xb0/0xb0 [zlua]
[ 1243.267890]  ? lua_setfield+0xb0/0xb0 [zlua]
[ 1243.269038]  ? luaD_rawrunprotected+0xd2/0x160 [zlua]
[ 1243.270393]  ? lua_setfield+0xb0/0xb0 [zlua]
[ 1243.271648]  ? luaD_rawrunprotected+0xd2/0x160 [zlua]
[ 1243.273003]  ? luaF_close+0x33/0x1b0 [zlua]
[ 1243.274127]  ? luaD_pcall+0xa0/0x130 [zlua]
[ 1243.275255]  ? lua_pcallk+0x10a/0x290 [zlua]
[ 1243.276418]  kasan_set_track+0x1c/0x30
[ 1243.277396]  kasan_set_free_info+0x20/0x30
[ 1243.278462]  __kasan_slab_free+0xec/0x120
[ 1243.279533]  slab_free_freelist_hook+0x66/0x130
[ 1243.280704]  ? zcp_eval+0x4b4/0x9c0 [zfs]
[ 1243.282984]  kfree+0xc5/0x280
[ 1243.283781]  zcp_eval+0x4b4/0x9c0 [zfs]
[ 1243.285509]  ? zcp_dataset_hold+0x150/0x150 [zfs]
[ 1243.287483]  ? spl_kmem_alloc_impl+0xf6/0x110 [spl]
[ 1243.288836]  ? nv_mem_zalloc.isra.0+0x33/0x60 [znvpair]
[ 1243.290289]  ? nvlist_xalloc.part.0+0x86/0x140 [znvpair]
[ 1243.291736]  ? zfsdev_ioctl_common+0x635/0xbd0 [zfs]
[ 1243.293750]  ? zfsdev_ioctl+0x53/0xe0 [zfs]
[ 1243.295599]  ? __x64_sys_ioctl+0xb9/0xf0
[ 1243.296638]  ? do_syscall_64+0x3b/0xc0
[ 1243.297623]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 1243.298956]  ? nvt_remove_nvpair+0xde/0x1e0 [znvpair]
[ 1243.300334]  ? nvpair_type_is_array+0x50/0x50 [znvpair]
[ 1243.301728]  ? nvt_remove_nvpair+0x13f/0x1e0 [znvpair]
[ 1243.303145]  ? nvt_lookup_name_type.isra.0+0xc8/0x110 [znvpair]
[ 1243.304749]  ? fnvlist_lookup_nvpair+0x5f/0xc0 [znvpair]
[ 1243.306174]  ? fnvlist_remove_nvpair+0x40/0x40 [znvpair]
[ 1243.307621]  zfs_ioc_channel_program+0x169/0x200 [zfs]
[ 1243.309683]  ? zfs_ioc_redact+0x180/0x180 [zfs]
[ 1243.311617]  ? nvlist_xalloc.part.0+0xde/0x140 [znvpair]
[ 1243.313051]  ? nvlist_lookup_nvpair_embedded_index+0x20/0x20 [znvpair]
[ 1243.314783]  zfsdev_ioctl_common+0x69a/0xbd0 [zfs]
[ 1243.316777]  ? zfsdev_state_destroy+0x70/0x70 [zfs]
[ 1243.318770]  ? __kmalloc_node+0x435/0x4e0
[ 1243.319835]  ? __virt_addr_valid+0xbe/0x130
[ 1243.320924]  ? _copy_from_user+0x3a/0x70
[ 1243.321972]  zfsdev_ioctl+0x53/0xe0 [zfs]
[ 1243.323772]  __x64_sys_ioctl+0xb9/0xf0
[ 1243.324746]  do_syscall_64+0x3b/0xc0
[ 1243.325678]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 1243.326986] RIP: 0033:0x7f8180554a97
[ 1243.327947] Code: 3c 1c e8 1c ff ff ff 85 c0 79 87 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 08
[ 1243.332700] RSP: 002b:00007fff144d26e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 1243.334636] RAX: ffffffffffffffda RBX: 00007f817ccf8700 RCX: 00007f8180554a97
[ 1243.336453] RDX: 00007f817ccf5050 RSI: 0000000000005a48 RDI: 0000000000000004
[ 1243.338264] RBP: 00007fff144d27a0 R08: 00007f817cff9000 R09: 0000000000000000
[ 1243.340084] R10: 00007f8182064710 R11: 0000000000000246 R12: 0000000000005a48
[ 1243.341888] R13: 00007f817ccf5050 R14: 00007f817ccf5030 R15: 0000000000000004
[ 1243.343729]  </TASK>
[ 1243.344307]
[ 1243.344711] The buggy address belongs to the page:
[ 1243.345937] page:00000000a84dd4da refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x11a437
[ 1243.348354] flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
[ 1243.350028] raw: 0017ffffc0000000 0000000000000000 ffffffff09ee0101 0000000000000000
[ 1243.352007] raw: 0000000000000000 0000000000200000 00000000ffffffff 0000000000000000
[ 1243.353976] page dumped because: kasan: bad access detected
[ 1243.355416] KASAN internal error: frame info validation failed; invalid marker: 18446612695471456264
[ 1243.357698]
[ 1243.358099] Memory state around the buggy address:
[ 1243.359326]  ffff88811a437600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1243.361170]  ffff88811a437680: 00 00 00 00 00 00 f1 f1 f1 f1 00 f3 f1 f1 f1 f1
[ 1243.363006] >ffff88811a437700: 00 00 00 f3 f3 f3 f3 f3 00 00 00 00 00 00 f1 00
[ 1243.364854]                                                              ^
[ 1243.366601]  ffff88811a437780: f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1243.368456]  ffff88811a437800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1243.370288] ==================================================================
Test: /usr/local/share/zfs/zfs-tests/tests/functional/channel_program/lua_core/tst.divide_by_zero (run as root) [00:00] [PASS]

and, indeed:

Test: /usr/local/share/zfs/zfs-tests/tests/functional/channel_program/lua_core/tst.return_recursive_table (run as root) [00:00] [PASS]
[ 1291.472630] Kernel panic - not syncing: corrupted stack end detected inside scheduler
[ 1291.474827] CPU: 11 PID: 94518 Comm: txg_sync Tainted: P    B      OE     5.15.0-2-amd64 #1  Debian 5.15.5-2.1
[ 1291.477403] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[ 1291.479562] Call Trace:
[ 1291.480231]  <TASK>
[ 1291.480798]  dump_stack_lvl+0x46/0x5a
[ 1291.481839]  panic+0x18b/0x389
[ 1291.482687]  ? __warn_printk+0xf3/0xf3
[ 1291.483682]  ? kasan_save_stack+0x32/0x40
[ 1291.484754]  ? kasan_save_stack+0x1b/0x40
[ 1291.485819]  ? __schedule+0xca/0xf90
[ 1291.486776]  ? schedule+0x30/0x120
[ 1291.488448]  __schedule+0xf8b/0xf90
[ 1291.489857]  ? trace_event_raw_event_hrtimer_start+0x1b0/0x1b0
[ 1291.491444]  ? io_schedule_timeout+0xb0/0xb0
[ 1291.493453]  ? llist_add_batch+0x33/0x50
[ 1291.494661]  schedule+0x6d/0x120
[ 1291.495524]  schedule_timeout+0xe4/0x1f0
[ 1291.496660]  ? usleep_range+0xe0/0xe0
[ 1291.498114]  ? try_to_wake_up+0x392/0x910
[ 1291.499820]  ? __bpf_trace_tick_stop+0xe0/0xe0
[ 1291.501502]  ? __mutex_unlock_slowpath.constprop.0+0x210/0x210
[ 1291.503940]  ? __native_queued_spin_unlock+0x9/0x10
[ 1291.505782]  ? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
[ 1291.508329]  __cv_timedwait_common+0x19e/0x2b0 [spl]
[ 1291.510112]  ? __cv_wait_idle+0xd0/0xd0 [spl]
[ 1291.511990]  ? recalc_sigpending+0x5a/0x70
[ 1291.513854]  ? finish_wait+0x100/0x100
[ 1291.515414]  ? mutex_unlock+0x80/0xd0
[ 1291.517352]  ? bpobj_space+0x10c/0x120 [zfs]
[ 1291.520380]  __cv_timedwait_idle+0x9a/0xe0 [spl]
[ 1291.522484]  ? __cv_timedwait_sig+0x70/0x70 [spl]
[ 1291.524696]  ? __bitmap_weight+0x71/0x90
[ 1291.526262]  txg_sync_thread+0x24f/0x760 [zfs]
[ 1291.529279]  ? kasan_set_track+0x1c/0x30
[ 1291.530368]  ? txg_fini+0x300/0x300 [zfs]
[ 1291.533317]  thread_generic_wrapper+0xa8/0xc0 [spl]
[ 1291.535126]  ? __thread_exit+0x20/0x20 [spl]
[ 1291.537038]  kthread+0x1d2/0x200
[ 1291.537988]  ? set_kthread_struct+0x80/0x80
[ 1291.539652]  ret_from_fork+0x22/0x30
[ 1291.540947]  </TASK>
Test: /usr/local/share/zfs/zfs-tests/tests/functional/channel_program/lua_core/tst.stack_gsub (run as root) [00:00] [PASS]
[ 1291.542504] Kernel Offset: 0x20600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 1291.546252] ---[ end Kernel panic - not syncing: corrupted stack end detected inside scheduler ]---

I can't really give it any more RAM (I mean, I could, but I don't love the idea of swapping out my MX), and the overall times don't seem to add up to much more than one CPU anyway:

real    22m36.744s
user    24m19.753s
sys     5m46.450s

Here's a send of the image and qemu driver (the boot bundle needs to be extracted from /boot, or a bootloader installed; this also wants a scratch filesystem at /scratchpsko (I just did `zpool create scratchpsko vdb` and chown/chmod)) if you're interested: https://foreign.nabijaczleweli.xyz/pub/kt
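
For anyone spinning that image up, the scratch-pool setup described above amounts to something like the following (a sketch: the pool name and `vdb` device come from the comment, but the comment only says "chown/chmod", so the owner and mode below are guesses):

```sh
# Create the scratch pool the test setup expects, backed by the spare virtio disk.
zpool create scratchpsko vdb
# Make the mountpoint usable by an unprivileged test user; the exact owner
# and mode are assumptions, since the comment doesn't specify them.
chown nobody:nogroup /scratchpsko
chmod 0777 /scratchpsko
```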

rincebrain commented 2 years ago

Curious. I'm wildly speculating that all the outline KASAN calls make it more vulnerable to something smashing the stack in ways it can't recover from? Or maybe I keep getting lucky and my smashing doesn't blow up the world... I'll try the VM and see if it blows up the same way for me, and whether swapping the kernel around changes anything.
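
For context, that's presumably the CONFIG_KASAN_OUTLINE vs. CONFIG_KASAN_INLINE Kconfig choice: outline instrumentation compiles every access check as a call into `__asan_load*()`/`__asan_store*()` helpers, while inline emits the shadow-memory check directly at the access site. A fragment like this (written here purely for illustration, not lifted from either reporter's actual .config) selects generic KASAN with inline checks:

```
# Kernel hacking -> Memory Debugging -> KASAN: runtime memory debugger
CONFIG_KASAN=y
CONFIG_KASAN_GENERIC=y
# Inline: the shadow check is emitted at each load/store
# (bigger kernel image, noticeably faster at runtime).
CONFIG_KASAN_INLINE=y
# The outline alternative routes every check through __asan_* helper calls:
# CONFIG_KASAN_OUTLINE is not set
# Often enabled alongside, for leak detection:
CONFIG_DEBUG_KMEMLEAK=y
```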

szubersk commented 2 years ago

Casual 2 cents from papa know-it-all.

I just ran it on a 16 vCPU/10 GiB VM; no memory problems (so far).
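
For anyone following along, a run of this shape looks roughly like the following (a sketch: `zfs-tests.sh` and `dmesg -w` are the stock tools, but the exact invocation used here isn't given in the comment):

```sh
# Follow the kernel log in the background so KASAN splats like the one
# below are captured even if the console output gets mangled.
sudo dmesg -w > kasan-run.dmesg &
# Run the ZFS Test Suite (verbose) from an in-tree build against the
# KASAN-enabled kernel.
./scripts/zfs-tests.sh -v
```

Judging by the zcp/zlua frames, the splat below turned up during the channel_program tests: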

<3>[  506.091401] ==================================================================                                                                                                          
<3>[  506.095311] BUG: KASAN: stack-out-of-bounds in auxgetinfo+0x306/0x600 [zlua]                                                                                                            
<3>[  506.095311] Write of size 4 at addr ffff8881099cf5c8 by task txg_sync/56269                                                                                                             
<3>[  506.095311]                                                                                                                                                                             
<3>[  506.095311] CPU: 1 PID: 56269 Comm: txg_sync Tainted: P           O      5.15.23-kasan #1                                                                                               
<3>[  506.095311] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014                                                                                             
<3>[  506.095311] Call Trace:                                                                                                                                                                 
<3>[  506.095311]  <TASK>                                                                                                                                                                     
<3>[  506.095311]  dump_stack_lvl+0x46/0x5a                                                                                                                                                   
<3>[  506.095311]  print_address_description.constprop.0+0x1f/0x140                                                                                                                           
<3>[  506.095311]  ? auxgetinfo+0x306/0x600 [zlua]                                                                                                                                            
<3>[  506.095311]  kasan_report.cold+0x83/0xdf                                                                                                                                                
<3>[  506.095311]  ? auxgetinfo+0x306/0x600 [zlua]                                                                                                                                            
<3>[  506.095311]  kasan_check_range+0x142/0x190                                                                                                                                              
<3>[  506.095311]  memcpy+0x39/0x60                                                                                                                                                           
<3>[  506.095311]  auxgetinfo+0x306/0x600 [zlua]                                                                                                                                              
<3>[  506.095311]  ? newshrstr+0xe6/0x210 [zlua]                                                                                                                                              
<3>[  506.095311]  lua_getinfo+0xe0/0x310 [zlua]                                                                                                                                              
<3>[  506.095311]  ? zcp_cleanup+0x90/0x90 [zfs]                                                                                                                                              
<3>[  506.095311]  luaL_traceback+0x11d/0x220 [zlua]                                                                                                                                          
<3>[  506.095311]  ? pushfuncname+0x220/0x220 [zlua]                                                                                                                                          
<3>[  506.095311]  ? luaV_tonumber+0x1b0/0x1b0 [zlua]                                                                                                                                         
<3>[  506.095311]  ? luaV_gettable+0xea/0x3c0 [zlua]                                                                                                                                          
<3>[  506.095311]  ? zcp_cleanup+0x90/0x90 [zfs]                                                                                                                                              
<3>[  506.095311]  zcp_error_handler+0x3d/0x70 [zfs]                                                                                                                                          
<3>[  506.095311]  luaD_precall+0x2d3/0xd40 [zlua]                                                                                                                                            
<3>[  506.095311]  luaD_call+0x111/0x280 [zlua]                                                                                                                                               
<3>[  506.095311]  ? luaB_getmetatable+0x50/0x50 [zlua]                                                                                                                                       
<3>[  506.095311]  luaG_errormsg+0x205/0x2b0 [zlua]                                                                                                                                           
<3>[  506.095311]  lua_error+0xa/0x10 [zlua]                                                                                                                                                  
<3>[  506.327275]  luaD_precall+0x2d3/0xd40 [zlua]                                                                                                                                            
<3>[  506.327275]  luaV_execute+0x1cc1/0x4800 [zlua]                                                                                                                                          
<3>[  506.327275]  ? luaD_precall+0x86e/0xd40 [zlua]                                                                                                                                          
<3>[  506.327275]  luaD_call+0x201/0x280 [zlua]                                                                                                                                               
<3>[  506.327275]  luaD_rawrunprotected+0x114/0x200 [zlua]                                                                                                                                    
<3>[  506.327275]  ? lua_setmetatable+0x570/0x570 [zlua]                                                                                                                                      
<3>[  506.327275]  ? f_parser+0x340/0x340 [zlua]                                                                                                                                              
<3>[  506.327275]  ? luaD_rawrunprotected+0xfd/0x200 [zlua]                                                                                                                                   
<3>[  506.327275]  ? luaM_realloc_+0x99/0x220 [zlua]                                                                                                                                          
<3>[  506.327275]  luaD_pcall+0xe0/0x300 [zlua]                                                                                                                                               
<3>[  506.327275]  ? luaH_newkey+0x38b/0x520 [zlua]                                 
<3>[  506.327275]  lua_pcallk+0x154/0x6b0 [zlua]                                    
<3>[  506.327275]  ? luaV_settable+0x3ab/0x550 [zlua]                               
<3>[  506.419359]  ? f_call+0x90/0x90 [zlua]
<3>[  506.419359]  ? dsl_dir_phys+0x60/0x60 [zfs]
<3>[  506.419359]  ? dsl_dir_phys+0x60/0x60 [zfs]
<3>[  506.419359]  zcp_eval_impl+0x158/0x8a0 [zfs]
<3>[  506.419359]  ? zcp_eval_impl+0x8a0/0x8a0 [zfs]
<3>[  506.419359]  dsl_sync_task_sync+0x213/0x3d0 [zfs]
<3>[  506.419359]  dsl_pool_sync+0x969/0xda0 [zfs]
<3>[  506.419359]  ? zap_lookup+0x12/0x20 [zfs] 
<3>[  506.419359]  ? dsl_pool_undirty_space+0x1e0/0x1e0 [zfs]
<3>[  506.419359]  ? vdev_obsolete_sm_object+0x190/0x190 [zfs]
<3>[  506.419359]  spa_sync_iterate_to_convergence+0x18a/0x450 [zfs]
<3>[  506.419359]  spa_sync+0x6c9/0x12c0 [zfs]
<3>[  506.419359]  ? __cond_resched+0x16/0x40
<3>[  506.419359]  ? spa_async_dispatch+0x1b0/0x1b0 [zfs]
<3>[  506.515275]  ? spa_txg_history_set+0x14e/0x1e0 [zfs]
<3>[  506.515275]  txg_sync_thread+0x5ae/0x960 [zfs]
<3>[  506.515275]  ? slab_free_freelist_hook+0x66/0x130
<3>[  506.515275]  ? txg_dispatch_callbacks+0x1b0/0x1b0 [zfs]
<3>[  506.515275]  ? kfree+0xc5/0x280
<3>[  506.515275]  ? txg_dispatch_callbacks+0x1b0/0x1b0 [zfs]
<3>[  506.515275]  thread_generic_wrapper+0x171/0x200 [spl]
<3>[  506.515275]  ? _raw_spin_unlock_irqrestore+0xa/0x20
<3>[  506.515275]  ? IS_ERR+0x10/0x10 [spl]
<3>[  506.515275]  kthread+0x127/0x150
<3>[  506.515275]  ? set_kthread_struct+0x40/0x40
<3>[  506.515275]  ret_from_fork+0x22/0x30
<3>[  506.515275]  </TASK>
<3>[  506.515275] 
<3>[  506.515275] The buggy address belongs to the page:
<4>[  506.515275] page:00000000e0daaf00 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1099cf
<4>[  506.515275] flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
<4>[  506.515275] raw: 0017ffffc0000000 0000000000000000 ffffea0004267388 0000000000000000
<4>[  506.515275] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
<4>[  506.515275] page dumped because: kasan: bad access detected
<3>[  506.515275] KASAN internal error: frame info validation failed; invalid marker: 16140896666449346560
<3>[  506.515275] 
<3>[  506.515275] Memory state around the buggy address:
<3>[  506.515275]  ffff8881099cf480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
<3>[  506.643250]  ffff8881099cf500: 00 00 00 00 f3 f3 f3 f3 00 00 00 00 00 00 f1 f1
<3>[  506.643250] >ffff8881099cf580: f1 f1 00 00 00 00 00 00 00 f1 f1 00 00 00 00 00
<3>[  506.643250]                                               ^
<3>[  506.675258]  ffff8881099cf600: 00 00 f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00
<3>[  506.675258]  ffff8881099cf680: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
<3>[  506.675258] ==================================================================