aerusso opened this issue 3 years ago
you can. it's straightforward, but slow like molasses in January.
For example, https://github.com/zfsonlinux/zfs/pull/4465 was discovered and fixed using KASAN.
You can absolutely do this locally. All you need to do is build a KASAN-enabled kernel, then build ZFS as usual and run the test suite. The kernel documentation you linked to shows which CONFIG options need to be enabled. While you're at it, I'd also suggest enabling the kernel's kmemleak checker.
This is something I'd love to enable in the CI, but the last time we investigated it the performance impact made it impractical. From what I've read the performance is better with the latest kernels, but I don't know if that means it's fast enough to use in the CI environment.
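For reference, building such a kernel mostly comes down to a handful of options in .config (a sketch; the exact set of available options varies by kernel version, so check the kasan and kmemleak pages under Documentation/dev-tools/ for your tree):

```
CONFIG_KASAN=y
CONFIG_KASAN_GENERIC=y
CONFIG_KASAN_INLINE=y
CONFIG_DEBUG_KMEMLEAK=y
```

CONFIG_KASAN_INLINE trades larger code size for noticeably lower runtime overhead than the outline mode.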
Well, you could do that, but (presumably starting with the merge of zstd) it will fail to compile unless you make dummy functions for __asan_poison_memory_region and __asan_unpoison_memory_region, because they're behind #if defined (ADDRESS_SANITIZER) in the lib/zstd.c code, which KASAN apparently also defines.
(I ran into this with 4.19.194 and ffdf019cb, just for reference.)
The exact patch I used is:
diff --git a/module/zstd/zfs_zstd.c b/module/zstd/zfs_zstd.c
index fc1b0359a..fc51a2c50 100644
--- a/module/zstd/zfs_zstd.c
+++ b/module/zstd/zfs_zstd.c
@@ -202,6 +202,11 @@ static struct zstd_fallback_mem zstd_dctx_fallback;
static struct zstd_pool *zstd_mempool_cctx;
static struct zstd_pool *zstd_mempool_dctx;
+void __asan_unpoison_memory_region(void const volatile *addr, size_t size);
+void __asan_poison_memory_region(void const volatile *addr, size_t size);
+void __asan_poison_memory_region(void const volatile *addr, size_t size) {};
+void __asan_unpoison_memory_region(void const volatile *addr, size_t size) {};
+
static void
zstd_mempool_reap(struct zstd_pool *zstd_mempool)
I'll probably eventually try getting a refined version of this merged, at a minimum with some #ifdef guards around it.
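For illustration, a guarded version of those stubs might look like the following sketch (self-contained here so it compiles on its own; in the module it would sit behind a build-system-defined macro, and HAVE_ASAN_RUNTIME below is a hypothetical name for a configure check that detects a real ASAN runtime):

```c
#include <stddef.h>

/*
 * Sketch of #ifdef-guarded no-op stubs for the ASAN hooks referenced by
 * lib/zstd.c. HAVE_ASAN_RUNTIME is a hypothetical macro that a configure
 * check would define when a real ASAN runtime already provides these
 * symbols; otherwise we supply do-nothing fallbacks.
 */
#if !defined(HAVE_ASAN_RUNTIME)
void __asan_poison_memory_region(void const volatile *addr, size_t size);
void __asan_unpoison_memory_region(void const volatile *addr, size_t size);

void
__asan_poison_memory_region(void const volatile *addr, size_t size)
{
	(void) addr;
	(void) size;	/* no-op: KASAN tracks shadow memory itself */
}

void
__asan_unpoison_memory_region(void const volatile *addr, size_t size)
{
	(void) addr;
	(void) size;	/* no-op */
}
#endif
```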
edit to add: Interactively (over SSH), with CONFIG_KASAN_INLINE=y, it has seemed fine for me. (Though my poor low-memory 4GB VM does keep OOMing...) Maybe give it another try with that?
That's interesting; clearly I haven't tried this since we incorporated zstd! Thanks for posting the patch; it sounds like we'll want to incorporate some version of your change to sort out the build. It's also encouraging to hear your performance wasn't terrible. My recollection is that interactively it felt fine, but it at least doubled the total run time for the test suite.
@rincebrain Are you saying you ran the ZTS (with ZFS version ffdf019) on a KASAN kernel, and had no memory corruption issues?
Oh, no, I would definitely not say that...
I was just looking for a specific problem when I tried building KASAN in (...yesterday), and hadn't tried running through ZTS at the time.
I have gotten through a ZTS run, though indeed, with at least one KASAN complaint in syslog. I just haven't filed it yet.
Could you give me that info? (Either email me directly, or just open the bug.) At a minimum, I'd like to sanity-check that I am able to reproduce it.
(My ulterior motive here is that I believe that there is a memory corruption issue causing a bug I'm experiencing. That you're finding a memory corruption bug is a "good" sign that I can at least fix some bug of that type.)
Sure, let me just identify which test(s) were involved and reproduce it on reboot...
(I, too, started down this rabbit hole for such suspicions...)
@bghira:
you can. it's straightforward, but slow like molasses in January.
@behlendorf:
the last time we investigated it the performance impact made it impractical
What about a ready-to-run automated test environment instead of a continuous one? The same tests as the existing CI, but with KASAN enabled, run manually or on a weekly schedule.
I'm not sure whether the same applies to ASAN (#12216), or whether that could run in continuous CI. As a likely evolution, the ready-to-run test would set environment variables to get more details from ASAN.
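As one example of what those environment variables could look like (a sketch; the flag names are standard AddressSanitizer runtime options, and the log path is an arbitrary choice):

```shell
# Hypothetical: enable leak detection, keep going past the first error,
# and send reports to per-PID log files before running the userland
# test suite under an ASAN-instrumented build.
export ASAN_OPTIONS="detect_leaks=1:abort_on_error=0:log_path=/var/tmp/asan"
```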
I still think the overhead for KASAN when configured inline is probably low enough to permit CI usage, assuming A) enough runners for the rate of PR updates and B) increased runtime allowance for ZTS in it (because the overhead is, indeed, not zero, though I got sidetracked by non-KASAN tests before I measured a complete run with and without KASAN on the same commit).
Though, I don't know what the thresholds for "too much" are, here - 1.5x runtime? Doubled? Tripled? Similar numbers for RAM on the runners? (In my limited experience, IIRC, using 4GB RAM with and without KASAN ended with the OOM killer murdering every process in the former case before finishing a ZTS run, though I would have expected ARC to be smaller and life to move on...)
Though, since AFAICT none of {CentOS,Fedora,Debian,Ubuntu} ship a premade KASAN kernel package, this would require maintenance rebuilding that sometimes...though Linux makes custom kernel packages pretty simple, at least.
Apparently arm64 has a few features that make KASAN run better there.
(Gonna move discussion from #12928 to stop flooding the poor PR.)
So, I ran zfs-tests -T functional to completion on an Ubuntu 18.04 VM with a hand-built 5.15 kernel with KASAN. It took 05:06:03 and came back with:
Tests with results other than PASS that are expected:
FAIL casenorm/mixed_formd_delete (https://github.com/openzfs/zfs/issues/7633)
FAIL casenorm/mixed_formd_lookup (https://github.com/openzfs/zfs/issues/7633)
FAIL casenorm/mixed_formd_lookup_ci (https://github.com/openzfs/zfs/issues/7633)
FAIL casenorm/mixed_none_lookup_ci (https://github.com/openzfs/zfs/issues/7633)
FAIL casenorm/sensitive_formd_delete (https://github.com/openzfs/zfs/issues/7633)
FAIL casenorm/sensitive_formd_lookup (https://github.com/openzfs/zfs/issues/7633)
FAIL cli_root/zpool_import/import_rewind_device_replaced (Arbitrary pool rewind is not guaranteed)
SKIP cli_root/zpool_import/zpool_import_missing_003_pos (https://github.com/openzfs/zfs/issues/6839)
SKIP crtime/crtime_001_pos (Kernel statx(2) system call required on Linux)
FAIL history/history_006_neg (https://github.com/openzfs/zfs/issues/5657)
FAIL history/history_008_pos (Known issue)
SKIP io/io_uring (io_uring support required)
FAIL mmp/mmp_exported_import (Known issue)
FAIL mmp/mmp_inactive_import (Known issue)
FAIL no_space/enospc_002_pos (Exact free space reporting is not guaranteed)
SKIP pam/setup (pamtester might be not available)
FAIL refreserv/refreserv_004_pos (Known issue)
SKIP removal/removal_with_zdb (Known issue)
FAIL rsend/rsend_007_pos (Known issue)
SKIP rsend/rsend_008_pos (https://github.com/openzfs/zfs/issues/6066)
FAIL rsend/rsend_010_pos (Known issue)
FAIL rsend/rsend_011_pos (Known issue)
FAIL snapshot/rollback_003_pos (Known issue)
SKIP user_namespace/setup (Kernel user namespace support required)
FAIL vdev_zaps/vdev_zaps_007_pos (Known issue)
FAIL zvol/zvol_misc/zvol_misc_snapdev (https://github.com/openzfs/zfs/issues/12621)
FAIL zvol/zvol_misc/zvol_misc_volmode (Known issue)
Tests with result of PASS that are unexpected:
Tests with results other than PASS that are unexpected:
FAIL cli_root/zfs_load-key/zfs_load-key_all (expected PASS)
FAIL cli_root/zfs_load-key/zfs_load-key_https (expected PASS)
FAIL cli_root/zfs_load-key/zfs_load-key_location (expected PASS)
FAIL cli_root/zfs_load-key/zfs_load-key_recursive (expected PASS)
FAIL cli_root/zpool_create/zpool_create_features_007_pos (expected PASS)
FAIL cli_root/zpool_create/zpool_create_features_008_pos (expected PASS)
SKIP cli_root/zpool_expand/zpool_expand_001_pos (expected PASS)
SKIP cli_root/zpool_expand/zpool_expand_003_neg (expected PASS)
SKIP cli_root/zpool_expand/zpool_expand_005_pos (expected PASS)
FAIL cli_root/zpool_import/zpool_import_errata4 (expected PASS)
FAIL cli_root/zpool_initialize/zpool_initialize_suspend_resume (expected PASS)
SKIP cli_root/zpool_reopen/setup (expected PASS)
SKIP cli_root/zpool_reopen/zpool_reopen_001_pos (expected PASS)
SKIP cli_root/zpool_reopen/zpool_reopen_002_pos (expected PASS)
SKIP cli_root/zpool_reopen/zpool_reopen_003_pos (expected PASS)
SKIP cli_root/zpool_reopen/zpool_reopen_004_pos (expected PASS)
SKIP cli_root/zpool_reopen/zpool_reopen_005_pos (expected PASS)
SKIP cli_root/zpool_reopen/zpool_reopen_006_neg (expected PASS)
SKIP cli_root/zpool_reopen/zpool_reopen_007_pos (expected PASS)
SKIP cli_root/zpool_split/zpool_split_wholedisk (expected PASS)
FAIL cli_root/zpool_status/zpool_status_features_001_pos (expected PASS)
FAIL cli_root/zpool_upgrade/zpool_upgrade_features_001_pos (expected PASS)
FAIL events/zed_fd_spill (expected PASS)
SKIP fault/auto_offline_001_pos (expected PASS)
SKIP fault/auto_online_001_pos (expected PASS)
SKIP fault/auto_online_002_pos (expected PASS)
SKIP fault/auto_replace_001_pos (expected PASS)
SKIP fault/auto_spare_ashift (expected PASS)
SKIP fault/auto_spare_shared (expected PASS)
SKIP procfs/pool_state (expected PASS)
FAIL redacted_send/redacted_mounts (expected PASS)
and logged three fun things in dmesg - one was #12230, the second was:
[ 4230.618699] ------------[ cut here ]------------
[ 4230.618704] Stack depot reached limit capacity
[ 4230.618723] WARNING: CPU: 1 PID: 2588 at lib/stackdepot.c:115 stack_depot_save+0x3e1/0x460
[ 4230.618732] Modules linked in: zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) virtio_net net_failover failover virtio_pci virtio_pci_modern_dev virtio virtio_ring
[ 4230.618765] CPU: 1 PID: 2588 Comm: zpool Tainted: P B O 5.15.12kasan1 #1
[ 4230.618768] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 4230.618771] RIP: 0010:stack_depot_save+0x3e1/0x460
[ 4230.618774] Code: 24 08 e9 98 fd ff ff 0f 0b e9 09 fe ff ff 80 3d a0 9b d9 02 00 75 15 48 c7 c7 e8 bd 9b ac c6 05 90 9b d9 02 01 e8 cf b1 85 01 <0f> 0b 48 c7 c7 6c 9a cd ad 4c 89 fe e8 0e ad 92 01 48 8b 7c 24 08
[ 4230.618777] RSP: 0018:ffff88810d2ad040 EFLAGS: 00010082
[ 4230.618781] RAX: 0000000000000000 RBX: 00000000323c01a9 RCX: 0000000000000000
[ 4230.618783] RDX: 0000000000000027 RSI: 0000000000000004 RDI: ffffed1021a559fa
[ 4230.618785] RBP: 000000000000002f R08: 0000000000000001 R09: ffffed10a5a8ce90
[ 4230.618787] R10: ffff88852d46747b R11: ffffed10a5a8ce8f R12: ffff88810d2ad090
[ 4230.618789] R13: 0000000000000000 R14: ffff888529e00d48 R15: 0000000000000246
[ 4230.618790] FS: 00007f82acd7c7c0(0000) GS:ffff88852d440000(0000) knlGS:0000000000000000
[ 4230.618800] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4230.618803] CR2: 0000560b78e8f7b8 CR3: 000000015a790000 CR4: 00000000000506e0
[ 4230.618805] Call Trace:
[ 4230.618808] <TASK>
[ 4230.618810] ? arc_hdr_destroy+0x426/0xbc0 [zfs]
[ 4230.618811] ? spl_kmem_cache_free+0x260/0x7c0 [spl]
[ 4230.618811] kasan_save_stack+0x32/0x40
[ 4230.618811] ? kasan_save_stack+0x1b/0x40
[ 4230.618811] ? kasan_set_track+0x1c/0x30
[ 4230.618811] ? kasan_set_free_info+0x20/0x30
[ 4230.618811] ? __kasan_slab_free+0xea/0x120
[ 4230.618811] ? kmem_cache_free+0x74/0x270
[ 4230.618811] ? spl_kmem_cache_free+0x260/0x7c0 [spl]
[ 4230.618811] ? arc_hdr_destroy+0x4fe/0xbc0 [zfs]
[ 4230.618811] ? dbuf_destroy+0xd4/0x15d0 [zfs]
[ 4230.618811] ? dbuf_rele_and_unlock+0x5c1/0x12a0 [zfs]
[ 4230.618811] ? zap_lookup_norm+0xe3/0x120 [zfs]
[ 4230.618811] ? zap_lookup+0xd/0x20 [zfs]
[ 4230.618811] ? dsl_prop_get_dd+0x236/0x4c0 [zfs]
[ 4230.618811] ? dsl_prop_get_ds+0x371/0x530 [zfs]
[ 4230.618811] ? dsl_prop_register+0xe2/0xcc0 [zfs]
[ 4230.618811] ? dmu_objset_open_impl+0x778/0x23b0 [zfs]
[ 4230.618811] ? dmu_objset_from_ds+0x272/0x620 [zfs]
[ 4230.618811] ? dmu_objset_hold_flags+0xfb/0x130 [zfs]
[ 4230.618811] ? dsl_prop_get+0x7c/0xf0 [zfs]
[ 4230.618811] ? zvol_create_minors_cb+0xaa/0x3d0 [zfs]
[ 4230.618811] ? dmu_objset_find_impl+0x1e4/0x820 [zfs]
[ 4230.618811] ? dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ ... ? dmu_objset_find_impl+0x3ed/0x820 frame repeated ~18 more times ... ]
[ 4230.618811] ? dmu_objset_find+0x91/0xe0 [zfs]
[ 4230.618811] ? zvol_create_minors_recursive+0x3dc/0x600 [zfs]
[ 4230.618811] ? spa_import+0xbc3/0xfe0 [zfs]
[ 4230.618811] ? zfs_ioc_pool_import+0x30e/0x3b0 [zfs]
[ 4230.618811] ? zfsdev_ioctl_common+0xa71/0x1710 [zfs]
[ 4230.618811] ? zfsdev_ioctl+0x4a/0xd0 [zfs]
[ 4230.618811] ? __x64_sys_ioctl+0x122/0x190
[ 4230.618811] ? do_syscall_64+0x3b/0x90
[ 4230.618811] ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 4230.618811] ? mutex_unlock+0x7b/0xd0
[ 4230.618811] ? mutex_unlock+0x7b/0xd0
[ 4230.618811] ? __mutex_unlock_slowpath.isra.0+0x2f0/0x2f0
[ 4230.618811] kasan_set_track+0x1c/0x30
[ 4230.618811] kasan_set_free_info+0x20/0x30
[ 4230.618811] __kasan_slab_free+0xea/0x120
[ 4230.618811] ? spl_kmem_cache_free+0x260/0x7c0 [spl]
[ 4230.618811] kmem_cache_free+0x74/0x270
[ 4230.618811] ? arc_write+0x1930/0x1930 [zfs]
[ 4230.618811] spl_kmem_cache_free+0x260/0x7c0 [spl]
[ 4230.618811] arc_hdr_destroy+0x4fe/0xbc0 [zfs]
[ 4230.618811] ? dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811] dbuf_destroy+0xd4/0x15d0 [zfs]
[ 4230.618811] dbuf_rele_and_unlock+0x5c1/0x12a0 [zfs]
[ 4230.618811] ? kasan_unpoison+0x23/0x50
[ 4230.618811] ? zap_match+0x1b0/0x1b0 [zfs]
[ 4230.618811] ? dbuf_create_bonus+0x160/0x160 [zfs]
[ 4230.618811] ? __kasan_kmalloc+0x7c/0x90
[ 4230.618811] ? mutex_lock+0x89/0xd0
[ 4230.618811] ? __mutex_lock_slowpath+0x10/0x10
[ 4230.618811] ? kfree+0x8b/0x220
[ 4230.618811] zap_lookup_norm+0xe3/0x120 [zfs]
[ 4230.618811] ? zap_count+0x1a0/0x1a0 [zfs]
[ 4230.618811] ? zprop_name_to_prop+0x82/0xd0 [zcommon]
[ 4230.618811] zap_lookup+0xd/0x20 [zfs]
[ 4230.618811] dsl_prop_get_dd+0x236/0x4c0 [zfs]
[ 4230.618811] dsl_prop_get_ds+0x371/0x530 [zfs]
[ 4230.618811] ? rrw_held+0xcc/0x1c0 [zfs]
[ 4230.618811] dsl_prop_register+0xe2/0xcc0 [zfs]
[ 4230.618811] ? secondary_cache_changed_cb+0x80/0x80 [zfs]
[ 4230.618811] ? kasan_unpoison+0x23/0x50
[ 4230.618811] ? dsl_prop_get_int_ds+0x20/0x20 [zfs]
[ 4230.618811] ? spa_feature_decr+0x10/0x10 [zfs]
[ 4230.618811] dmu_objset_open_impl+0x778/0x23b0 [zfs]
[ 4230.618811] ? dmu_objset_sync_done+0x4f0/0x4f0 [zfs]
[ 4230.618811] ? mutex_unlock+0x7b/0xd0
[ 4230.618811] ? __mutex_unlock_slowpath.isra.0+0x2f0/0x2f0
[ 4230.618811] ? rrw_enter_read_impl+0x290/0x460 [zfs]
[ 4230.618811] dmu_objset_from_ds+0x272/0x620 [zfs]
[ 4230.618811] ? dsl_pool_hold+0xcb/0xf0 [zfs]
[ 4230.618811] ? dmu_objset_open_impl+0x23b0/0x23b0 [zfs]
[ 4230.618811] ? dsl_pool_user_release+0x10/0x10 [zfs]
[ 4230.618811] ? dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811] ? dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811] ? dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811] ? dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811] ? dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811] ? dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811] dmu_objset_hold_flags+0xfb/0x130 [zfs]
[ 4230.618811] ? dmu_objset_from_ds+0x620/0x620 [zfs]
[ 4230.618811] ? zvol_create_minors_recursive+0x3dc/0x600 [zfs]
[ 4230.618811] ? zfs_ioc_pool_import+0x30e/0x3b0 [zfs]
[ 4230.618811] ? zfsdev_ioctl_common+0xa71/0x1710 [zfs]
[ 4230.618811] ? zfsdev_ioctl+0x4a/0xd0 [zfs]
[ 4230.618811] ? __x64_sys_ioctl+0x122/0x190
[ 4230.618811] ? do_syscall_64+0x3b/0x90
[ 4230.618811] ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 4230.618811] ? kfree+0x8b/0x220
[ 4230.618811] ? tsd_hash_dtor+0x14a/0x220 [spl]
[ 4230.618811] dsl_prop_get+0x7c/0xf0 [zfs]
[ 4230.618811] ? dsl_prop_register+0xcc0/0xcc0 [zfs]
[ 4230.618811] ? _raw_read_lock_irq+0x30/0x30
[ 4230.618811] ? dbuf_create_bonus+0x160/0x160 [zfs]
[ 4230.618811] zvol_create_minors_cb+0xaa/0x3d0 [zfs]
[ 4230.618811] ? zvol_add_clones+0x690/0x690 [zfs]
[ 4230.618811] ? rrw_exit+0x155/0x510 [zfs]
[ 4230.618811] dmu_objset_find_impl+0x1e4/0x820 [zfs]
[ 4230.618811] ? zvol_add_clones+0x690/0x690 [zfs]
[ 4230.618811] ? dmu_objset_stats+0x240/0x240 [zfs]
[ 4230.618811] ? _raw_read_lock_irq+0x30/0x30
[ 4230.618811] ? mutex_unlock+0x7b/0xd0
[ 4230.618811] ? __mutex_unlock_slowpath.isra.0+0x2f0/0x2f0
[ 4230.618811] ? zfs_refcount_remove_many+0x5ad/0x940 [zfs]
[ 4230.618811] ? rrw_exit+0x155/0x510 [zfs]
[ 4230.618811] dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811] ? zvol_add_clones+0x690/0x690 [zfs]
[ 4230.618811] ? dmu_objset_stats+0x240/0x240 [zfs]
[ 4230.618811] ? _raw_read_lock_irq+0x30/0x30
[ 4230.618811] ? mutex_unlock+0x7b/0xd0
[ 4230.618811] ? __mutex_unlock_slowpath.isra.0+0x2f0/0x2f0
[ 4230.618811] ? zfs_refcount_remove_many+0x5ad/0x940 [zfs]
[ 4230.618811] ? rrw_exit+0x155/0x510 [zfs]
[ ... the preceding eight-line dmu_objset_find_impl frame group repeated ~16 more times (recursion) ... ]
[ 4230.618811] dmu_objset_find_impl+0x3ed/0x820 [zfs]
[ 4230.618811] ? zvol_add_clones+0x690/0x690 [zfs]
[ 4230.618811] ? dmu_objset_stats+0x240/0x240 [zfs]
[ 4230.618811] ? zfs_refcount_add_many+0x4d/0x350 [zfs]
[ 4230.618811] ? spa_open_common+0x5f5/0xa60 [zfs]
[ 4230.618811] ? spa_load_best+0x850/0x850 [zfs]
[ 4230.618811] ? zvol_add_clones+0x690/0x690 [zfs]
[ 4230.618811] dmu_objset_find+0x91/0xe0 [zfs]
[ 4230.618811] ? wake_up_q+0xa0/0x110
[ 4230.618811] ? dmu_objset_find_dp_cb+0x60/0x60 [zfs]
[ 4230.618811] ? __mutex_unlock_slowpath.isra.0+0x1b0/0x2f0
[ 4230.618811] zvol_create_minors_recursive+0x3dc/0x600 [zfs]
[ 4230.618811] ? zvol_last_close+0x190/0x190 [zfs]
[ 4230.618811] ? kasan_unpoison+0x23/0x50
[ 4230.618811] ? __mutex_unlock_slowpath.isra.0+0x2f0/0x2f0
[ 4230.618811] spa_import+0xbc3/0xfe0 [zfs]
[ 4230.618811] ? nvlist_common.part.106+0x149/0x570 [znvpair]
[ 4230.618811] ? spa_create+0x1b30/0x1b30 [zfs]
[ 4230.618811] ? nvlist_exists+0xd0/0xd0 [znvpair]
[ 4230.618811] ? free_unref_page_commit.isra.0+0x233/0x540
[ 4230.618811] ? drain_pages+0x80/0x80
[ 4230.618811] ? free_pcp_prepare+0x8a/0x450
[ 4230.618811] ? free_unref_page+0xa2/0xe0
[ 4230.618811] ? get_nvlist+0xd8/0x1b0 [zfs]
[ 4230.618811] ? memmove+0x39/0x60
[ 4230.618811] zfs_ioc_pool_import+0x30e/0x3b0 [zfs]
[ 4230.618811] ? zfs_ioc_clear+0x690/0x690 [zfs]
[ 4230.618811] ? kasan_unpoison+0x23/0x50
[ 4230.618811] ? __kasan_slab_alloc+0x2c/0x80
[ 4230.618811] ? memcpy+0x39/0x60
[ 4230.618811] ? strlcpy+0x8f/0x110
[ 4230.618811] zfsdev_ioctl_common+0xa71/0x1710 [zfs]
[ 4230.618811] ? __alloc_pages_slowpath.constprop.0+0x1e40/0x1e40
[ 4230.618811] ? mmu_notifier_range_update_to_read_only+0x4a/0xa0
[ 4230.618811] ? zfsdev_state_destroy+0x1b0/0x1b0 [zfs]
[ 4230.618811] ? __kasan_kmalloc_large+0x81/0xa0
[ 4230.618811] ? __kmalloc_node+0x206/0x2b0
[ 4230.618811] ? kvmalloc_node+0x4d/0x90
[ 4230.618811] zfsdev_ioctl+0x4a/0xd0 [zfs]
[ 4230.618811] __x64_sys_ioctl+0x122/0x190
[ 4230.618811] do_syscall_64+0x3b/0x90
[ 4230.618811] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 4230.618811] RIP: 0033:0x7f82ab3ac317
[ 4230.618811] Code: b3 66 90 48 8b 05 71 4b 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 41 4b 2d 00 f7 d8 64 89 01 48
[ 4230.618811] RSP: 002b:00007ffcc9c34c88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 4230.618811] RAX: ffffffffffffffda RBX: 00007ffcc9c34d00 RCX: 00007f82ab3ac317
[ 4230.618811] RDX: 00007ffcc9c34d00 RSI: 0000000000005a02 RDI: 0000000000000003
[ 4230.618811] RBP: 00007ffcc9c38bf0 R08: 00005573f7ea5130 R09: 0000000000000000
[ 4230.618811] R10: 00005573f7e7d010 R11: 0000000000000246 R12: 00005573f7e7d2e0
[ 4230.618811] R13: 00005573f7e8e548 R14: 0000000000000000 R15: 0000000000000000
[ 4230.618811] </TASK>
[ 4230.618811] ---[ end trace 25880a7254006869 ]---
(whew, that was long, and I might have repeated a line or two that occurred 5+ times in a row)
And the final one:
[12458.481675] ------------[ cut here ]------------
[12458.481679] WARNING: CPU: 0 PID: 27863 at fs/read_write.c:525 __kernel_write+0x765/0x9e0
[12458.481688] Modules linked in: zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) virtio_net net_failover failover virtio_pci virtio_pci_modern_dev virtio virtio_ring
[12458.481718] CPU: 0 PID: 27863 Comm: python3.6 Tainted: P B W O 5.15.12kasan1 #1
[12458.481721] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[12458.481724] RIP: 0010:__kernel_write+0x765/0x9e0
[12458.481728] Code: fe ff ff 48 c7 c6 60 4e 4b ac 48 c7 c7 40 80 e5 ac e8 2f 07 7c 00 85 c0 0f 85 4b 35 01 02 49 c7 c6 ea ff ff ff e9 ee fe ff ff <0f> 0b 49 c7 c6 f7 ff ff ff e9 e0 fe ff ff 48 b8 00 00 00 00 00 fc
[12458.481730] RSP: 0018:ffff88835b2ef000 EFLAGS: 00010246
[12458.481735] RAX: 00000000480a801d RBX: ffff88810ee13000 RCX: dffffc0000000000
[12458.481737] RDX: 0000000000000000 RSI: ffff88838eb6d800 RDI: ffff88823e666cc4
[12458.481739] RBP: 1ffff1106b65de03 R08: 0000000000000138 R09: ffffffffad986048
[12458.481741] R10: dffffc0000000000 R11: ffffed106b65ddc7 R12: ffff88823e666c80
[12458.481743] R13: ffff88835b2ef1c0 R14: ffff88835b2ef1c0 R15: 0000000000000138
[12458.481746] FS: 00007f91e8b5a740(0000) GS:ffff88852d400000(0000) knlGS:0000000000000000
[12458.481750] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12458.481752] CR2: 00000000017fe908 CR3: 000000014aba6000 CR4: 00000000000506f0
[12458.481754] Call Trace:
[12458.481756] <TASK>
[12458.481758] ? kasan_save_stack+0x32/0x40
[12458.481763] ? do_iter_readv_writev+0x6f0/0x6f0
[12458.481766] ? __kasan_slab_free+0xea/0x120
[12458.481769] ? dmu_send+0x618/0xbb0 [zfs]
[12458.481906] ? zfs_ioc_send_new+0x22c/0x2c0 [zfs]
[12458.481965] ? zfsdev_ioctl_common+0xebe/0x1710 [zfs]
[12458.481996] ? zfsdev_ioctl+0x4a/0xd0 [zfs]
[12458.482027] ? __x64_sys_ioctl+0x122/0x190
[12458.482032] ? do_syscall_64+0x3b/0x90
[12458.482036] ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[12458.482040] ? __cond_resched+0x10/0x20
[12458.482043] ? __inode_security_revalidate+0x98/0xc0
[12458.482048] ? selinux_file_permission+0x32d/0x410
[12458.482052] ? security_file_permission+0x4e/0x580
[12458.482056] kernel_write+0x9f/0x2f0
[12458.482061] zfs_file_write+0x94/0x170 [zfs]
[12458.482092] ? zfs_file_close+0x10/0x10 [zfs]
[12458.482119] dump_record+0x1ff/0x8f0 [zfs]
[12458.482152] dmu_send_impl+0x12bd/0x3ca0 [zfs]
[12458.482183] ? zfs_refcount_remove_many+0x5ad/0x940 [zfs]
[12458.482220] ? do_dump+0x28e0/0x28e0 [zfs]
[12458.482251] ? dbuf_rele_and_unlock+0x6c9/0x12a0 [zfs]
[12458.482282] ? dbuf_create_bonus+0x160/0x160 [zfs]
[12458.482312] ? __mutex_lock_slowpath+0x10/0x10
[12458.482315] ? zfs_refcount_count+0x16/0x40 [zfs]
[12458.482348] ? dsl_dataset_hold_flags+0x2e5/0x630 [zfs]
[12458.482382] ? dsl_dataset_hold_obj_flags+0x120/0x120 [zfs]
[12458.482420] ? mutex_unlock+0x7b/0xd0
[12458.482424] ? __mutex_unlock_slowpath.isra.0+0x2f0/0x2f0
[12458.482427] ? __kasan_kmalloc+0x7c/0x90
[12458.482430] ? zfs_refcount_add_many+0x4d/0x350 [zfs]
[12458.482463] ? create_prof_cpu_mask+0x20/0x20
[12458.482467] ? arch_stack_walk+0x99/0xf0
[12458.482471] dmu_send+0x618/0xbb0 [zfs]
[12458.482503] ? dmu_send_obj+0x570/0x570 [zfs]
[12458.482533] ? stack_trace_consume_entry+0x160/0x160
[12458.482537] ? unwind_next_frame+0x11a1/0x17e0
[12458.482543] ? stack_trace_consume_entry+0x160/0x160
[12458.482546] ? stack_trace_save+0x8c/0xc0
[12458.482549] ? kasan_save_stack+0x32/0x40
[12458.482552] ? kasan_save_stack+0x1b/0x40
[12458.482556] ? __kasan_kmalloc+0x7c/0x90
[12458.482559] ? spl_kmem_alloc_impl+0x11f/0x160 [spl]
[12458.482564] ? nv_mem_zalloc.isra.12+0x4e/0x80 [znvpair]
[12458.482570] ? nvlist_xalloc.part.13+0xd8/0x340 [znvpair]
[12458.482574] ? fnvlist_alloc+0x61/0xc0 [znvpair]
[12458.482579] ? zfsdev_ioctl_common+0xddd/0x1710 [zfs]
[12458.482613] ? zfsdev_ioctl+0x4a/0xd0 [zfs]
[12458.482644] ? nvt_lookup_name_type.isra.54+0x15b/0x420 [znvpair]
[12458.482649] ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[12458.482652] ? memmove+0x39/0x60
[12458.482655] ? nvpair_value_common.part.20+0x235/0x3b0 [znvpair]
[12458.482660] zfs_ioc_send_new+0x22c/0x2c0 [zfs]
[12458.482692] ? zfs_ioc_send_space+0x770/0x770 [zfs]
[12458.482722] ? __mutex_unlock_slowpath.isra.0+0x2f0/0x2f0
[12458.482726] ? kasan_unpoison+0x23/0x50
[12458.482729] ? __kasan_slab_alloc+0x2c/0x80
[12458.482732] ? __kasan_kmalloc+0x7c/0x90
[12458.482735] ? memset+0x20/0x40
[12458.482737] ? nv_mem_zalloc.isra.12+0x63/0x80 [znvpair]
[12458.482741] ? nvlist_xalloc.part.13+0xd8/0x340 [znvpair]
[12458.482746] ? zfs_ioc_send+0x6a0/0x6a0 [zfs]
[12458.482776] ? nvlist_lookup_nvpair_embedded_index+0x20/0x20 [znvpair]
[12458.482781] ? memcpy+0x39/0x60
[12458.482784] zfsdev_ioctl_common+0xebe/0x1710 [zfs]
[12458.482882] ? zfsdev_state_destroy+0x1b0/0x1b0 [zfs]
[12458.482913] ? __kasan_kmalloc_large+0x81/0xa0
[12458.482917] ? __kmalloc_node+0x206/0x2b0
[12458.482921] ? kvmalloc_node+0x4d/0x90
[12458.482925] zfsdev_ioctl+0x4a/0xd0 [zfs]
[12458.482956] __x64_sys_ioctl+0x122/0x190
[12458.482959] do_syscall_64+0x3b/0x90
[12458.482963] entry_SYSCALL_64_after_hwframe+0x44/0xae
[12458.482967] RIP: 0033:0x7f91e8668317
[12458.482971] Code: b3 66 90 48 8b 05 71 4b 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 41 4b 2d 00 f7 d8 64 89 01 48
[12458.482974] RSP: 002b:00007ffeecd13348 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[12458.482979] RAX: ffffffffffffffda RBX: 0000000000005a40 RCX: 00007f91e8668317
[12458.482982] RDX: 00007ffeecd13370 RSI: 0000000000005a40 RDI: 0000000000000004
[12458.482985] RBP: 00007ffeecd16960 R08: 0000000000000020 R09: 00000000017f8590
[12458.482987] R10: 0000000500000001 R11: 0000000000000246 R12: 00007ffeecd13370
[12458.482989] R13: 0000000000000000 R14: 0000000000005a40 R15: 00000000017f8590
[12458.482992] </TASK>
[12458.482995] ---[ end trace 25880a725400686a ]---
I can go find out which tests the latter two happened during if they're hard to repro for anyone.
Some of the tests failed because I forgot to build scsi-debug into the kernel config. Whoops.
It seems that I spoke too soon in https://github.com/openzfs/zfs/pull/12928#issuecomment-1007496550, because it got to Test: /usr/local/share/zfs/zfs-tests/tests/functional/channel_program/lua_core/tst.return_recursive_table]
and panicked because it smashed its stack(?):
[ 1323.717046] Kernel panic - not syncing: corrupted stack end detected inside scheduler
[ 1323.719230] CPU: 2 PID: 94177 Comm: txg_sync Tainted: P B OE 5.15.0-2-amd64 #1 Debian 5.11
[ 1323.721843] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[ 1323.724027] Call Trace:
[ 1323.724703] <TASK>
[ 1323.725308] dump_stack_lvl+0x46/0x5a
[ 1323.726325] panic+0x18b/0x389
[ 1323.727146] ? __warn_printk+0xf3/0xf3
[ 1323.728141] ? kasan_save_stack+0x32/0x40
[ 1323.729247] ? kasan_save_stack+0x1b/0x40
[ 1323.730344] ? __schedule+0xca/0xf90
[ 1323.731312] ? schedule+0x30/0x120
[ 1323.732275] __schedule+0xf8b/0xf90
[ 1323.734046] ? trace_event_raw_event_hrtimer_start+0x1b0/0x1b0
[ 1323.735838] ? io_schedule_timeout+0xb0/0xb0
[ 1323.737164] ? llist_add_batch+0x33/0x50
[ 1323.738928] schedule+0x6d/0x120
[ 1323.739864] schedule_timeout+0xe4/0x1f0
[ 1323.740958] ? usleep_range+0xe0/0xe0
[ 1323.742761] ? try_to_wake_up+0x392/0x910
[ 1323.743880] ? __bpf_trace_tick_stop+0xe0/0xe0
[ 1323.745165] ? __mutex_unlock_slowpath.constprop.0+0x210/0x210
[ 1323.747169] ? __native_queued_spin_unlock+0x9/0x10
[ 1323.748482] ? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
[ 1323.750805] __cv_timedwait_common+0x19e/0x2b0 [spl]
[ 1323.752477] ? __cv_wait_idle+0xd0/0xd0 [spl]
[ 1323.754185] ? recalc_sigpending+0x5a/0x70
[ 1323.755540] ? finish_wait+0x100/0x100
[ 1323.756554] ? mutex_unlock+0x80/0xd0
[ 1323.757855] ? bpobj_space+0x10c/0x120 [zfs]
[ 1323.761056] __cv_timedwait_idle+0x9a/0xe0 [spl]
[ 1323.762792] ? __cv_timedwait_sig+0x70/0x70 [spl]
[ 1323.764102] ? __bitmap_weight+0x71/0x90
Test: /usr/local/share/zfs/zfs-tests/[te st1s/3fu2nc3t.765322] txg_sync_thread+0x24f/0x760 [zfs]
[ 1323.768519] ? kasan_set_track+0x1c/0x30
ional/channel_program/lua_core/tst.stack_gs[ub (1r323.770070] ? txg_fini+0x300/0x300 [zfs]
un a[s ro1ot3) 2[030:.772767] thread_generic_wrapper+0xa8/0xc0 [spl]
30] [[PA SS1]
23.774855] ? __thread_exit+0x20/0x20 [spl]
[ 1323.776636] kthread+0x1d2/0x200
[ 1323.777992] ? set_kthread_struct+0x80/0x80
[ 1323.779343] ret_from_fork+0x22/0x30
[ 1323.780339] </TASK>
[ 1323.781259] Kernel Offset: 0x33200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0x)
[ 1323.784663] ---[ end Kernel panic - not syncing: corrupted stack end detected inside scheduler ]---
(The output mingling is as original from the console.) This seems to point to lua, which is as expected (#12230), but reading through that it doesn't look like the kernel outright panicked in that run?
Here's the results (though, well, it panicked, so): zts-results.ecYPqF.gz
I did have it panic once and say the stack was destroyed, though I didn't get a trace of why, when I gave it much less RAM than I thought I had; increasing the RAM made it just complain.
That's with -m 48g
(half host memory) and -device virtio-balloon
(the efficacy of which I don't know how to ascertain; QEMU has 12.3G RES and started up near-instantly, so I think it's working? but dunno for sure), which, well, should be enough, right?
Yeah, no kidding - I was using n=4 and 24 GB.
Happened again (I filtered by -T functional
like you said you had in hopes of avoiding this, but no luck):
Test: /usr/local/share/zfs/zfs-tests/tests/functional/channel_program/lua_core/tst.return_recursive_table]
[ 1412.446099] Kernel panic - not syncing: corrupted stack end detected inside scheduler
[ 1412.449285] CPU: 3 PID: 92945 Comm: txg_sync Tainted: P B OE 5.15.0-2-amd64 #1 Debian 5.11
[ 1412.453096] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[ 1412.456292] Call Trace:
[ 1412.457286] <TASK>
[ 1412.458131] dump_stack_lvl+0x46/0x5a
[ 1412.459583] panic+0x18b/0x389
[ 1412.460794] ? __warn_printk+0xf3/0xf3
[ 1412.462401] ? __schedule+0xca/0xf90
[ 1412.463906] ? schedule+0x30/0x120
[ 1412.465427] __schedule+0xf8b/0xf90
[ 1412.466923] ? trace_event_raw_event_hrtimer_start+0x1b0/0x1b0
[ 1412.469600] ? io_schedule_timeout+0xb0/0xb0
[ 1412.471564] ? x2apic_send_IPI+0x60/0x70
[ 1412.473266] schedule+0x6d/0x120
[ 1412.474715] schedule_timeout+0xe4/0x1f0
[ 1412.476451] ? usleep_range+0xe0/0xe0
[ 1412.478161] ? try_to_wake_up+0x392/0x910
[ 1412.479850] ? __bpf_trace_tick_stop+0xe0/0xe0
[ 1412.481714] ? __mutex_unlock_slowpath.constprop.0+0x210/0x210
[ 1412.484069] ? __native_queued_spin_unlock+0x9/0x10
[ 1412.486105] ? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
[ 1412.489007] __cv_timedwait_common+0x19e/0x2b0 [spl]
[ 1412.491329] ? __cv_wait_idle+0xd0/0xd0 [spl]
[ 1412.493425] ? recalc_sigpending+0x5a/0x70
[ 1412.495169] ? finish_wait+0x100/0x100
[ 1412.496692] ? mutex_unlock+0x80/0xd0
[ 1412.498196] ? bpobj_space+0x10c/0x120 [zfs]
[ 1412.501370] __cv_timedwait_idle+0x9a/0xe0 [spl]
Test[: /u1sr4/l1oc2al.503351] ? __cv_timedwait_sig+0x70/0x70 [spl]
[ 1412.505670] ? __bitmap_weight+0x71/0x90
/share/zfs/zfs-tests/tests/function[a l/141c2ha.507n2ne7l_6pr] txg_sync_thread+0x24f/0x760 [zfs]
og[ra m/1lu4a_1co2re./t510602] ? kasan_set_track+0x1c/0x30
st.stack_gsub (run as root) [00:00] [PASS]
[ 1412.512569] ? txg_fini+0x300/0x300 [zfs]
[ 1412.515557] thread_generic_wrapper+0xa8/0xc0 [spl]
[ 1412.517615] ? __thread_exit+0x20/0x20 [spl]
[ 1412.519540] kthread+0x1d2/0x200
[ 1412.521014] ? set_kthread_struct+0x80/0x80
[ 1412.522881] ret_from_fork+0x22/0x30
[ 1412.524383] </TASK>
[ 1412.525803] Kernel Offset: 0x6800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xf)
[ 1412.530304] ---[ end Kernel panic - not syncing: corrupted stack end detected inside scheduler ]---
Results: zts-results.8XEhF3.gz (again, from the guest, which panicked; I guess I could network mount this, but that sounds like an amazing way to triple the run-time)
This is my qemu cmdline (line-broken for your viewing pleasure; the smp configuration mimics the host, except the host is (a) NUMA, obviously, and (b) has twice as many cores/socket):
qemu-system-x86_64 -enable-kvm -smp sockets=2,cores=3,threads=2 -m 48g -nographic -vga none \
-nic user,model=virtio,hostfwd=tcp::2222-:22 \
-drive file=/dev/zvol/filling/store/nabijaczleweli/vm-kasan-test-root,if=none,id=root,format=raw,cache=none -device virtio-blk-pci,drive=root \
-drive file=/dev/zvol/filling/store/nabijaczleweli/vm-kasan-test-scratch,if=none,id=scratch,format=raw,cache=none -device virtio-blk-pci,drive=scratch \
-device virtio-balloon \
-kernel kasan-test/vmlinuz-5.15.0-2-amd64 -initrd kasan-test/initrd.img-5.15.0-2-amd64 -append 'console=ttyS0 root=/dev/vda'
A pickle indeed. Maybe unballooning will help? (I doubt it from the trace, but it'd be fun. Otherwise I have no clue, since, well.)
Novel.
Last time I used the ballooning driver, it was with Xen 3, so I have no constructive input there.
Here is the .config I used with my kASAN kernel, if you'd like to compare it to yours.
Disabling the balloon seems to have no effect:
Test: /usr/local/share/zfs/zfs-tests/tests/functional/channel_program/lua_core/tst.return_recursive_table]
[ 1255.851278] Kernel panic - not syncing: corrupted stack end detected inside scheduler
[ 1255.853384] CPU: 2 PID: 95047 Comm: txg_sync Tainted: P B OE 5.15.0-2-amd64 #1 Debian 5.11
[ 1255.855991] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[ 1255.858192] Call Trace:
[ 1255.858865] <TASK>
[ 1255.859455] dump_stack_lvl+0x46/0x5a
[ 1255.860457] panic+0x18b/0x389
[ 1255.861282] ? __warn_printk+0xf3/0xf3
[ 1255.862837] ? __schedule+0xca/0xf90
[ 1255.864394] ? schedule+0x30/0x120
[ 1255.865771] __schedule+0xf8b/0xf90
[ 1255.867219] ? trace_event_raw_event_hrtimer_start+0x1b0/0x1b0
[ 1255.869614] ? io_schedule_timeout+0xb0/0xb0
[ 1255.871356] ? x2apic_send_IPI+0x60/0x70
[ 1255.873033] schedule+0x6d/0x120
[ 1255.874413] schedule_timeout+0xe4/0x1f0
[ 1255.876030] ? usleep_range+0xe0/0xe0
[ 1255.877508] ? try_to_wake_up+0x392/0x910
[ 1255.879227] ? __bpf_trace_tick_stop+0xe0/0xe0
[ 1255.881019] ? __mutex_unlock_slowpath.constprop.0+0x210/0x210
[ 1255.883442] ? __native_queued_spin_unlock+0x9/0x10
[ 1255.885475] ? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
[ 1255.888179] __cv_timedwait_common+0x19e/0x2b0 [spl]
[ 1255.890329] ? __cv_wait_idle+0xd0/0xd0 [spl]
[ 1255.892265] ? recalc_sigpending+0x5a/0x70
[ 1255.893919] ? finish_wait+0x100/0x100
[ 1255.895497] ? mutex_unlock+0x80/0xd0
[ 1255.896864] ? bpobj_space+0x10c/0x120 [zfs]
[ 1255.900311] __cv_timedwait_idle+0x9a/0xe0 [spl]
[ 1255.902165] ? __cv_timedwait_sig+0x70/0x70 [spl]
[ 1255.903998] ? __bitmap_weight+0x71/0x90
[ 1255.905528] txg_sync_thread+0x24f/0x760 [zfs]
[ 1255.908229] ? kasan_set_track+0x1c/0x30
[ 1255.910077] ? txg_fini+0x300/0x300 [zfs]
[ 1255.913039] thread_generic_wrapper+0xa8/0xc0 [spl]
[ 1255.914773] ? __thread_exit+0x20/0x20 [spl]
[ 1255.916410] kthread+0x1d2/0x200
[ 1255.917541] ? set_kthread_struct+0x80/0x80
[ 1255.919008] ret_from_fork+0x22/0x30
[ 1255.920208] </TASK>
[ 1255.921244] Kernel Offset: 0x2e000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0x)
[ 1255.924842] ---[ end Kernel panic - not syncing: corrupted stack end detected inside scheduler ]---
(It also hasn't changed QEMU's memory usage, so.); zts-results.lD6EIq.gz
In what is an ultimate basic bitch move, I just built the debian kernel packages but added CONFIG_KASAN=y
where the original had "CONFIG_KASAN is unset", and installed them on a fresh sid strap: config-5.15.0-2-amd64.gz; I can upload the send of the image later, if there's interest.
Rudimentary analysis (git diff
) reveals that they're almost entirely unrelated; grepping for KASAN shows this (-your kasan, +my debian):
CONFIG_KASAN=y
CONFIG_KASAN_GENERIC=y
-# CONFIG_KASAN_OUTLINE is not set
-CONFIG_KASAN_INLINE=y
+CONFIG_KASAN_OUTLINE=y
+# CONFIG_KASAN_INLINE is not set
CONFIG_KASAN_STACK=y
# CONFIG_KASAN_VMALLOC is not set
# CONFIG_KASAN_MODULE_TEST is not set
I assume OUTLINE is the default, since I changed no other lines in the seed config. (It also seems prudent to note that I know jack squat about how these things would interact => not a clue what this realistically means.)
Yeah, mine was an edited make defconfig, so it's unsurprising it didn't have much in common.
INLINE means, AIUI, what it says on the tin for KASAN - is it making actual calls for the kasan shims around everything, or is it inlining them and laughing at the bloat that ensues?
I could imagine actual calls everywhere would make a significant difference...
Hm; quoth lib/Kconfig.kasan:
choice
prompt "Instrumentation type"
depends on KASAN_GENERIC || KASAN_SW_TAGS
default KASAN_OUTLINE
config KASAN_OUTLINE
bool "Outline instrumentation"
help
Before every memory access compiler insert function call
__asan_load*/__asan_store*. These functions performs check
of shadow memory. This is slower than inline instrumentation,
however it doesn't bloat size of kernel's .text section so
much as inline does.
config KASAN_INLINE
bool "Inline instrumentation"
depends on !ARCH_DISABLE_KASAN_INLINE
help
Compiler directly inserts code checking shadow memory before
memory accesses. This is faster than outline (in some workloads
it gives about x2 boost over outline instrumentation), but
make kernel's .text size much bigger.
endchoice
So, yes, OUTLINE is actual calls, and INLINE doubles .text. Although I wouldn't expect that to make a difference?
Changed -smp sockets=2,cores=3,threads=2
to -smp 12
(i.e. the same amount of CPUs but a different topology), and I got this kasan warning:
Test: /usr/local/share/zfs/zfs-tests/tests/functional/channel_program/lua_core/tst.args_to_lua (run as root) [00:00] [PASS]
[ 1256.320566] ==================================================================
[ 1256.322746] BUG: KASAN: stack-out-of-bounds in stack_trace_consume_entry+0x58/0x80
[ 1256.324973] Write of size 8 at addr ffff88830196f770 by task zfs/90929
[ 1256.326822]
[ 1256.327297] CPU: 1 PID: 90929 Comm: zfs Tainted: P OE 5.15.0-2-amd64 #1 Debian 5.15.5-2.1
[ 1256.329806] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[ 1256.332029] Call Trace:
[ 1256.332723] <TASK>
[ 1256.333310] dump_stack_lvl+0x46/0x5a
[ 1256.334335] print_address_description.constprop.0+0x1f/0x140
[ 1256.335895] ? stack_trace_consume_entry+0x58/0x80
[ 1256.337188] kasan_report.cold+0x83/0xdf
[ 1256.338256] ? stack_trace_consume_entry+0x58/0x80
[ 1256.339547] ? kasan_save_stack+0x1b/0x40
[ 1256.340583] stack_trace_consume_entry+0x58/0x80
[ 1256.341757] ? create_prof_cpu_mask+0x20/0x20
[ 1256.342881] arch_stack_walk+0x78/0xf0
[ 1256.343913] ? kfree+0xc5/0x280
[ 1256.344726] ? kasan_save_stack+0x1b/0x40
[ 1256.345761] ? kfree+0xc5/0x280
[ 1256.346573] stack_trace_save+0x91/0xc0
[ 1256.347581] ? stack_trace_consume_entry+0x80/0x80
[ 1256.348810] ? luaD_call+0x11f/0x200 [zlua]
[ 1256.349992] ? resume_cb+0x190/0x190 [zlua]
[ 1256.351122] kasan_save_stack+0x1b/0x40
[ 1256.352148] ? lua_setfield+0xb0/0xb0 [zlua]
[ 1256.353293] ? luaD_rawrunprotected+0x10a/0x160 [zlua]
[ 1256.354656] ? lua_setfield+0xb0/0xb0 [zlua]
[ 1256.355816] ? f_parser+0x190/0x190 [zlua]
[ 1256.356911] ? lua_setfield+0xb0/0xb0 [zlua]
[ 1256.358055] ? lua_setfield+0xb0/0xb0 [zlua]
[ 1256.359200] ? luaD_rawrunprotected+0xd2/0x160 [zlua]
[ 1256.360569] ? lua_setfield+0xb0/0xb0 [zlua]
[ 1256.361711] ? luaD_rawrunprotected+0xd2/0x160 [zlua]
[ 1256.363058] ? luaF_close+0x33/0x1b0 [zlua]
[ 1256.364193] ? luaD_pcall+0xa0/0x130 [zlua]
[ 1256.365321] ? lua_pcallk+0x10a/0x290 [zlua]
[ 1256.366461] kasan_set_track+0x1c/0x30
[ 1256.367445] kasan_set_free_info+0x20/0x30
[ 1256.368501] __kasan_slab_free+0xec/0x120
[ 1256.369529] slab_free_freelist_hook+0x66/0x130
[ 1256.370692] ? zcp_eval+0x4b4/0x9c0 [zfs]
[ 1256.373034] kfree+0xc5/0x280
[ 1256.373812] zcp_eval+0x4b4/0x9c0 [zfs]
[ 1256.375553] ? zcp_dataset_hold+0x150/0x150 [zfs]
[ 1256.377487] ? spl_kmem_alloc_impl+0xf6/0x110 [spl]
[ 1256.378841] ? nv_mem_zalloc.isra.0+0x33/0x60 [znvpair]
[ 1256.380294] ? nvlist_xalloc.part.0+0x86/0x140 [znvpair]
[ 1256.381699] ? zfsdev_ioctl_common+0x635/0xbd0 [zfs]
[ 1256.383710] ? zfsdev_ioctl+0x53/0xe0 [zfs]
[ 1256.385518] ? __x64_sys_ioctl+0xb9/0xf0
[ 1256.386540] ? do_syscall_64+0x3b/0xc0
[ 1256.387526] ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 1256.388867] ? nvt_remove_nvpair+0xde/0x1e0 [znvpair]
[ 1256.390211] ? nvpair_type_is_array+0x50/0x50 [znvpair]
[ 1256.391612] ? nvt_remove_nvpair+0x13f/0x1e0 [znvpair]
[ 1256.392984] ? nvt_lookup_name_type.isra.0+0xc8/0x110 [znvpair]
[ 1256.394552] ? fnvlist_lookup_nvpair+0x5f/0xc0 [znvpair]
[ 1256.395982] ? fnvlist_remove_nvpair+0x40/0x40 [znvpair]
[ 1256.397398] zfs_ioc_channel_program+0x169/0x200 [zfs]
[ 1256.399460] ? zfs_ioc_redact+0x180/0x180 [zfs]
[ 1256.401350] ? nvlist_xalloc.part.0+0xde/0x140 [znvpair]
[ 1256.402764] ? nvlist_lookup_nvpair_embedded_index+0x20/0x20 [znvpair]
[ 1256.404500] zfsdev_ioctl_common+0x69a/0xbd0 [zfs]
[ 1256.406461] ? zfsdev_state_destroy+0x70/0x70 [zfs]
[ 1256.408451] ? __kmalloc_node+0x435/0x4e0
[ 1256.409482] ? __virt_addr_valid+0xbe/0x130
[ 1256.410555] ? _copy_from_user+0x3a/0x70
[ 1256.411602] zfsdev_ioctl+0x53/0xe0 [zfs]
[ 1256.413371] __x64_sys_ioctl+0xb9/0xf0
[ 1256.414339] do_syscall_64+0x3b/0xc0
[ 1256.415294] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 1256.416587] RIP: 0033:0x7fe099b92a97
[ 1256.417522] Code: 3c 1c e8 1c ff ff ff 85 c0 79 87 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 08
[ 1256.422212] RSP: 002b:00007ffec362a168 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 1256.424131] RAX: ffffffffffffffda RBX: 00007fe096336700 RCX: 00007fe099b92a97
[ 1256.425937] RDX: 00007fe096333050 RSI: 0000000000005a48 RDI: 0000000000000004
[ 1256.427756] RBP: 00007ffec362a220 R08: 00007fe096637000 R09: 0000000000000000
[ 1256.429654] R10: 00007fe09b6a2710 R11: 0000000000000246 R12: 0000000000005a48
[ 1256.431560] R13: 00007fe096333050 R14: 00007fe096333030 R15: 0000000000000004
[ 1256.433438] </TASK>
[ 1256.434043]
[ 1256.434465] The buggy address belongs to the page:
[ 1256.435753] page:000000005b97f116 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x30196f
[ 1256.438234] flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
[ 1256.439987] raw: 0017ffffc0000000 0000000000000000 ffffea000c065bc8 0000000000000000
[ 1256.442049] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 1256.444121] page dumped because: kasan: bad access detected
[ 1256.445613] KASAN internal error: frame info validation failed; invalid marker: 18446612690182375432
[ 1256.448039]
[ 1256.448449] Memory state around the buggy address:
[ 1256.449735] ffff88830196f600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1256.451678] ffff88830196f680: 00 00 00 00 00 00 f1 f1 f1 f1 00 f3 f1 f1 f1 f1
[ 1256.453597] >ffff88830196f700: 00 00 00 f3 f3 f3 f3 f3 00 00 00 00 00 00 f1 00
[ 1256.455537] ^
[ 1256.457361] ffff88830196f780: f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1256.459291] ffff88830196f800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1256.461197] ==================================================================
Test: /usr/local/share/zfs/zfs-tests/tests/functional/channel_program/lua_core/tst.divide_by_zero (run as root) [00:00] [PASS]
And then this panic:
Test: /usr/local/share/zfs/zfs-tests/tests/functional/channel_program/lua_core/tst.return_recursive_table (run as root) [00:00] [PASS]
[ 1307.478695] Kernel panic - not syncing: corrupted stack end detected inside scheduler
[ 1307.481210] CPU: 0 PID: 90683 Comm: txg_sync Tainted: P B OE 5.15.0-2-amd64 #1 Debian 5.15.5-2.1
[ 1307.484141] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[ 1307.486629] Call Trace:
[ 1307.487405] <TASK>
[ 1307.488057] dump_stack_lvl+0x46/0x5a
[ 1307.489247] panic+0x18b/0x389
[ 1307.490185] ? __warn_printk+0xf3/0xf3
[ 1307.491323] ? kasan_save_stack+0x32/0x40
[ 1307.492614] ? kasan_save_stack+0x1b/0x40
[ 1307.493844] ? __schedule+0xca/0xf90
[ 1307.494946] ? schedule+0x30/0x120
[ 1307.496155] __schedule+0xf8b/0xf90
[ 1307.497450] ? trace_event_raw_event_hrtimer_start+0x1b0/0x1b0
[ 1307.499518] ? io_schedule_timeout+0xb0/0xb0
[ 1307.501204] ? llist_add_batch+0x33/0x50
[ 1307.502411] schedule+0x6d/0x120
[ 1307.503395] schedule_timeout+0xe4/0x1f0
[ 1307.504578] ? usleep_range+0xe0/0xe0
[ 1307.505696] ? try_to_wake_up+0x392/0x910
[ 1307.507131] ? __bpf_trace_tick_stop+0xe0/0xe0
[ 1307.508886] ? __mutex_unlock_slowpath.constprop.0+0x210/0x210
[ 1307.510848] ? __native_queued_spin_unlock+0x9/0x10
[ 1307.512782] ? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
[ 1307.515329] __cv_timedwait_common+0x19e/0x2b0 [spl]
[ 1307.517378] ? __cv_wait_idle+0xd0/0xd0 [spl]
[ 1307.519165] ? recalc_sigpending+0x5a/0x70
[ 1307.520781] ? finish_wait+0x100/0x100
[ 1307.522190] ? mutex_unlock+0x80/0xd0
[ 1307.523541] ? bpobj_space+0x10c/0x120 [zfs]
[ 1307.526085] __cv_timedwait_idle+0x9a/0xe0 [spl]
[ 1307.527845] ? __cv_timedwait_sig+0x70/0x70 [spl]
[ 1307.529535] ? __bitmap_weight+0x71/0x90
[ 1307.530987] txg_sync_thread+0x24f/0x760 [zfs]
[ 1307.533594] ? kasan_set_track+0x1c/0x30
[ 1307.534955] ? txg_fini+0x300/0x300 [zfs]
[ 1307.537273] thread_generic_wrapper+0xa8/0xc0 [spl]
[ 1307.539001] ? __thread_exit+0x20/0x20 [spl]
[ 1307.540517] kthread+0x1d2/0x200
[ 1307.541640] ? set_kthread_struct+0x80/0x80
[ 1307.543074] ret_from_fork+0x22/0x30
[ 1307.544305] </TASK>
[ 1307.545446] Kernel Offset: 0xee00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 1307.548672] ---[ end Kernel panic - not syncing: corrupted stack end detected inside scheduler ]---
I'm running with 64G now but bumping to that made it decide that it's going to run like absolute shit; nevertheless:
Test: /usr/local/share/zfs/zfs-tests/tests/functional/channel_program/lua_core/tst.args_to_lua (run as root) [00:00] [PASS]
[ 1243.231345] ==================================================================
[ 1243.233300] BUG: KASAN: stack-out-of-bounds in stack_trace_consume_entry+0x58/0x80
[ 1243.235300] Write of size 8 at addr ffff88811a437770 by task zfs/94732
[ 1243.237007]
[ 1243.237420] CPU: 10 PID: 94732 Comm: zfs Tainted: P OE 5.15.0-2-amd64 #1 Debian 5.15.5-2.1
[ 1243.239855] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[ 1243.241980] Call Trace:
[ 1243.242645] <TASK>
[ 1243.243207] dump_stack_lvl+0x46/0x5a
[ 1243.244220] print_address_description.constprop.0+0x1f/0x140
[ 1243.245709] ? stack_trace_consume_entry+0x58/0x80
[ 1243.246942] kasan_report.cold+0x83/0xdf
[ 1243.247984] ? stack_trace_consume_entry+0x58/0x80
[ 1243.249224] ? kasan_save_stack+0x1b/0x40
[ 1243.250269] stack_trace_consume_entry+0x58/0x80
[ 1243.251482] ? create_prof_cpu_mask+0x20/0x20
[ 1243.252615] arch_stack_walk+0x78/0xf0
[ 1243.253605] ? kfree+0xc5/0x280
[ 1243.254426] ? kasan_save_stack+0x1b/0x40
[ 1243.255493] ? kfree+0xc5/0x280
[ 1243.256315] stack_trace_save+0x91/0xc0
[ 1243.257318] ? stack_trace_consume_entry+0x80/0x80
[ 1243.258564] ? luaD_call+0x11f/0x200 [zlua]
[ 1243.259791] ? resume_cb+0x190/0x190 [zlua]
[ 1243.260931] kasan_save_stack+0x1b/0x40
[ 1243.261924] ? lua_setfield+0xb0/0xb0 [zlua]
[ 1243.263066] ? luaD_rawrunprotected+0x10a/0x160 [zlua]
[ 1243.264457] ? lua_setfield+0xb0/0xb0 [zlua]
[ 1243.265604] ? f_parser+0x190/0x190 [zlua]
[ 1243.266720] ? lua_setfield+0xb0/0xb0 [zlua]
[ 1243.267890] ? lua_setfield+0xb0/0xb0 [zlua]
[ 1243.269038] ? luaD_rawrunprotected+0xd2/0x160 [zlua]
[ 1243.270393] ? lua_setfield+0xb0/0xb0 [zlua]
[ 1243.271648] ? luaD_rawrunprotected+0xd2/0x160 [zlua]
[ 1243.273003] ? luaF_close+0x33/0x1b0 [zlua]
[ 1243.274127] ? luaD_pcall+0xa0/0x130 [zlua]
[ 1243.275255] ? lua_pcallk+0x10a/0x290 [zlua]
[ 1243.276418] kasan_set_track+0x1c/0x30
[ 1243.277396] kasan_set_free_info+0x20/0x30
[ 1243.278462] __kasan_slab_free+0xec/0x120
[ 1243.279533] slab_free_freelist_hook+0x66/0x130
[ 1243.280704] ? zcp_eval+0x4b4/0x9c0 [zfs]
[ 1243.282984] kfree+0xc5/0x280
[ 1243.283781] zcp_eval+0x4b4/0x9c0 [zfs]
[ 1243.285509] ? zcp_dataset_hold+0x150/0x150 [zfs]
[ 1243.287483] ? spl_kmem_alloc_impl+0xf6/0x110 [spl]
[ 1243.288836] ? nv_mem_zalloc.isra.0+0x33/0x60 [znvpair]
[ 1243.290289] ? nvlist_xalloc.part.0+0x86/0x140 [znvpair]
[ 1243.291736] ? zfsdev_ioctl_common+0x635/0xbd0 [zfs]
[ 1243.293750] ? zfsdev_ioctl+0x53/0xe0 [zfs]
[ 1243.295599] ? __x64_sys_ioctl+0xb9/0xf0
[ 1243.296638] ? do_syscall_64+0x3b/0xc0
[ 1243.297623] ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 1243.298956] ? nvt_remove_nvpair+0xde/0x1e0 [znvpair]
[ 1243.300334] ? nvpair_type_is_array+0x50/0x50 [znvpair]
[ 1243.301728] ? nvt_remove_nvpair+0x13f/0x1e0 [znvpair]
[ 1243.303145] ? nvt_lookup_name_type.isra.0+0xc8/0x110 [znvpair]
[ 1243.304749] ? fnvlist_lookup_nvpair+0x5f/0xc0 [znvpair]
[ 1243.306174] ? fnvlist_remove_nvpair+0x40/0x40 [znvpair]
[ 1243.307621] zfs_ioc_channel_program+0x169/0x200 [zfs]
[ 1243.309683] ? zfs_ioc_redact+0x180/0x180 [zfs]
[ 1243.311617] ? nvlist_xalloc.part.0+0xde/0x140 [znvpair]
[ 1243.313051] ? nvlist_lookup_nvpair_embedded_index+0x20/0x20 [znvpair]
[ 1243.314783] zfsdev_ioctl_common+0x69a/0xbd0 [zfs]
[ 1243.316777] ? zfsdev_state_destroy+0x70/0x70 [zfs]
[ 1243.318770] ? __kmalloc_node+0x435/0x4e0
[ 1243.319835] ? __virt_addr_valid+0xbe/0x130
[ 1243.320924] ? _copy_from_user+0x3a/0x70
[ 1243.321972] zfsdev_ioctl+0x53/0xe0 [zfs]
[ 1243.323772] __x64_sys_ioctl+0xb9/0xf0
[ 1243.324746] do_syscall_64+0x3b/0xc0
[ 1243.325678] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 1243.326986] RIP: 0033:0x7f8180554a97
[ 1243.327947] Code: 3c 1c e8 1c ff ff ff 85 c0 79 87 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 08
[ 1243.332700] RSP: 002b:00007fff144d26e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 1243.334636] RAX: ffffffffffffffda RBX: 00007f817ccf8700 RCX: 00007f8180554a97
[ 1243.336453] RDX: 00007f817ccf5050 RSI: 0000000000005a48 RDI: 0000000000000004
[ 1243.338264] RBP: 00007fff144d27a0 R08: 00007f817cff9000 R09: 0000000000000000
[ 1243.340084] R10: 00007f8182064710 R11: 0000000000000246 R12: 0000000000005a48
[ 1243.341888] R13: 00007f817ccf5050 R14: 00007f817ccf5030 R15: 0000000000000004
[ 1243.343729] </TASK>
[ 1243.344307]
[ 1243.344711] The buggy address belongs to the page:
[ 1243.345937] page:00000000a84dd4da refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x11a437
[ 1243.348354] flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
[ 1243.350028] raw: 0017ffffc0000000 0000000000000000 ffffffff09ee0101 0000000000000000
[ 1243.352007] raw: 0000000000000000 0000000000200000 00000000ffffffff 0000000000000000
[ 1243.353976] page dumped because: kasan: bad access detected
[ 1243.355416] KASAN internal error: frame info validation failed; invalid marker: 18446612695471456264
[ 1243.357698]
[ 1243.358099] Memory state around the buggy address:
[ 1243.359326] ffff88811a437600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1243.361170] ffff88811a437680: 00 00 00 00 00 00 f1 f1 f1 f1 00 f3 f1 f1 f1 f1
[ 1243.363006] >ffff88811a437700: 00 00 00 f3 f3 f3 f3 f3 00 00 00 00 00 00 f1 00
[ 1243.364854] ^
[ 1243.366601] ffff88811a437780: f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1243.368456] ffff88811a437800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1243.370288] ==================================================================
Test: /usr/local/share/zfs/zfs-tests/tests/functional/channel_program/lua_core/tst.divide_by_zero (run as root) [00:00] [PASS]
and, indeed:
Test: /usr/local/share/zfs/zfs-tests/tests/functional/channel_program/lua_core/tst.return_recursive_table (run as root) [00:00] [PASS]
[ 1291.472630] Kernel panic - not syncing: corrupted stack end detected inside scheduler
[ 1291.474827] CPU: 11 PID: 94518 Comm: txg_sync Tainted: P B OE 5.15.0-2-amd64 #1 Debian 5.15.5-2.1
[ 1291.477403] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[ 1291.479562] Call Trace:
[ 1291.480231] <TASK>
[ 1291.480798] dump_stack_lvl+0x46/0x5a
[ 1291.481839] panic+0x18b/0x389
[ 1291.482687] ? __warn_printk+0xf3/0xf3
[ 1291.483682] ? kasan_save_stack+0x32/0x40
[ 1291.484754] ? kasan_save_stack+0x1b/0x40
[ 1291.485819] ? __schedule+0xca/0xf90
[ 1291.486776] ? schedule+0x30/0x120
[ 1291.488448] __schedule+0xf8b/0xf90
[ 1291.489857] ? trace_event_raw_event_hrtimer_start+0x1b0/0x1b0
[ 1291.491444] ? io_schedule_timeout+0xb0/0xb0
[ 1291.493453] ? llist_add_batch+0x33/0x50
[ 1291.494661] schedule+0x6d/0x120
[ 1291.495524] schedule_timeout+0xe4/0x1f0
[ 1291.496660] ? usleep_range+0xe0/0xe0
[ 1291.498114] ? try_to_wake_up+0x392/0x910
[ 1291.499820] ? __bpf_trace_tick_stop+0xe0/0xe0
[ 1291.501502] ? __mutex_unlock_slowpath.constprop.0+0x210/0x210
[ 1291.503940] ? __native_queued_spin_unlock+0x9/0x10
[ 1291.505782] ? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
[ 1291.508329] __cv_timedwait_common+0x19e/0x2b0 [spl]
[ 1291.510112] ? __cv_wait_idle+0xd0/0xd0 [spl]
[ 1291.511990] ? recalc_sigpending+0x5a/0x70
[ 1291.513854] ? finish_wait+0x100/0x100
[ 1291.515414] ? mutex_unlock+0x80/0xd0
[ 1291.517352] ? bpobj_space+0x10c/0x120 [zfs]
[ 1291.520380] __cv_timedwait_idle+0x9a/0xe0 [spl]
[ 1291.522484] ? __cv_timedwait_sig+0x70/0x70 [spl]
[ 1291.524696] ? __bitmap_weight+0x71/0x90
[ 1291.526262] txg_sync_thread+0x24f/0x760 [zfs]
[ 1291.529279] ? kasan_set_track+0x1c/0x30
[ 1291.530368] ? txg_fini+0x300/0x300 [zfs]
[ 1291.533317] thread_generic_wrapper+0xa8/0xc0 [spl]
[ 1291.535126] ? __thread_exit+0x20/0x20 [spl]
[ 1291.537038] kthread+0x1d2/0x200
[ 1291.537988] ? set_kthread_struct+0x80/0x80
[ 1291.539652] ret_from_fork+0x22/0x30
[ 1291.540947] </TASK>
Test: /usr/local/share/zfs/zfs-tests/tests/functional/channel_program/lua_core/tst.stack_gsub (run as root) [00:00] [PASS]
[ 1291.542504] Kernel Offset: 0x20600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 1291.546252] ---[ end Kernel panic - not syncing: corrupted stack end detected inside scheduler ]---
I can't necessarily give it any, uh, more RAM? (I mean I could, but I don't love the idea of swapping out my MX.) And the overall times don't seem to breach more than one CPU, anyway, so?
real 22m36.744s
user 24m19.753s
sys 5m46.450s
Here's a send of the image and the qemu driver, if you're interested: https://foreign.nabijaczleweli.xyz/pub/kt (the boot bundle needs to be extracted from /boot, or a bootloader installed; this also wants a scratch filesystem at /scratchpsko, for which I just did zpool create scratchpsko vdb and chown/chmod).
Curious. I'm wildly speculating that all the outline calls make it more vulnerable to something smashing it in ways it can't recover from? Or I keep getting lucky with my smashing not blowing up the world...I'll try the VM and see if it blows the same way for me, and if swapping the kernel around changes anything.
Casual 2 cents from papa know-it-all.
I just enabled kasan in .config, then ran:
find /usr/src/linux-5.15.23-kasan -mindepth 2 -name Makefile | xargs sed -i '$ a\KASAN_SANITIZE := n'
(i.e. appended KASAN_SANITIZE := n to every subdirectory Makefile, so the in-tree kernel code itself is not instrumented and only the out-of-tree modules take the KASAN overhead).
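To make the effect of that sed invocation concrete, here is a small demonstration in a throwaway directory (not the real kernel tree): it appends `KASAN_SANITIZE := n` as the last line of a Makefile, which is the kbuild knob that disables KASAN instrumentation for that directory's objects. This uses GNU sed, as the original command does.

```shell
# Demonstration only: a dummy Makefile standing in for a kernel subdirectory's.
tmpdir=$(mktemp -d)
printf 'obj-y += foo.o\n' > "$tmpdir/Makefile"

# Same edit as the find|xargs pipeline applies to each real Makefile:
# '$ a\TEXT' appends TEXT after the last line.
sed -i '$ a\KASAN_SANITIZE := n' "$tmpdir/Makefile"

result=$(tail -n 1 "$tmpdir/Makefile")
echo "$result"

rm -rf "$tmpdir"
```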
16 vCPU/10 GiB VM used, no memory problems (so far).
<3>[ 506.091401] ==================================================================
<3>[ 506.095311] BUG: KASAN: stack-out-of-bounds in auxgetinfo+0x306/0x600 [zlua]
<3>[ 506.095311] Write of size 4 at addr ffff8881099cf5c8 by task txg_sync/56269
<3>[ 506.095311]
<3>[ 506.095311] CPU: 1 PID: 56269 Comm: txg_sync Tainted: P O 5.15.23-kasan #1
<3>[ 506.095311] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
<3>[ 506.095311] Call Trace:
<3>[ 506.095311] <TASK>
<3>[ 506.095311] dump_stack_lvl+0x46/0x5a
<3>[ 506.095311] print_address_description.constprop.0+0x1f/0x140
<3>[ 506.095311] ? auxgetinfo+0x306/0x600 [zlua]
<3>[ 506.095311] kasan_report.cold+0x83/0xdf
<3>[ 506.095311] ? auxgetinfo+0x306/0x600 [zlua]
<3>[ 506.095311] kasan_check_range+0x142/0x190
<3>[ 506.095311] memcpy+0x39/0x60
<3>[ 506.095311] auxgetinfo+0x306/0x600 [zlua]
<3>[ 506.095311] ? newshrstr+0xe6/0x210 [zlua]
<3>[ 506.095311] lua_getinfo+0xe0/0x310 [zlua]
<3>[ 506.095311] ? zcp_cleanup+0x90/0x90 [zfs]
<3>[ 506.095311] luaL_traceback+0x11d/0x220 [zlua]
<3>[ 506.095311] ? pushfuncname+0x220/0x220 [zlua]
<3>[ 506.095311] ? luaV_tonumber+0x1b0/0x1b0 [zlua]
<3>[ 506.095311] ? luaV_gettable+0xea/0x3c0 [zlua]
<3>[ 506.095311] ? zcp_cleanup+0x90/0x90 [zfs]
<3>[ 506.095311] zcp_error_handler+0x3d/0x70 [zfs]
<3>[ 506.095311] luaD_precall+0x2d3/0xd40 [zlua]
<3>[ 506.095311] luaD_call+0x111/0x280 [zlua]
<3>[ 506.095311] ? luaB_getmetatable+0x50/0x50 [zlua]
<3>[ 506.095311] luaG_errormsg+0x205/0x2b0 [zlua]
<3>[ 506.095311] lua_error+0xa/0x10 [zlua]
<3>[ 506.327275] luaD_precall+0x2d3/0xd40 [zlua]
<3>[ 506.327275] luaV_execute+0x1cc1/0x4800 [zlua]
<3>[ 506.327275] ? luaD_precall+0x86e/0xd40 [zlua]
<3>[ 506.327275] luaD_call+0x201/0x280 [zlua]
<3>[ 506.327275] luaD_rawrunprotected+0x114/0x200 [zlua]
<3>[ 506.327275] ? lua_setmetatable+0x570/0x570 [zlua]
<3>[ 506.327275] ? f_parser+0x340/0x340 [zlua]
<3>[ 506.327275] ? luaD_rawrunprotected+0xfd/0x200 [zlua]
<3>[ 506.327275] ? luaM_realloc_+0x99/0x220 [zlua]
<3>[ 506.327275] luaD_pcall+0xe0/0x300 [zlua]
<3>[ 506.327275] ? luaH_newkey+0x38b/0x520 [zlua]
<3>[ 506.327275] lua_pcallk+0x154/0x6b0 [zlua]
<3>[ 506.327275] ? luaV_settable+0x3ab/0x550 [zlua]
<3>[ 506.419359] ? f_call+0x90/0x90 [zlua]
<3>[ 506.419359] ? dsl_dir_phys+0x60/0x60 [zfs]
<3>[ 506.419359] ? dsl_dir_phys+0x60/0x60 [zfs]
<3>[ 506.419359] zcp_eval_impl+0x158/0x8a0 [zfs]
<3>[ 506.419359] ? zcp_eval_impl+0x8a0/0x8a0 [zfs]
<3>[ 506.419359] dsl_sync_task_sync+0x213/0x3d0 [zfs]
<3>[ 506.419359] dsl_pool_sync+0x969/0xda0 [zfs]
<3>[ 506.419359] ? zap_lookup+0x12/0x20 [zfs]
<3>[ 506.419359] ? dsl_pool_undirty_space+0x1e0/0x1e0 [zfs]
<3>[ 506.419359] ? vdev_obsolete_sm_object+0x190/0x190 [zfs]
<3>[ 506.419359] spa_sync_iterate_to_convergence+0x18a/0x450 [zfs]
<3>[ 506.419359] spa_sync+0x6c9/0x12c0 [zfs]
<3>[ 506.419359] ? __cond_resched+0x16/0x40
<3>[ 506.419359] ? spa_async_dispatch+0x1b0/0x1b0 [zfs]
<3>[ 506.515275] ? spa_txg_history_set+0x14e/0x1e0 [zfs]
<3>[ 506.515275] txg_sync_thread+0x5ae/0x960 [zfs]
<3>[ 506.515275] ? slab_free_freelist_hook+0x66/0x130
<3>[ 506.515275] ? txg_dispatch_callbacks+0x1b0/0x1b0 [zfs]
<3>[ 506.515275] ? kfree+0xc5/0x280
<3>[ 506.515275] ? txg_dispatch_callbacks+0x1b0/0x1b0 [zfs]
<3>[ 506.515275] thread_generic_wrapper+0x171/0x200 [spl]
<3>[ 506.515275] ? _raw_spin_unlock_irqrestore+0xa/0x20
<3>[ 506.515275] ? IS_ERR+0x10/0x10 [spl]
<3>[ 506.515275] kthread+0x127/0x150
<3>[ 506.515275] ? set_kthread_struct+0x40/0x40
<3>[ 506.515275] ret_from_fork+0x22/0x30
<3>[ 506.515275] </TASK>
<3>[ 506.515275]
<3>[ 506.515275] The buggy address belongs to the page:
<4>[ 506.515275] page:00000000e0daaf00 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1099cf
<4>[ 506.515275] flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
<4>[ 506.515275] raw: 0017ffffc0000000 0000000000000000 ffffea0004267388 0000000000000000
<4>[ 506.515275] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
<4>[ 506.515275] page dumped because: kasan: bad access detected
<3>[ 506.515275] KASAN internal error: frame info validation failed; invalid marker: 16140896666449346560
<3>[ 506.515275]
<3>[ 506.515275] Memory state around the buggy address:
<3>[ 506.515275] ffff8881099cf480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
<3>[ 506.643250] ffff8881099cf500: 00 00 00 00 f3 f3 f3 f3 00 00 00 00 00 00 f1 f1
<3>[ 506.643250] >ffff8881099cf580: f1 f1 00 00 00 00 00 00 00 f1 f1 00 00 00 00 00
<3>[ 506.643250] ^
<3>[ 506.675258] ffff8881099cf600: 00 00 f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00
<3>[ 506.675258] ffff8881099cf680: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
<3>[ 506.675258] ==================================================================
Describe the feature you would like to see added to OpenZFS
Can we run the ZTS/ztest CI with the kernel address sanitizer (KASAN)?
How will this feature improve OpenZFS?
We are more likely to identify kernel memory corruption.
Additional context
#12216 does this for userland.
Can I do this myself by just compiling a kernel with KASAN enabled, and building ZFS as usual? Is there any documentation I should look into for this?
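Pulling together the options mentioned in this thread, a minimal sketch of the relevant kernel .config settings might look like the fragment below (assuming a generic x86-64 build on a 5.x kernel; exact option availability varies by version, and `CONFIG_DEBUG_KMEMLEAK` is the separately-suggested kmemleak checker, not part of KASAN itself):

```
CONFIG_KASAN=y
CONFIG_KASAN_GENERIC=y
CONFIG_KASAN_INLINE=y
CONFIG_DEBUG_KMEMLEAK=y
```

After building and booting that kernel, ZFS is built and loaded as usual, and KASAN reports show up in dmesg while the test suite runs, as in the logs above.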