Closed jpsollie closed 3 years ago
it's getting weirder every time: when mounted with -o fsck,norecovery,nochanges. dmesg:
[ 639.667129] bcachefs: bch2_fs_open()
[ 639.667133] bcachefs: bch2_read_super()
[ 639.670273] bcachefs: bch2_read_super() ret 0
[ 639.671768] bcachefs: bch2_read_super()
[ 639.675408] bcachefs: bch2_read_super() ret 0
[ 639.675512] bcachefs: bch2_read_super()
[ 639.679179] bcachefs: bch2_read_super() ret 0
[ 639.679283] bcachefs: bch2_read_super()
[ 639.681907] bcachefs: bch2_read_super() ret 0
[ 639.682009] bcachefs: bch2_read_super()
[ 639.682450] bcachefs: bch2_read_super() ret 0
[ 639.682551] bcachefs: bch2_read_super()
[ 639.682879] bcachefs: bch2_read_super() ret 0
[ 639.683001] bcachefs: bch2_read_super()
[ 639.722438] bcachefs: bch2_read_super() ret 0
[ 639.722686] bcachefs: bch2_read_super()
[ 639.723114] bcachefs: bch2_read_super() ret 0
[ 639.723215] bcachefs: bch2_fs_alloc()
[ 639.768369] bcachefs: bch2_fs_journal_init()
[ 639.768379] bcachefs: bch2_fs_journal_init() ret 0
[ 639.768422] bcachefs: bch2_fs_btree_cache_init()
[ 639.768472] bcachefs: bch2_fs_btree_cache_init() ret 0
[ 639.768533] bcachefs: bch2_fs_encryption_init()
[ 639.768548] bcachefs: bch2_fs_encryption_init() ret 0
[ 639.768551] bcachefs: __bch2_fs_compress_init()
[ 639.768553] bcachefs: __bch2_fs_compress_init() ret 0
[ 639.768555] bcachefs: bch2_fs_fsio_init()
[ 639.768562] bcachefs: bch2_fs_fsio_init() ret 0
[ 639.768563] bcachefs: bch2_dev_alloc()
[ 639.795685] bcachefs: bch2_dev_alloc() ret 0
[ 639.795689] bcachefs: bch2_dev_alloc()
[ 639.796157] bcachefs: bch2_dev_alloc() ret 0
[ 639.796159] bcachefs: bch2_dev_alloc()
[ 639.796609] bcachefs: bch2_dev_alloc() ret 0
[ 639.796611] bcachefs: bch2_dev_alloc()
[ 639.797065] bcachefs: bch2_dev_alloc() ret 0
[ 639.797069] bcachefs: bch2_dev_alloc()
[ 639.797531] bcachefs: bch2_dev_alloc() ret 0 [ 639.797533] bcachefs: bch2_dev_alloc()
[ 639.797976] bcachefs: bch2_dev_alloc() ret 0
[ 639.797977] bcachefs: bch2_dev_alloc()
[ 639.798875] bcachefs: bch2_dev_alloc() ret 0 [ 639.798877] bcachefs: bch2_dev_alloc()
[ 639.799322] bcachefs: bch2_dev_alloc() ret 0
[ 639.799547] bcachefs: bch2_fs_alloc() ret 0
[ 639.799671] bcachefs (7535597e-5660-4926-a774-87a7725d4de5): recovering from clean shutdown, journal seq 69371
[ 670.601621] bcachefs (7535597e-5660-4926-a774-87a7725d4de5): journal read done, 0 keys in 1 entries, seq 69372
[ 670.678134] bcachefs (7535597e-5660-4926-a774-87a7725d4de5): starting alloc read
[ 670.843638] bcachefs (7535597e-5660-4926-a774-87a7725d4de5): alloc read done
[ 670.843642] bcachefs (7535597e-5660-4926-a774-87a7725d4de5): starting stripes_read
[ 670.843644] bcachefs (7535597e-5660-4926-a774-87a7725d4de5): stripes_read done
[ 670.843645] bcachefs (7535597e-5660-4926-a774-87a7725d4de5): starting mark and sweep
[ 671.054528] bcachefs (7535597e-5660-4926-a774-87a7725d4de5): mark and sweep done [ 671.054535] bcachefs (7535597e-5660-4926-a774-87a7725d4de5): ret 0
[ 671.054583] bcachefs (7535597e-5660-4926-a774-87a7725d4de5): mounted with opts: metadata_replicas=2,metadata_target=SSDM,foreground_target=SSDC,background_target=HDD,promote_target=SSDC,noinodes_32bit,noinodes_use_key_cache,verbose,fsck,fix_errors,nochanges,norecovery
the original file where the script crashed is ok, but no idea where it starts failing:
linuxserver ~ # cat /var/log/secstorbackup.log
2021/08/04 08:39:54 [3711] building file list
2021/08/04 08:39:54 [3711] .d..t...... ellen/
2021/08/04 08:39:54 [3711] >f+++++++++ ellen/20201101_150257.jpg
linuxserver ~ # cp /data/securestorage/ellen/20201101_150257.jpg /mnt/satabackup/ellen/
linuxserver ~ # cp --preserve=all /data/securestorage/ellen/20201101_150257.jpg /mnt/satabackup/ellen/
linuxserver ~ # rsync -I -a /data/securestorage/ellen/20201101_150257.jpg /mnt/satabackup/ellen/
linuxserver ~ # sh /etc/cron.monthly/backups.sh
/etc/cron.monthly/backups.sh: line 36: 3829 Segmentation fault nice rsync -a /data/securestorage/ /mnt/satabackup/ --log-file=/var/log/secstorbackup.log > /dev/null 2>&1
linuxserver ~ # cat /var/log/secstorbackup.log
2021/08/04 09:01:51 [3829] building file list 2021/08/04 09:01:52 [3829] .d..t...... ellen/
linuxserver ~ #
Any clue here?
new story: when the bug occurs, and the filesystem is mounted nochanges,norecovery, umount -l -f also segfaults at the same function, but the stack trace is different, so it may provide more info:
[17265.410467] ------------[ cut here ]------------
[17265.410475] Kernel BUG at journal_entry_open+0x22f/0x260 [bcachefs] [verbose debug info unavailable]
[17265.410506] invalid opcode: 0000 [#2] SMP
[17265.410511] CPU: 16 PID: 6818 Comm: umount Tainted: G D W 5.13.7release+ #8
[17265.410516] Hardware name: Gigabyte Technology Co., Ltd. X399 DESIGNARE EX/X399 DESIGNARE EX-CF, BIOS F12 12/11/2019
[17265.410519] RIP: 0010:journal_entry_open+0x22f/0x260 [bcachefs]
[17265.410543] Code: 00 48 8d bb 90 12 00 00 e8 2e a7 8f e0 f0 83 44 24 fc 00 48 8d bb 98 12 00 00 e8 1c a7 8f e0 5b 45 31 c0 44 89 c0 5d 41 5c c3 <0f> 0b 0f 0b 4c 8b a3 20 17 00 00 65 8b 3d 0f c1 4a 5f e8 ba b5 57 [17265.410550] RSP: 0018:ffff88847f9a3d50 EFLAGS: 00010202
[17265.410554] RAX: ffff888100904000 RBX: ffff88847f9a3e18 RCX: 0000000000010efc
[17265.410558] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff888403a137b0
[17265.410561] RBP: ffff888403a137b0 R08: ffff88811947aa70 R09: 00000000000ffffe [17265.410564] R10: 003fffffffffffff R11: 0000000000000000 R12: ffc00fffffffffff
[17265.410566] R13: fffffffc00ffffff R14: 0000000000000000 R15: 0000000000000000
[17265.410569] FS: 00007f6ff81df740(0000) GS:ffff88885ec00000(0000) knlGS:0000000000000000
[17265.410573] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[17265.410576] CR2: 00007f9fc95990b0 CR3: 0000000409d0d000 CR4: 00000000001506e0
[17265.410579] Call Trace:
[17265.410582] __journal_res_get+0x38d/0x5f0 [bcachefs]
[17265.410606] ? list_lru_del+0x54/0xa0
[17265.410611] ? path_check_mount+0x40/0x40
[17265.410616] bch2_journal_res_get_slowpath+0x17/0xf0 [bcachefs]
[17265.410638] bch2_journal_flush_seq_async.part.0+0x2cd/0x470 [bcachefs]
[17265.410660] bch2_sync_fs+0x27/0x30 [bcachefs] [17265.410682] sync_filesystem+0x38/0x80
[17265.410687] generic_shutdown_super+0x1d/0xf0
[17265.410692] bch2_kill_sb+0xd/0x20 [bcachefs]
[17265.410714] deactivate_locked_super+0x2c/0x80
[17265.410718] cleanup_mnt+0x12c/0x180
[17265.410721] task_work_run+0x54/0x80
[17265.410727] exit_to_user_mode_loop+0xc5/0xd0
[17265.410733] exit_to_user_mode_prepare+0x85/0x90
[17265.410738] syscall_exit_to_user_mode+0x1d/0x40
[17265.410743] do_syscall_64+0x47/0x70
[17265.410747] entry_SYSCALL_64_after_hwframe+0x44/0xae
[17265.410751] RIP: 0033:0x7f6ff8310277 [17265.410754] Code: ac 08 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 c9 ab 08 00 f7 d8 64 89 02 b8
[17265.410760] RSP: 002b:00007ffd99d1f938 EFLAGS: 00000202 ORIG_RAX: 00000000000000a6
[17265.410764] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 00007f6ff8310277
[17265.410767] RDX: 0000000000000014 RSI: 0000000000000003 RDI: 00005578356d9200
[17265.410770] RBP: 00005578356d2620 R08: 00005578356d8fa0 R09: 0000000000000000
[17265.410772] R10: 0000000000000000 R11: 0000000000000202 R12: 00005578356d9200
[17265.410774] R13: 0000000000000003 R14: 00005578356d2510 R15: 0000000000000003
[17265.410778] Modules linked in: btrfs blake2b_generic xfs bcachefs xor raid6_pq i2c_dev f2fs uas input_leds intel_rapl_msr intel_rapl_common iosf_mbi kvm_amd kvm irqbypass ghash_clmulni_intel amdgpu iwlmvm rapl gigabyte_wmi wmi_bmof usb_storage drm_ttm_helper igb aacraid btusb ttm iwlwifi sp5100_tco btintel backlight ptp gpu_sched pps_core i2c_piix4 bluetooth hid_generic tpm_crb tpm_tis tpm_tis_core pinctrl_amd efivarfs
[17265.410818] ---[ end trace b215b693d50bb831 ]---
[17266.151650] RIP: 0010:btree_iter_traverse_one+0x8b3/0x8f0 [bcachefs]
[17266.151677] Code: d3 49 5d e0 e9 4e ff ff ff 41 0f b6 55 5a e9 58 f8 ff ff 0f 0b 66 90 e8 ab 52 00 00 83 f8 fc 41 89 c6 0f 94 c2 e9 24 fb ff ff <0f> 0b 89 c8 41 0f b6 55 5a 83 e0 0f e9 db f7 ff ff 48 63 44 24 24
[17266.151685] RSP: 0018:ffff88840a0837c8 EFLAGS: 00010202
[17266.151688] RAX: 0000000000000001 RBX: ffff88840a0d8000 RCX: 0000000000000000
[17266.151691] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88840a0d8000
[17266.151693] RBP: 0000000000000001 R08: 00000000ffffffff R09: ffff888474026c00
[17266.151695] R10: 0000000000010c00 R11: ffff88810b2e0000 R12: 0000000000000000
[17266.151698] R13: ffff88840a0d8000 R14: 0000000000000000 R15: 0000000000000103
[17266.151700] FS: 00007f6ff81df740(0000) GS:ffff88885ec00000(0000) knlGS:0000000000000000
[17266.151702] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[17266.151705] CR2: 00007f9fc95990b0 CR3: 0000000409d0d000 CR4: 00000000001506e0
In order to be as complete as possible, I collected some debug info (metadata, fsck output, journal), if anybody would like to take a look at it, be my guest https://we.tl/t-kuXwG8s7Gq
@koverstreet requested a faddr2line, but due to reduced debug info, this was unavailable. So I compiled a debug kernel (bcachefs_debug=y, loadallksyms, verbose bug reporting) and regenerated the info, but the stack was omewhat different. So instead of faddr2line, I piped dmesg through decode_stacktrace.sh
2470.526556] ------------[ cut here ]------------
[ 2470.526560] kernel BUG at fs/bcachefs/btree_iter.c:1608!
[ 2470.526565] invalid opcode: 0000 [#1] SMP
[ 2470.526572] CPU: 18 PID: 3852 Comm: rsync Not tainted 5.13.7debug+ #11
[ 2470.526577] Hardware name: Gigabyte Technology Co., Ltd. X399 DESIGNARE EX/X399 DESIGNARE EX-CF, BIOS F12 12/11/2019
[ 2470.526581] RIP: 0010:btree_iter_set_search_pos (fs/bcachefs/btree_iter.c:1608 (discriminator 1)) bcachefs
[ 2470.526608] Code: ff 89 d9 d3 c7 21 fa 21 f8 66 0f be d2 66 98 c1 e2 08 c1 e0 04 09 d0 41 0f b7 56 5a 66 81 e2 0f f0 09 d0 66 41 89 46 5a eb 0a <0f> 0b 41 0f b6 76 59 83 e6 0f 41 0f b6 46 58 89 c2 c0 ea 04 83 e2
All code
========
0: ff 89 d9 d3 c7 21 decl 0x21c7d3d9(%rcx)
6: fa cli
7: 21 f8 and %edi,%eax
9: 66 0f be d2 movsbw %dl,%dx
d: 66 98 cbtw
f: c1 e2 08 shl $0x8,%edx
12: c1 e0 04 shl $0x4,%eax 15: 09 d0 or %edx,%eax
17: 41 0f b7 56 5a movzwl 0x5a(%r14),%edx
1c: 66 81 e2 0f f0 and $0xf00f,%dx
21: 09 d0 or %edx,%eax
23: 66 41 89 46 5a mov %ax,0x5a(%r14)
28: eb 0a jmp 0x34
2a:* 0f 0b ud2 <-- trapping instruction
2c: 41 0f b6 76 59 movzbl 0x59(%r14),%esi
31: 83 e6 0f and $0xf,%esi
34: 41 0f b6 46 58 movzbl 0x58(%r14),%eax
39: 89 c2 mov %eax,%edx
3b: c0 ea 04 shr $0x4,%dl
3e: 83 .byte 0x83
3f: e2 .byte 0xe2
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 41 0f b6 76 59 movzbl 0x59(%r14),%esi
7: 83 e6 0f and $0xf,%esi
a: 41 0f b6 46 58 movzbl 0x58(%r14),%eax
f: 89 c2 mov %eax,%edx
11: c0 ea 04 shr $0x4,%dl
14: 83 .byte 0x83
15: e2 .byte 0xe2
[ 2470.526614] RSP: 0018:ffff888462f7f830 EFLAGS: 00010202
[ 2470.526618] RAX: 0000000000000008 RBX: ffff888482188000 RCX: 0000000000000184
[ 2470.526621] RDX: 0000000000000000 RSI: 0000000000000183 RDI: ffff888462f7fa90
[ 2470.526624] RBP: 0000000000000001 R08: 0000000000000183 R09: 0000000000000001
[ 2470.526626] R10: 0000000000000c00 R11: 0000000000000001 R12: ffff888136949880
[ 2470.526628] R13: 0000000000000000 R14: ffff888482188000 R15: 0000000000000184
[ 2470.526631] FS: 00007f5cda0a4740(0000) GS:ffff88885ec80000(0000) knlGS:0000000000000000
[ 2470.526634] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2470.526637] CR2: 0000559217a10da8 CR3: 000000048a087000 CR4: 00000000001506e0
[ 2470.526639] Call Trace:
[ 2470.526643] bch2_btree_iter_peek_slot (fs/bcachefs/btree_iter.c:1493 fs/bcachefs/btree_iter.c:1897) bcachefs
[ 2470.526667] bchfs_read (./include/linux/err.h:59 fs/bcachefs/btree_iter.h:274 fs/bcachefs/fs-io.c:815) bcachefs
[ 2470.526694] ? bio_alloc_bioset (block/bio.c:447)
[ 2470.526700] bch2_readahead (fs/bcachefs/fs-io.c:698 (discriminator 2) fs/bcachefs/fs-io.c:892 (discriminator 2)) bcachefs
[ 2470.526726] ? bch2_readahead (fs/bcachefs/bcachefs_format.h:143 (discriminator 2)) bcachefs
[ 2470.526752] read_pages (./include/linux/pagemap.h:919 mm/readahead.c:132)
[ 2470.526757] page_cache_ra_unbounded (./arch/x86/include/asm/current.h:15 ./include/linux/sched/mm.h:263 mm/readahead.c:239)
[ 2470.526761] filemap_get_pages (mm/filemap.c:2470)
[ 2470.526765] filemap_read (mm/filemap.c:2552)
[ 2470.526768] ? __pagevec_lru_add_fn (./include/linux/mm_inline.h:88 mm/swap.c:1043)
[ 2470.526772] ? __pagevec_lru_add (./include/linux/pagevec.h:56 mm/swap.c:1066)
[ 2470.526774] ? lru_cache_add (mm/swap.c:476)
[ 2470.526777] ? do_anonymous_page (mm/memory.c:3674)
[ 2470.526780] bch2_read_iter (fs/bcachefs/fs-io.c:1799) bcachefs
[ 2470.526805] new_sync_read (fs/read_write.c:416 (discriminator 1))
[ 2470.526810] vfs_read (fs/read_write.c:496)
[ 2470.526813] ksys_read (fs/read_write.c:635)
[ 2470.526816] do_syscall_64 (arch/x86/entry/common.c:47)
[ 2470.526821] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:112)
[ 2470.526825] RIP: 0033:0x7f5cda1b0dae
[ 2470.526827] Code: c0 e9 b6 fe ff ff 48 8d 3d 3f be 06 00 50 e8 49 ce 01 00 66 0f 1f 84 00 00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 0f 05 <48> 3d 00 f0 ff ff 77 5a c3 66 0f 1f 84 00 00 00 00 00 48 83 ec 28
All code
========
0: c0 e9 b6 shr $0xb6,%cl
3: fe (bad)
4: ff (bad)
5: ff 48 8d decl -0x73(%rax)
8: 3d 3f be 06 00 cmp $0x6be3f,%eax
d: 50 push %rax
e: e8 49 ce 01 00 callq 0x1ce5c
13: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
1a: 00 00
1c: 64 8b 04 25 18 00 00 mov %fs:0x18,%eax
23: 00
24: 85 c0 test %eax,%eax
26: 75 14 jne 0x3c
28: 0f 05 syscall
2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction
30: 77 5a ja 0x8c
32: c3 retq
33: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
3a: 00 00
3c: 48 83 ec 28 sub $0x28,%rsp
Code starting with the faulting instruction
===========================================
0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
6: 77 5a ja 0x62
8: c3 retq
9: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
10: 00 00
12: 48 83 ec 28 sub $0x28,%rsp
[ 2470.526833] RSP: 002b:00007ffc0b5c5558 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 2470.526836] RAX: ffffffffffffffda RBX: 0000559216684280 RCX: 00007f5cda1b0dae
[ 2470.526838] RDX: 0000000000040000 RSI: 00005592179d0da0 RDI: 0000000000000003
[ 2470.526841] RBP: 0000000000040000 R08: 0000000000000000 R09: 0000000000000000
[ 2470.526843] R10: 0000000000040030 R11: 0000000000000246 R12: 0000000000000000
[ 2470.526845] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000040000
[ 2470.526848] Modules linked in: btrfs blake2b_generic bcachefs xor raid6_pq f2fs i2c_dev uas amdgpu input_leds intel_rapl_msr intel_rapl_common iosf_mbi kvm_amd iwlmvm kvm btusb igb btintel irqbypass iwlwifi ptp ghash_clmulni_intel bluetooth sp5100_tco gigabyte_wmi usb_storage hid_generic rapl wmi_bmof pps_core aacraid i2c_piix4 drm_ttm_helper ttm backlight gpu_sched tpm_crb tpm_tis pinctrl_amd tpm_tis_core efivarfs
[ 2470.526880] ---[ end trace 7351a3ec7e320e50 ]---
[ 2470.636814] RIP: 0010:btree_iter_set_search_pos (fs/bcachefs/btree_iter.c:1608 (discriminator 1)) bcachefs [ 2470.636840] Code: ff 89 d9 d3 c7 21 fa 21 f8 66 0f be d2 66 98 c1 e2 08 c1 e0 04 09 d0 41 0f b7 56 5a 66 81 e2 0f f0 09 d0 66 41 89 46 5a eb 0a <0f> 0b 41 0f b6 76 59 83 e6 0f 41 0f b6 46 58 89 c2 c0 ea 04 83 e2
All code
========
0: ff 89 d9 d3 c7 21 decl 0x21c7d3d9(%rcx)
6: fa cli
7: 21 f8 and %edi,%eax
9: 66 0f be d2 movsbw %dl,%dx
d: 66 98 cbtw
f: c1 e2 08 shl $0x8,%edx
12: c1 e0 04 shl $0x4,%eax
15: 09 d0 or %edx,%eax
17: 41 0f b7 56 5a movzwl 0x5a(%r14),%edx
1c: 66 81 e2 0f f0 and $0xf00f,%dx
21: 09 d0 or %edx,%eax
23: 66 41 89 46 5a mov %ax,0x5a(%r14)
28: eb 0a jmp 0x34
2a:* 0f 0b ud2 <-- trapping instruction
2c: 41 0f b6 76 59 movzbl 0x59(%r14),%esi
31: 83 e6 0f and $0xf,%esi
34: 41 0f b6 46 58 movzbl 0x58(%r14),%eax
39: 89 c2 mov %eax,%edx
3b: c0 ea 04 shr $0x4,%dl
3e: 83 .byte 0x83
3f: e2 .byte 0xe2
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 41 0f b6 76 59 movzbl 0x59(%r14),%esi
7: 83 e6 0f and $0xf,%esi
a: 41 0f b6 46 58 movzbl 0x58(%r14),%eax
f: 89 c2 mov %eax,%edx
11: c0 ea 04 shr $0x4,%dl
14: 83 .byte 0x83
15: e2 .byte 0xe2
[ 2470.636846] RSP: 0018:ffff888462f7f830 EFLAGS: 00010202
[ 2470.636849] RAX: 0000000000000008 RBX: ffff888482188000 RCX: 0000000000000184
[ 2470.636852] RDX: 0000000000000000 RSI: 0000000000000183 RDI: ffff888462f7fa90
[ 2470.636855] RBP: 0000000000000001 R08: 0000000000000183 R09: 0000000000000001
[ 2470.636857] R10: 0000000000000c00 R11: 0000000000000001 R12: ffff888136949880
[ 2470.636860] R13: 0000000000000000 R14: ffff888482188000 R15: 0000000000000184
[ 2470.636862] FS: 00007f5cda0a4740(0000) GS:ffff88885ec80000(0000) knlGS:0000000000000000
[ 2470.636866] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2470.636868] CR2: 0000559217a10da8 CR3: 000000048a087000 CR4: 00000000001506e0
[ 2470.636872] ------------[ cut here ]------------
It was an unhandled transaction restart - we recently (in 32638053e1 bcachefs: trans->restarted) started aggressively checking whether transaction restarts were being handled correctly.
Fixed by 60fbf06f49
removing corrupted files doesn't work yet. Quite possible related to same issue:
[49762.145470] CR2: 000055a779f36088 CR3: 000000010bf4f000 CR4: 00000000001506e0
[49762.145472] Call Trace:
[49762.145474] ? bch2_trans_unlink_iters+0xc5/0x200 [bcachefs]
[49762.145487] bch2_trans_begin+0xf4/0x120 [bcachefs]
[49762.145500] bch2_btree_delete_range_trans+0x7a/0x260 [bcachefs]
[49762.145513] bch2_inode_rm+0xc2/0x330 [bcachefs]
[49762.145527] ? truncate_inode_pages_range+0x19a/0x690
[49762.145531] ? bch2_inode_rm+0x7a/0x330 [bcachefs]
[49762.145544] evict+0xb7/0x160
[49762.145547] do_unlinkat+0x1a8/0x2b0
[49762.145551] do_syscall_64+0x3a/0x70
[49762.145555] entry_SYSCALL_64_after_hwframe+0x44/0xae
[49762.145406] ------------[ cut here ]------------
[49762.145409] Kernel BUG at __btree_iter_traverse_all.constprop.0+0x22f/0x2b0 [bcachefs] [verbose debug info unavailable]
[49762.145426] invalid opcode: 0000 [#49 ]
[49762.145431] CPU: 27 PID: 5269 Comm: rm Not tainted 5.13.7release+ #13
[49762.145435] Hardware name: Gigabyte Technology Co., Ltd. X399 DESIGNARE EX/X399 DESIGNARE EX-CF, BIOS F12 12/11/2019
[49762.145437] RIP: 0010:__btree_iter_traverse_all.constprop.0+0x22f/0x2b0 [bcachefs]
[49762.145450] Code: e8 16 df fe ff 80 63 16 fb 44 89 e0 48 83 c4 30 5b 5d 41 5c 41 5d 41 5e 41 5f c3 be 01 00
00 00 e8 c6 fc ff ff e9 8d fe ff ff <0f> 0b 80 4b 16 02 eb cb 31 c9 31 c0 89 c2 83 c0 20 48 89 0c 14 48
[49762.145454] RSP: 0018:ffff8881339e7b80 EFLAGS: 00010202
[49762.145457] RAX: ffff888785440000 RBX: ffff8881339e7dc0 RCX: 0000000000000000
[49762.145459] RDX: 0000000000000002 RSI: 000000000000b803 RDI: ffff888785440000
[49762.145461] RBP: ffff888408380000 R08: ffff88845f12f200 R09: 0000129500000000
[49762.145463] R10: ffff8881339e7a40 R11: 0000000000000000 R12: 0000000000000000
[49762.145464] R13: ffff888785440d98 R14: 0000000000000001 R15: ffffffffffffffff
[49762.145466] FS: 00007f6b06128600(0000) GS:ffff88885eec0000(0000)
[49762.145468] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[49762.145470] CR2: 000055a779f36088 CR3: 000000010bf4f000 CR4: 00000000001506e0
[49762.145472] Call Trace:
[49762.145474] ? bch2_trans_unlink_iters+0xc5/0x200 [bcachefs]
[49762.145487] bch2_trans_begin+0xf4/0x120 [bcachefs]
[49762.145500] bch2_btree_delete_range_trans+0x7a/0x260 [bcachefs]
[49762.145513] bch2_inode_rm+0xc2/0x330 [bcachefs]
[49762.145527] ? truncate_inode_pages_range+0x19a/0x690
[49762.145531] ? bch2_inode_rm+0x7a/0x330 [bcachefs]
[49762.145544] evict+0xb7/0x160
[49762.145547] do_unlinkat+0x1a8/0x2b0
[49762.145551] do_syscall_64+0x3a/0x70
[49762.145555] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 2220.747197] bcachefs (7535597e-5660-4926-a774-87a7725d4de5 inum 39250): no device to read from
[ 2220.747200] bcachefs (7535597e-5660-4926-a774-87a7725d4de5 inum 39250): read error 3 from btree lookup
[ 2220.747949] bcachefs (sdg inum 39250 offset 28160): data checksum error: expected 0:ace7597a got 0:0 (type 5)
[ 2220.747964] bcachefs (7535597e-5660-4926-a774-87a7725d4de5 inum 39250): no device to read from
[ 2220.747966] bcachefs (7535597e-5660-4926-a774-87a7725d4de5 inum 39250): read error 3 from btree lookup
[ 2220.771907] bcachefs (sdg inum 39250 offset 28733): data checksum error: expected 0:b2a6ad98 got 0:c5a5b6b5 (type 5)
[ 2220.771924] bcachefs (7535597e-5660-4926-a774-87a7725d4de5 inum 39250): no device to read from
...
read errors mapping "/data/securestorage/ellen/20210730_210543.jpg": Input/output error (5)
this is no issue to me, except that removing the corrupted file segfaults ...
You still didn't compile your kernel with full debug info
I have 2 kernels: a "release" 5.13.7release+ and "debug" kernel 5.13.7debug+. The 2nd one is booted in case it's needed. The faddr2line never worked. Not for release and not for debug
regenerated the error: to verify file integrity of every file, I executed "find -type f | while read r; do cp "$r" /dev/null. Unfortunately, the error reappears, but without segfault this time:
[7055.750076] ------------[ cut here ]------------
[ 7055.750080] kernel BUG at fs/bcachefs/btree_iter.c:1332!
[ 7055.750086] invalid opcode: 0000 [#1] SMP
[ 7055.750095] CPU: 23 PID: 24846 Comm: kworker/u64:2 Not tainted 5.13.7debug+ #12
[ 7055.750101] Hardware name: Gigabyte Technology Co., Ltd. X399 DESIGNARE EX/X399 DESIGNARE EX-CF, BIOS F12 12/11/2019
[ 7055.750106] Workqueue: events_unbound __bch2_read_endio [bcachefs]
[ 7055.750145] RIP: 0010:__btree_iter_traverse_all.constprop.0 (/usr/src/bcachefs/fs/bcachefs/btree_iter.c:1332) bcachefs
[ 7055.750176] Code: 89 c3 8b 44 24 28 25 ff ff ff 03 ff c8 74 08 48 89 e7 e8 05 12 d4 e0 85 db 75 da 90 e9 53 fe ff ff 0f 0b b8 fc ff ff ff eb 84 <0f> 0b 48 83 ec 10 48 89 2c 24 4c 89 64 24 08 48 89 fd 65 48 8b 04 All code ======== 0: 89 c3 mov %eax,%ebx 2: 8b 44 24 28 mov 0x28(%rsp),%eax 6: 25 ff ff ff 03 and $0x3ffffff,%eax b: ff c8 dec %eax d: 74 08 je 0x17 f: 48 89 e7 mov %rsp,%rdi 12: e8 05 12 d4 e0 call 0xffffffffe0d4121c 17: 85 db test %ebx,%ebx 19: 75 da jne 0xfffffffffffffff5 1b: 90 nop 1c: e9 53 fe ff ff jmp 0xfffffffffffffe74 21: 0f 0b ud2 23: b8 fc ff ff ff mov $0xfffffffc,%eax 28: eb 84 jmp 0xffffffffffffffae 2a:* 0f 0b ud2 <-- trapping instruction 2c: 48 83 ec 10 sub $0x10,%rsp 30: 48 89 2c 24 mov %rbp,(%rsp) 34: 4c 89 64 24 08 mov %r12,0x8(%rsp) 39: 48 89 fd mov %rdi,%rbp 3c: 65 gs 3d: 48 rex.W 3e: 8b .byte 0x8b 3f: 04 .byte 0x4 Code starting with the faulting instruction =========================================== 0: 0f 0b ud2 2: 48 83 ec 10 sub $0x10,%rsp 6: 48 89 2c 24 mov %rbp,(%rsp) a: 4c 89 64 24 08 mov %r12,0x8(%rsp) f: 48 89 fd mov %rdi,%rbp 12: 65 gs 13: 48 rex.W 14: 8b .byte 0x8b 15: 04 .byte 0x4
[ 7055.750183] RSP: 0018:ffff8884c5a6fc10 EFLAGS: 00010202
[ 7055.750187] RAX: ffff888498f601d0 RBX: 0000000000000000 RCX: 0000000000000002
[ 7055.750191] RDX: 00000000000001bc RSI: ffff888498f60000 RDI: 0000000000000002
[ 7055.750194] RBP: ffff88841fe00000 R08: ffff888498f606d0 R09: 0000000000000001
[ 7055.750197] R10: 0000000000000001 R11: ffff888477c218ca R12: ffff8884c5a6fc88
[ 7055.750200] R13: ffff888498f60658 R14: 0000000000000001 R15: ffffffffffffffff
[ 7055.750203] FS: 0000000000000000(0000) GS:ffff88885edc0000(0000) knlGS:0000000000000000
[ 7055.750208] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7055.750211] CR2: 00007f192322c850 CR3: 00000004c6370000 CR4: 00000000001506e0
[ 7055.750215] Call Trace:
[ 7055.750219] ? bch2_trans_iter_put (/usr/src/bcachefs/fs/bcachefs/btree_iter.c:2226 /usr/src/bcachefs/fs/bcachefs/btree_iter.c:2209) bcachefs
[ 7055.750248] ? bch2_trans_unlink_iters (/usr/src/bcachefs/fs/bcachefs/btree_iter.c:2142 /usr/src/bcachefs/fs/bcachefs/btree_iter.c:2201 /usr/src/bcachefs/fs/bcachefs/btree_iter.c:2513) bcachefs
[ 7055.750276] ? __bch2_trans_commit (/usr/src/bcachefs/./include/linux/percpu-refcount.h:325 /usr/src/bcachefs/./include/linux/percpu-refcount.h:338 /usr/src/bcachefs/fs/bcachefs/btree_update_leaf.c:961) bcachefs
[ 7055.750306] bch2_trans_begin (/usr/src/bcachefs/fs/bcachefs/btree_iter.c:2560) bcachefs
[ 7055.750334] bch2_rbio_narrow_crcs (/usr/src/bcachefs/fs/bcachefs/io.c:1801 (discriminator 6)) bcachefs
[ 7055.750367] ? bch2_rbio_narrow_crcs (/usr/src/bcachefs/fs/bcachefs/io.c:1801) bcachefs
[ 7055.750398] __bch2_read_endio (/usr/src/bcachefs/fs/bcachefs/io.c:1843) bcachefs
[ 7055.750430] process_one_work (workqueue.c:?)
[ 7055.750437] worker_thread (workqueue.c:?)
[ 7055.750441] ? process_one_work (workqueue.c:?)
[ 7055.750445] kthread (kthread.c:?)
[ 7055.750449] ? set_kthread_struct (kthread.c:?)
[ 7055.750453] ret_from_fork (??:?)
[ 7055.750458] Modules linked in: btrfs blake2b_generic bcachefs xor raid6_pq i2c_dev f2fs amdgpu uas input_leds intel_rapl_msr intel_rapl_common iosf_mbi kvm_amd kvm iwlmvm irqbypass igb btusb btintel ghash_clmulni_intel bluetooth iwlwifi usb_storage rapl ptp sp5100_tco gigabyte_wmi wmi_bmof hid_generic aacraid drm_ttm_helper pps_core i2c_piix4 ttm backlight gpu_sched tpm_crb tpm_tis pinctrl_amd tpm_tis_core efivarfs
[ 7055.750496] ---[ end trace ee44cc42b3f20b76 ]---
[ 7055.832403] RIP: 0010:__btree_iter_traverse_all.constprop.0 (/usr/src/bcachefs/fs/bcachefs/btree_iter.c:1332) bcachefs
[ 7055.832440] Code: 89 c3 8b 44 24 28 25 ff ff ff 03 ff c8 74 08 48 89 e7 e8 05 12 d4 e0 85 db 75 da 90 e9 53 fe ff ff 0f 0b b8 fc ff ff ff eb 84 <0f> 0b 48 83 ec 10 48 89 2c 24 4c 89 64 24 08 48 89 fd 65 48 8b 04 All code ======== 0: 89 c3 mov %eax,%ebx 2: 8b 44 24 28 mov 0x28(%rsp),%eax 6: 25 ff ff ff 03 and $0x3ffffff,%eax b: ff c8 dec %eax d: 74 08 je 0x17 f: 48 89 e7 mov %rsp,%rdi 12: e8 05 12 d4 e0 call 0xffffffffe0d4121c 17: 85 db test %ebx,%ebx 19: 75 da jne 0xfffffffffffffff5 1b: 90 nop 1c: e9 53 fe ff ff jmp 0xfffffffffffffe74 21: 0f 0b ud2 23: b8 fc ff ff ff mov $0xfffffffc,%eax 28: eb 84 jmp 0xffffffffffffffae 2a:* 0f 0b ud2 <-- trapping instruction 2c: 48 83 ec 10 sub $0x10,%rsp 30: 48 89 2c 24 mov %rbp,(%rsp) 34: 4c 89 64 24 08 mov %r12,0x8(%rsp) 39: 48 89 fd mov %rdi,%rbp 3c: 65 gs 3d: 48 rex.W 3e: 8b .byte 0x8b 3f: 04 .byte 0x4 Code starting with the faulting instruction =========================================== 0: 0f 0b ud2 2: 48 83 ec 10 sub $0x10,%rsp 6: 48 89 2c 24 mov %rbp,(%rsp) a: 4c 89 64 24 08 mov %r12,0x8(%rsp) f: 48 89 fd mov %rdi,%rbp 12: 65 gs 13: 48 rex.W 14: 8b .byte 0x8b 15: 04 .byte 0x4
[ 7055.832447] RSP: 0018:ffff8884c5a6fc10 EFLAGS: 00010202
[ 7055.832451] RAX: ffff888498f601d0 RBX: 0000000000000000 RCX: 0000000000000002
[ 7055.832455] RDX: 00000000000001bc RSI: ffff888498f60000 RDI: 0000000000000002
[ 7055.832458] RBP: ffff88841fe00000 R08: ffff888498f606d0 R09: 0000000000000001
[ 7055.832461] R10: 0000000000000001 R11: ffff888477c218ca R12: ffff8884c5a6fc88
[ 7055.832464] R13: ffff888498f60658 R14: 0000000000000001 R15: ffffffffffffffff
[ 7055.832467] FS: 0000000000000000(0000) GS:ffff88885edc0000(0000) knlGS:0000000000000000
[ 7055.832472] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7055.832475] CR2: 00007f192322c850 CR3: 00000004c6370000 CR4: 00000000001506e0
The faddr2line never worked. Not for release and not for debug
Did you compile bcachefs as module?
The faddr2line never worked. Not for release and not for debug
Did you compile bcachefs as module?
Yes. I consider it an advantage to be able to remove / force-remove the code if not functioning for whatever reason
Did you compile bcachefs as module?
Yes. I consider it an advantage to be able to remove / force-remove the code if not functioning for whatever reason
That's why faddr2line wasn't working for you. I don't see how removing the module if something isn't working would ever be useful.
But you finally got file and line number, which is what I needed.
Were there any other errors from bcachefs in your dmesg log?
The device also hosts dhcp / dns for my home network. I thought "if bcachefs module would somewhere cause a lockup / infinite loop, I could still unload it and schedule the restart" cp operation did not say anything. it simply hangs, in 's' state, but nothing is printed to dmesg. EDIT: dmesg previously made some annoying checksum errors on reading corrupted files with rsync, but I removed those, so not related: https://pastebin.com/FW6eaJyt
Module unloading doesn't work that way.
Post your full dmesg log
On Fri, Aug 6, 2021, 22:05 jpsollie @.***> wrote:
The device also hosts dhcp / dns for my home network. I thought "if bcachefs module would somewhere cause a lockup / infinite loop, I could still unload it and schedule the restart" cp operation did not say anything. it simply hangs, in 's' state, but nothing is printed to dmesg.
— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/koverstreet/bcachefs/issues/294#issuecomment-894590051, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAPGX3UE3H4LQNQJNPBXTGTT3SIFBANCNFSM5BQKLAJQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&utm_campaign=notification-email .
based on your suggestions, I recompiled the kernel with bcachefs built in, and recreated the error condition (new dmesg in attachment, I believe it looks much cleaner now). I also have the faulting filename now, doing a onetime cp after a fresh reboot fixes the issue, so it may be an issue with scheduling? dmesg_full_stacktrace_decoded.txt
The faddr2line never worked. Not for release and not for debug
Did you compile bcachefs as module?
Yes. I consider it an advantage to be able to remove / force-remove the code if not functioning for whatever reason
If you have bcachefs as module then invocation of fadd2line should be as below:
scripts/faddr2line fs/bcachefs/bcachefs.o ....
The faddr2line never worked. Not for release and not for debug
Did you compile bcachefs as module?
Yes. I consider it an advantage to be able to remove / force-remove the code if not functioning for whatever reason
If you have bcachefs as module then invocation of fadd2line should be as below:
scripts/faddr2line fs/bcachefs/bcachefs.o ....
yes, koverstreet also mentioned that, but unfortunately, that didn't work out either ^^ Anyway, I'll keep the code in-kernel from now on, no problem.
Trying with a preemptive kernel (to avoid some scheduling issues) did not work out either:
[ 3040.584815] ------------[ cut here ]------------
[ 3040.584820] kernel BUG at fs/bcachefs/btree_iter.c:1332!
[ 3040.584826] invalid opcode: 0000 [#1] PREEMPT SMP
[ 3040.584835] CPU: 29 PID: 11825 Comm: kworker/29:2 Not tainted 5.13.7debug+ #17
[ 3040.584841] Hardware name: Gigabyte Technology Co., Ltd. X399 DESIGNARE EX/X399 DESIGNARE EX-CF, BIOS F12 12/11/2019
[ 3040.584846] Workqueue: bcachefs bch2_write_index
[ 3040.584855] RIP: 0010:__btree_iter_traverse_all (fs/bcachefs/btree_iter.c:1332 (discriminator 1))
[ 3040.584862] Code: 65 ff 0d ec 71 ba 7e 0f 85 37 ff ff ff e8 c0 64 b9 ff e9 2d ff ff ff 41 89 de e9 13 ff ff ff 41 be fc ff ff ff e9 1a ff ff ff <0f> 0b 0f 1f 80 00 00 00 00 0f 1f 44 00 00 48 83 ec 10 48 89 2c 24
All code
========
0: 65 ff 0d ec 71 ba 7e decl %gs:0x7eba71ec(%rip) # 0x7eba71f3
7: 0f 85 37 ff ff ff jne 0xffffffffffffff44
d: e8 c0 64 b9 ff call 0xffffffffffb964d2
12: e9 2d ff ff ff jmp 0xffffffffffffff44
17: 41 89 de mov %ebx,%r14d
1a: e9 13 ff ff ff jmp 0xffffffffffffff32
1f: 41 be fc ff ff ff mov $0xfffffffc,%r14d
25: e9 1a ff ff ff jmp 0xffffffffffffff44
2a:* 0f 0b ud2 <-- trapping instruction
2c: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
33: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
38: 48 83 ec 10 sub $0x10,%rsp
3c: 48 89 2c 24 mov %rbp,(%rsp)
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
9: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
e: 48 83 ec 10 sub $0x10,%rsp
12: 48 89 2c 24 mov %rbp,(%rsp)
[ 3040.584870] RSP: 0018:ffff8886ba0d7a98 EFLAGS: 00010202
[ 3040.584875] RAX: ffff888016588000 RBX: 0000000000000000 RCX: 0000000000000002
[ 3040.584879] RDX: 00000000000001fb RSI: ffff888016588000 RDI: 0000000000000000
[ 3040.584882] RBP: ffff888467fe0000 R08: ffff888016588500 R09: 0000000000000001
[ 3040.584886] R10: 0000000000000001 R11: ffff888443728aca R12: ffffffffffffffff
[ 3040.584889] R13: ffff8886ba0d7d18 R14: ffff888016588488 R15: 0000000000000001
[ 3040.584893] FS: 0000000000000000(0000) GS:ffff88885ef40000(0000) knlGS:0000000000000000
[ 3040.584897] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3040.584901] CR2: 00007fbcf8ca9000 CR3: 0000000159810000 CR4: 00000000001506e0
[ 3040.584904] Call Trace:
[ 3040.584908] ? bch2_migrate_index_update (fs/bcachefs/move.c:89)
[ 3040.584915] ? bch2_trans_unlink_iters (fs/bcachefs/btree_iter.c:2142 fs/bcachefs/btree_iter.c:2201 fs/bcachefs/btree_iter.c:2513)
[ 3040.584920] ? __bch2_trans_commit (fs/bcachefs/btree_update_leaf.c:969)
[ 3040.584926] bch2_trans_begin (fs/bcachefs/btree_iter.c:2560)
[ 3040.584931] bch2_migrate_index_update (fs/bcachefs/move.c:89)
[ 3040.584941] ? bch2_migrate_index_update (fs/bcachefs/keylist.h:51 fs/bcachefs/move.c:74)
[ 3040.584948] __bch2_write_index (fs/bcachefs/io.c:610)
[ 3040.584952] bch2_write_index (fs/bcachefs/io.c:641)
[ 3040.584956] process_one_work (./arch/x86/include/asm/jump_label.h:19 ./include/linux/jump_label.h:200 ./include/trace/events/workqueue.h:108 kernel/workqueue.c:2281)
[ 3040.584963] worker_thread (./include/linux/list.h:282 kernel/workqueue.c:2423)
[ 3040.584968] ? rescuer_thread (kernel/workqueue.c:2365)
[ 3040.584972] kthread (kernel/kthread.c:319)
[ 3040.584977] ? set_kthread_struct (kernel/kthread.c:272)
[ 3040.584982] ret_from_fork (arch/x86/entry/entry_64.S:300)
[ 3040.584989] Modules linked in: btrfs blake2b_generic xfs i2c_dev f2fs amdgpu uas input_leds intel_rapl_msr intel_rapl_common iosf_mbi kvm_amd kvm iwlmvm igb btusb btintel irqbypass bluetooth iwlwifi usb_storage drm_ttm_helper ptp sp5100_tco ghash_clmulni_intel aacraid ttm gigabyte_wmi rapl pps_core wmi_bmof hid_generic i2c_piix4 backlight gpu_sched tpm_crb tpm_tis tpm_tis_core pinctrl_amd efivarfs
[ 3040.585032] ---[ end trace eb7a330d893ae5a7 ]---
[ 3040.586433] ------------[ cut here ]------------
[ 3040.586436] WARNING: CPU: 29 PID: 11825 at kernel/rcu/tree_plugin.h:359 rcu_note_context_switch (kernel/rcu/tree_plugin.h:359)
[ 3040.586445] Modules linked in: btrfs blake2b_generic xfs i2c_dev f2fs amdgpu uas input_leds intel_rapl_msr intel_rapl_common iosf_mbi kvm_amd kvm iwlmvm igb btusb btintel irqbypass bluetooth iwlwifi usb_storage drm_ttm_helper ptp sp5100_tco ghash_clmulni_intel aacraid ttm gigabyte_wmi rapl pps_core wmi_bmof hid_generic i2c_piix4 backlight gpu_sched tpm_crb tpm_tis tpm_tis_core pinctrl_amd efivarfs
[ 3040.590106] CPU: 29 PID: 11825 Comm: kworker/29:2 Tainted: G D 5.13.7debug+ #17
[ 3040.590112] Hardware name: Gigabyte Technology Co., Ltd. X399 DESIGNARE EX/X399 DESIGNARE EX-CF, BIOS F12 12/11/2019
[ 3040.590116] Workqueue: bcachefs bch2_write_index
[ 3040.590120] RIP: 0010:rcu_note_context_switch (kernel/rcu/tree_plugin.h:359)
[ 3040.590125] Code: 25 00 27 01 00 48 03 1c c5 40 a6 40 82 0f 1f 44 00 00 40 84 ed 75 15 65 48 8b 04 25 00 27 01 00 8b 90 cc 02 00 00 85 d2 7e 02 <0f> 0b 65 48 8b 04 25 00 27 01 00 8b 80 cc 02 00 00 85 c0 7e 0f 41
All code
========
0: 25 00 27 01 00 and $0x12700,%eax
5: 48 03 1c c5 40 a6 40 add -0x7dbf59c0(,%rax,8),%rbx
c: 82
d: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
12: 40 84 ed test %bpl,%bpl
15: 75 15 jne 0x2c
17: 65 48 8b 04 25 00 27 mov %gs:0x12700,%rax
1e: 01 00
20: 8b 90 cc 02 00 00 mov 0x2cc(%rax),%edx
26: 85 d2 test %edx,%edx
28: 7e 02 jle 0x2c
2a:* 0f 0b ud2 <-- trapping instruction
2c: 65 48 8b 04 25 00 27 mov %gs:0x12700,%rax
33: 01 00
35: 8b 80 cc 02 00 00 mov 0x2cc(%rax),%eax
3b: 85 c0 test %eax,%eax
3d: 7e 0f jle 0x4e
3f: 41 rex.B
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 65 48 8b 04 25 00 27 mov %gs:0x12700,%rax
9: 01 00
b: 8b 80 cc 02 00 00 mov 0x2cc(%rax),%eax
11: 85 c0 test %eax,%eax
13: 7e 0f jle 0x24
15: 41 rex.B
[ 3040.590132] RSP: 0018:ffff8886ba0d7548 EFLAGS: 00010002
[ 3040.590136] RAX: ffff88811bea2a00 RBX: ffff88885ef64680 RCX: 0000000000000000
[ 3040.590139] RDX: 0000000000000001 RSI: ffffffff8235db4a RDI: ffffffff8235df7f
[ 3040.590142] RBP: 0000000000000000 R08: ffff88885ef63a80 R09: 0000000000000000
[ 3040.590146] R10: 0000000000000009 R11: 0000000000000010 R12: ffff88811bea2a00
[ 3040.590149] R13: ffff88811bea2a00 R14: 0000000000000007 R15: ffff88885ef63a80
[ 3040.590153] FS: 0000000000000000(0000) GS:ffff88885ef40000(0000) knlGS:0000000000000000
[ 3040.590157] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3040.590160] CR2: 00007fbcf8ca9000 CR3: 0000000159810000 CR4: 00000000001506e0
[ 3040.590164] Call Trace:
[ 3040.590167] __schedule (kernel/sched/sched.h:1334 kernel/sched/core.c:5065)
[ 3040.590173] schedule (kernel/sched/core.c:5231 (discriminator 1))
[ 3040.590176] schedule_timeout (kernel/time/timer.c:1870)
[ 3040.590181] ? preempt_count_add (./include/linux/ftrace.h:844 kernel/sched/core.c:4784 kernel/sched/core.c:4781 kernel/sched/core.c:4809)
[ 3040.590185] wait_for_completion (kernel/sched/completion.c:86 kernel/sched/completion.c:106 kernel/sched/completion.c:117 kernel/sched/completion.c:138)
[ 3040.590190] virt_efi_query_variable_info (drivers/firmware/efi/runtime-wrappers.c:374 (discriminator 7))
[ 3040.590197] efi_query_variable_store.part.0 (arch/x86/platform/efi/quirks.c:165)
[ 3040.590203] ? _raw_spin_lock_irqsave (./arch/x86/include/asm/atomic.h:202 ./include/asm-generic/atomic-instrumented.h:707 ./include/asm-generic/qspinlock.h:82 ./include/linux/spinlock.h:195 ./include/linux/spinlock_api_smp.h:119 kernel/locking/spinlock.c:159)
[ 3040.590208] efivar_entry_set_safe (drivers/firmware/efi/vars.c:776)
[ 3040.590213] efi_pstore_write (drivers/firmware/efi/efi-pstore.c:271 (discriminator 6))
[ 3040.590220] pstore_dump (fs/pstore/platform.c:462)
[ 3040.590227] kmsg_dump (kernel/printk/printk.c:3345 (discriminator 10))
[ 3040.590232] oops_end (arch/x86/kernel/dumpstack.c:372)
[ 3040.590237] do_trap (arch/x86/kernel/traps.c:120 arch/x86/kernel/traps.c:161)
[ 3040.590241] ? __btree_iter_traverse_all (fs/bcachefs/btree_iter.c:1332 (discriminator 1))
[ 3040.590246] do_error_trap (arch/x86/kernel/traps.c:84 arch/x86/kernel/traps.c:182)
[ 3040.590249] ? __btree_iter_traverse_all (fs/bcachefs/btree_iter.c:1332 (discriminator 1))
[ 3040.590254] exc_invalid_op (arch/x86/kernel/traps.c:267)
[ 3040.590259] ? __btree_iter_traverse_all (fs/bcachefs/btree_iter.c:1332 (discriminator 1))
[ 3040.590264] asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:566)
[ 3040.590268] RIP: 0010:__btree_iter_traverse_all (fs/bcachefs/btree_iter.c:1332 (discriminator 1))
[ 3040.590273] Code: 65 ff 0d ec 71 ba 7e 0f 85 37 ff ff ff e8 c0 64 b9 ff e9 2d ff ff ff 41 89 de e9 13 ff ff ff 41 be fc ff ff ff e9 1a ff ff ff <0f> 0b 0f 1f 80 00 00 00 00 0f 1f 44 00 00 48 83 ec 10 48 89 2c 24
All code
========
0: 65 ff 0d ec 71 ba 7e decl %gs:0x7eba71ec(%rip) # 0x7eba71f3
7: 0f 85 37 ff ff ff jne 0xffffffffffffff44
d: e8 c0 64 b9 ff call 0xffffffffffb964d2
12: e9 2d ff ff ff jmp 0xffffffffffffff44
17: 41 89 de mov %ebx,%r14d
1a: e9 13 ff ff ff jmp 0xffffffffffffff32
1f: 41 be fc ff ff ff mov $0xfffffffc,%r14d
25: e9 1a ff ff ff jmp 0xffffffffffffff44
2a:* 0f 0b ud2 <-- trapping instruction
2c: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
33: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
38: 48 83 ec 10 sub $0x10,%rsp
3c: 48 89 2c 24 mov %rbp,(%rsp)
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
9: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
e: 48 83 ec 10 sub $0x10,%rsp
12: 48 89 2c 24 mov %rbp,(%rsp)
[ 3040.590280] RSP: 0018:ffff8886ba0d7a98 EFLAGS: 00010202
[ 3040.590284] RAX: ffff888016588000 RBX: 0000000000000000 RCX: 0000000000000002
[ 3040.590287] RDX: 00000000000001fb RSI: ffff888016588000 RDI: 0000000000000000
[ 3040.590290] RBP: ffff888467fe0000 R08: ffff888016588500 R09: 0000000000000001
[ 3040.590293] R10: 0000000000000001 R11: ffff888443728aca R12: ffffffffffffffff
[ 3040.590296] R13: ffff8886ba0d7d18 R14: ffff888016588488 R15: 0000000000000001
[ 3040.590301] ? __btree_iter_traverse_all (fs/bcachefs/btree_iter.c:1325)
[ 3040.590306] ? bch2_migrate_index_update (fs/bcachefs/move.c:89)
[ 3040.590311] ? bch2_trans_unlink_iters (fs/bcachefs/btree_iter.c:2142 fs/bcachefs/btree_iter.c:2201 fs/bcachefs/btree_iter.c:2513)
[ 3040.590315] ? __bch2_trans_commit (fs/bcachefs/btree_update_leaf.c:969)
[ 3040.590319] bch2_trans_begin (fs/bcachefs/btree_iter.c:2560)
[ 3040.590324] bch2_migrate_index_update (fs/bcachefs/move.c:89)
[ 3040.590333] ? bch2_migrate_index_update (fs/bcachefs/keylist.h:51 fs/bcachefs/move.c:74)
[ 3040.590340] __bch2_write_index (fs/bcachefs/io.c:610)
[ 3040.590344] bch2_write_index (fs/bcachefs/io.c:641)
[ 3040.590348] process_one_work (./arch/x86/include/asm/jump_label.h:19 ./include/linux/jump_label.h:200 ./include/trace/events/workqueue.h:108 kernel/workqueue.c:2281)
[ 3040.590353] worker_thread (./include/linux/list.h:282 kernel/workqueue.c:2423)
[ 3040.590357] ? rescuer_thread (kernel/workqueue.c:2365)
[ 3040.590362] kthread (kernel/kthread.c:319)
[ 3040.590366] ? set_kthread_struct (kernel/kthread.c:272)
[ 3040.590370] ret_from_fork (arch/x86/entry/entry_64.S:300)
[ 3040.590375] ---[ end trace eb7a330d893ae5a8 ]---
[ 3041.950494] RIP: 0010:__btree_iter_traverse_all (fs/bcachefs/btree_iter.c:1332 (discriminator 1))
[ 3041.950509] Code: 65 ff 0d ec 71 ba 7e 0f 85 37 ff ff ff e8 c0 64 b9 ff e9 2d ff ff ff 41 89 de e9 13 ff ff ff 41 be fc ff ff ff e9 1a ff ff ff <0f> 0b 0f 1f 80 00 00 00 00 0f 1f 44 00 00 48 83 ec 10 48 89 2c 24
Not sure if that's the same problem: kernel BUG at fs/bcachefs/btree_iter.c:678!
Full dmesg: https://gist.github.com/derlaft/07c7a6a2ad9bfa19c049d4cea2088cb5 fline2addr's included (I hope that's what was needed)
Basically how I got there:
find /mount -type f -exec cp '{}' /dev/null \;
overnight - it hanged againIt mounts with ro,nochanges,norecovery
; reboot does not help anymore
fixed by commit fd3fdd74af3d682b780eae5b087062f91780f8ed
I haven't been able to find a pattern: