koverstreet / bcachefs

Other
688 stars 72 forks source link

kernel BUG at btree_iter_traverse_one() making files unreadable #294

Closed jpsollie closed 3 years ago

jpsollie commented 3 years ago

I haven't been able to find a pattern:

jpsollie commented 3 years ago

it's getting weirder every time: when mounted with -o fsck,norecovery,nochanges. dmesg:

[  639.667129] bcachefs: bch2_fs_open()
[  639.667133] bcachefs: bch2_read_super()
[  639.670273] bcachefs: bch2_read_super() ret 0
[  639.671768] bcachefs: bch2_read_super()
[  639.675408] bcachefs: bch2_read_super() ret 0
[  639.675512] bcachefs: bch2_read_super()
[  639.679179] bcachefs: bch2_read_super() ret 0
[  639.679283] bcachefs: bch2_read_super()
[  639.681907] bcachefs: bch2_read_super() ret 0
[  639.682009] bcachefs: bch2_read_super()
[  639.682450] bcachefs: bch2_read_super() ret 0
[  639.682551] bcachefs: bch2_read_super()
[  639.682879] bcachefs: bch2_read_super() ret 0
[  639.683001] bcachefs: bch2_read_super()
[  639.722438] bcachefs: bch2_read_super() ret 0
[  639.722686] bcachefs: bch2_read_super()
[  639.723114] bcachefs: bch2_read_super() ret 0
[  639.723215] bcachefs: bch2_fs_alloc()
[  639.768369] bcachefs: bch2_fs_journal_init()
[  639.768379] bcachefs: bch2_fs_journal_init() ret 0
[  639.768422] bcachefs: bch2_fs_btree_cache_init()
[  639.768472] bcachefs: bch2_fs_btree_cache_init() ret 0
[  639.768533] bcachefs: bch2_fs_encryption_init()
[  639.768548] bcachefs: bch2_fs_encryption_init() ret 0
[  639.768551] bcachefs: __bch2_fs_compress_init()
[  639.768553] bcachefs: __bch2_fs_compress_init() ret 0
[  639.768555] bcachefs: bch2_fs_fsio_init()
[  639.768562] bcachefs: bch2_fs_fsio_init() ret 0
[  639.768563] bcachefs: bch2_dev_alloc()
[  639.795685] bcachefs: bch2_dev_alloc() ret 0
[  639.795689] bcachefs: bch2_dev_alloc()
[  639.796157] bcachefs: bch2_dev_alloc() ret 0
[  639.796159] bcachefs: bch2_dev_alloc()
[  639.796609] bcachefs: bch2_dev_alloc() ret 0
[  639.796611] bcachefs: bch2_dev_alloc()
[  639.797065] bcachefs: bch2_dev_alloc() ret 0
[  639.797069] bcachefs: bch2_dev_alloc()
[  639.797531] bcachefs: bch2_dev_alloc() ret 0                                                                                                                [  639.797533] bcachefs: bch2_dev_alloc()
[  639.797976] bcachefs: bch2_dev_alloc() ret 0
[  639.797977] bcachefs: bch2_dev_alloc()
[  639.798875] bcachefs: bch2_dev_alloc() ret 0                                                                                                                [  639.798877] bcachefs: bch2_dev_alloc()
[  639.799322] bcachefs: bch2_dev_alloc() ret 0
[  639.799547] bcachefs: bch2_fs_alloc() ret 0
[  639.799671] bcachefs (7535597e-5660-4926-a774-87a7725d4de5): recovering from clean shutdown, journal seq 69371
[  670.601621] bcachefs (7535597e-5660-4926-a774-87a7725d4de5): journal read done, 0 keys in 1 entries, seq 69372
[  670.678134] bcachefs (7535597e-5660-4926-a774-87a7725d4de5): starting alloc read
[  670.843638] bcachefs (7535597e-5660-4926-a774-87a7725d4de5): alloc read done
[  670.843642] bcachefs (7535597e-5660-4926-a774-87a7725d4de5): starting stripes_read
[  670.843644] bcachefs (7535597e-5660-4926-a774-87a7725d4de5): stripes_read done
[  670.843645] bcachefs (7535597e-5660-4926-a774-87a7725d4de5): starting mark and sweep
[  671.054528] bcachefs (7535597e-5660-4926-a774-87a7725d4de5): mark and sweep done                                                                            [  671.054535] bcachefs (7535597e-5660-4926-a774-87a7725d4de5): ret 0
[  671.054583] bcachefs (7535597e-5660-4926-a774-87a7725d4de5): mounted with opts: metadata_replicas=2,metadata_target=SSDM,foreground_target=SSDC,background_target=HDD,promote_target=SSDC,noinodes_32bit,noinodes_use_key_cache,verbose,fsck,fix_errors,nochanges,norecovery

the original file where the script crashed is ok, but no idea where it starts failing:

linuxserver ~ # cat /var/log/secstorbackup.log
2021/08/04 08:39:54 [3711] building file list
2021/08/04 08:39:54 [3711] .d..t...... ellen/
2021/08/04 08:39:54 [3711] >f+++++++++ ellen/20201101_150257.jpg
linuxserver ~ # cp /data/securestorage/ellen/20201101_150257.jpg /mnt/satabackup/ellen/
linuxserver ~ # cp --preserve=all /data/securestorage/ellen/20201101_150257.jpg /mnt/satabackup/ellen/
linuxserver ~ # rsync -I -a /data/securestorage/ellen/20201101_150257.jpg /mnt/satabackup/ellen/
linuxserver ~ # sh /etc/cron.monthly/backups.sh
/etc/cron.monthly/backups.sh: line 36:  3829 Segmentation fault      nice rsync -a /data/securestorage/ /mnt/satabackup/ --log-file=/var/log/secstorbackup.log > /dev/null 2>&1
linuxserver ~ # cat /var/log/secstorbackup.log
2021/08/04 09:01:51 [3829] building file list                                                                                                                  2021/08/04 09:01:52 [3829] .d..t...... ellen/
linuxserver ~ #

Any clue here?

jpsollie commented 3 years ago

new story: when the bug occurs, and the filesystem is mounted nochanges,norecovery, umount -l -f also segfaults at the same function, but the stack trace is different, so it may provide more info:

[17265.410467] ------------[ cut here ]------------
[17265.410475] Kernel BUG at journal_entry_open+0x22f/0x260 [bcachefs] [verbose debug info unavailable]
[17265.410506] invalid opcode: 0000 [#2] SMP
[17265.410511] CPU: 16 PID: 6818 Comm: umount Tainted: G      D W         5.13.7release+ #8
[17265.410516] Hardware name: Gigabyte Technology Co., Ltd. X399 DESIGNARE EX/X399 DESIGNARE EX-CF, BIOS F12 12/11/2019
[17265.410519] RIP: 0010:journal_entry_open+0x22f/0x260 [bcachefs]
[17265.410543] Code: 00 48 8d bb 90 12 00 00 e8 2e a7 8f e0 f0 83 44 24 fc 00 48 8d bb 98 12 00 00 e8 1c a7 8f e0 5b 45 31 c0 44 89 c0 5d 41 5c c3 <0f> 0b 0f 0b 4c 8b a3 20 17 00 00 65 8b 3d 0f c1 4a 5f e8 ba b5 57                                                                                                        [17265.410550] RSP: 0018:ffff88847f9a3d50 EFLAGS: 00010202
[17265.410554] RAX: ffff888100904000 RBX: ffff88847f9a3e18 RCX: 0000000000010efc
[17265.410558] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff888403a137b0
[17265.410561] RBP: ffff888403a137b0 R08: ffff88811947aa70 R09: 00000000000ffffe                                                                               [17265.410564] R10: 003fffffffffffff R11: 0000000000000000 R12: ffc00fffffffffff
[17265.410566] R13: fffffffc00ffffff R14: 0000000000000000 R15: 0000000000000000
[17265.410569] FS:  00007f6ff81df740(0000) GS:ffff88885ec00000(0000) knlGS:0000000000000000
[17265.410573] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[17265.410576] CR2: 00007f9fc95990b0 CR3: 0000000409d0d000 CR4: 00000000001506e0
[17265.410579] Call Trace:
[17265.410582]  __journal_res_get+0x38d/0x5f0 [bcachefs]
[17265.410606]  ? list_lru_del+0x54/0xa0
[17265.410611]  ? path_check_mount+0x40/0x40
[17265.410616]  bch2_journal_res_get_slowpath+0x17/0xf0 [bcachefs]
[17265.410638]  bch2_journal_flush_seq_async.part.0+0x2cd/0x470 [bcachefs]
[17265.410660]  bch2_sync_fs+0x27/0x30 [bcachefs]                                                                                                              [17265.410682]  sync_filesystem+0x38/0x80
[17265.410687]  generic_shutdown_super+0x1d/0xf0
[17265.410692]  bch2_kill_sb+0xd/0x20 [bcachefs]
[17265.410714]  deactivate_locked_super+0x2c/0x80
[17265.410718]  cleanup_mnt+0x12c/0x180
[17265.410721]  task_work_run+0x54/0x80
[17265.410727]  exit_to_user_mode_loop+0xc5/0xd0
[17265.410733]  exit_to_user_mode_prepare+0x85/0x90
[17265.410738]  syscall_exit_to_user_mode+0x1d/0x40
[17265.410743]  do_syscall_64+0x47/0x70
[17265.410747]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[17265.410751] RIP: 0033:0x7f6ff8310277                                                                                                                        [17265.410754] Code: ac 08 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 c9 ab 08 00 f7 d8 64 89 02 b8
[17265.410760] RSP: 002b:00007ffd99d1f938 EFLAGS: 00000202 ORIG_RAX: 00000000000000a6
[17265.410764] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 00007f6ff8310277
[17265.410767] RDX: 0000000000000014 RSI: 0000000000000003 RDI: 00005578356d9200
[17265.410770] RBP: 00005578356d2620 R08: 00005578356d8fa0 R09: 0000000000000000
[17265.410772] R10: 0000000000000000 R11: 0000000000000202 R12: 00005578356d9200
[17265.410774] R13: 0000000000000003 R14: 00005578356d2510 R15: 0000000000000003
[17265.410778] Modules linked in: btrfs blake2b_generic xfs bcachefs xor raid6_pq i2c_dev f2fs uas input_leds intel_rapl_msr intel_rapl_common iosf_mbi kvm_amd kvm irqbypass ghash_clmulni_intel amdgpu iwlmvm rapl gigabyte_wmi wmi_bmof usb_storage drm_ttm_helper igb aacraid btusb ttm iwlwifi sp5100_tco btintel backlight ptp gpu_sched pps_core i2c_piix4 bluetooth hid_generic tpm_crb tpm_tis tpm_tis_core pinctrl_amd efivarfs
[17265.410818] ---[ end trace b215b693d50bb831 ]---
[17266.151650] RIP: 0010:btree_iter_traverse_one+0x8b3/0x8f0 [bcachefs]
[17266.151677] Code: d3 49 5d e0 e9 4e ff ff ff 41 0f b6 55 5a e9 58 f8 ff ff 0f 0b 66 90 e8 ab 52 00 00 83 f8 fc 41 89 c6 0f 94 c2 e9 24 fb ff ff <0f> 0b 89 c8 41 0f b6 55 5a 83 e0 0f e9 db f7 ff ff 48 63 44 24 24
[17266.151685] RSP: 0018:ffff88840a0837c8 EFLAGS: 00010202
[17266.151688] RAX: 0000000000000001 RBX: ffff88840a0d8000 RCX: 0000000000000000
[17266.151691] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88840a0d8000
[17266.151693] RBP: 0000000000000001 R08: 00000000ffffffff R09: ffff888474026c00
[17266.151695] R10: 0000000000010c00 R11: ffff88810b2e0000 R12: 0000000000000000
[17266.151698] R13: ffff88840a0d8000 R14: 0000000000000000 R15: 0000000000000103
[17266.151700] FS:  00007f6ff81df740(0000) GS:ffff88885ec00000(0000) knlGS:0000000000000000
[17266.151702] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[17266.151705] CR2: 00007f9fc95990b0 CR3: 0000000409d0d000 CR4: 00000000001506e0
jpsollie commented 3 years ago

In order to be as complete as possible, I collected some debug info (metadata, fsck output, journal), if anybody would like to take a look at it, be my guest https://we.tl/t-kuXwG8s7Gq

jpsollie commented 3 years ago

@koverstreet requested a faddr2line, but due to reduced debug info, this was unavailable. So I compiled a debug kernel (bcachefs_debug=y, loadallksyms, verbose bug reporting) and regenerated the info, but the stack was omewhat different. So instead of faddr2line, I piped dmesg through decode_stacktrace.sh

2470.526556] ------------[ cut here ]------------
[ 2470.526560] kernel BUG at fs/bcachefs/btree_iter.c:1608!
[ 2470.526565] invalid opcode: 0000 [#1] SMP
[ 2470.526572] CPU: 18 PID: 3852 Comm: rsync Not tainted 5.13.7debug+ #11
[ 2470.526577] Hardware name: Gigabyte Technology Co., Ltd. X399 DESIGNARE EX/X399 DESIGNARE EX-CF, BIOS F12 12/11/2019
[ 2470.526581] RIP: 0010:btree_iter_set_search_pos (fs/bcachefs/btree_iter.c:1608 (discriminator 1)) bcachefs
[ 2470.526608] Code: ff 89 d9 d3 c7 21 fa 21 f8 66 0f be d2 66 98 c1 e2 08 c1 e0 04 09 d0 41 0f b7 56 5a 66 81 e2 0f f0 09 d0 66 41 89 46 5a eb 0a <0f> 0b 41 0f b6 76 59 83 e6 0f 41 0f b6 46 58 89 c2 c0 ea 04 83 e2
All code
========
   0:   ff 89 d9 d3 c7 21       decl   0x21c7d3d9(%rcx)
   6:   fa                      cli
   7:   21 f8                   and    %edi,%eax
   9:   66 0f be d2             movsbw %dl,%dx
   d:   66 98                   cbtw
   f:   c1 e2 08                shl    $0x8,%edx
  12:   c1 e0 04                shl    $0x4,%eax                                                                                                                 15:   09 d0                   or     %edx,%eax
  17:   41 0f b7 56 5a          movzwl 0x5a(%r14),%edx
  1c:   66 81 e2 0f f0          and    $0xf00f,%dx
  21:   09 d0                   or     %edx,%eax
  23:   66 41 89 46 5a          mov    %ax,0x5a(%r14)
  28:   eb 0a                   jmp    0x34
  2a:*  0f 0b                   ud2             <-- trapping instruction
  2c:   41 0f b6 76 59          movzbl 0x59(%r14),%esi
  31:   83 e6 0f                and    $0xf,%esi
  34:   41 0f b6 46 58          movzbl 0x58(%r14),%eax
  39:   89 c2                   mov    %eax,%edx
  3b:   c0 ea 04                shr    $0x4,%dl
  3e:   83                      .byte 0x83
  3f:   e2                      .byte 0xe2

Code starting with the faulting instruction
===========================================
   0:   0f 0b                   ud2
   2:   41 0f b6 76 59          movzbl 0x59(%r14),%esi
   7:   83 e6 0f                and    $0xf,%esi
   a:   41 0f b6 46 58          movzbl 0x58(%r14),%eax
   f:   89 c2                   mov    %eax,%edx
  11:   c0 ea 04                shr    $0x4,%dl
  14:   83                      .byte 0x83
  15:   e2                      .byte 0xe2
[ 2470.526614] RSP: 0018:ffff888462f7f830 EFLAGS: 00010202
[ 2470.526618] RAX: 0000000000000008 RBX: ffff888482188000 RCX: 0000000000000184
[ 2470.526621] RDX: 0000000000000000 RSI: 0000000000000183 RDI: ffff888462f7fa90
[ 2470.526624] RBP: 0000000000000001 R08: 0000000000000183 R09: 0000000000000001
[ 2470.526626] R10: 0000000000000c00 R11: 0000000000000001 R12: ffff888136949880
[ 2470.526628] R13: 0000000000000000 R14: ffff888482188000 R15: 0000000000000184
[ 2470.526631] FS:  00007f5cda0a4740(0000) GS:ffff88885ec80000(0000) knlGS:0000000000000000
[ 2470.526634] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2470.526637] CR2: 0000559217a10da8 CR3: 000000048a087000 CR4: 00000000001506e0
[ 2470.526639] Call Trace:
[ 2470.526643] bch2_btree_iter_peek_slot (fs/bcachefs/btree_iter.c:1493 fs/bcachefs/btree_iter.c:1897) bcachefs
[ 2470.526667] bchfs_read (./include/linux/err.h:59 fs/bcachefs/btree_iter.h:274 fs/bcachefs/fs-io.c:815) bcachefs
[ 2470.526694] ? bio_alloc_bioset (block/bio.c:447)
[ 2470.526700] bch2_readahead (fs/bcachefs/fs-io.c:698 (discriminator 2) fs/bcachefs/fs-io.c:892 (discriminator 2)) bcachefs
[ 2470.526726] ? bch2_readahead (fs/bcachefs/bcachefs_format.h:143 (discriminator 2)) bcachefs
[ 2470.526752] read_pages (./include/linux/pagemap.h:919 mm/readahead.c:132)
[ 2470.526757] page_cache_ra_unbounded (./arch/x86/include/asm/current.h:15 ./include/linux/sched/mm.h:263 mm/readahead.c:239)
[ 2470.526761] filemap_get_pages (mm/filemap.c:2470)
[ 2470.526765] filemap_read (mm/filemap.c:2552)
[ 2470.526768] ? __pagevec_lru_add_fn (./include/linux/mm_inline.h:88 mm/swap.c:1043)
[ 2470.526772] ? __pagevec_lru_add (./include/linux/pagevec.h:56 mm/swap.c:1066)
[ 2470.526774] ? lru_cache_add (mm/swap.c:476)
[ 2470.526777] ? do_anonymous_page (mm/memory.c:3674)
[ 2470.526780] bch2_read_iter (fs/bcachefs/fs-io.c:1799) bcachefs
[ 2470.526805] new_sync_read (fs/read_write.c:416 (discriminator 1))
[ 2470.526810] vfs_read (fs/read_write.c:496)
[ 2470.526813] ksys_read (fs/read_write.c:635)
[ 2470.526816] do_syscall_64 (arch/x86/entry/common.c:47)
[ 2470.526821] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:112)
[ 2470.526825] RIP: 0033:0x7f5cda1b0dae
[ 2470.526827] Code: c0 e9 b6 fe ff ff 48 8d 3d 3f be 06 00 50 e8 49 ce 01 00 66 0f 1f 84 00 00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 0f 05 <48> 3d 00 f0 ff ff 77 5a c3 66 0f 1f 84 00 00 00 00 00 48 83 ec 28
All code
========
   0:   c0 e9 b6                shr    $0xb6,%cl
   3:   fe                      (bad)
   4:   ff                      (bad)
   5:   ff 48 8d                decl   -0x73(%rax)
   8:   3d 3f be 06 00          cmp    $0x6be3f,%eax
   d:   50                      push   %rax
   e:   e8 49 ce 01 00          callq  0x1ce5c
  13:   66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
  1a:   00 00
  1c:   64 8b 04 25 18 00 00    mov    %fs:0x18,%eax
  23:   00
  24:   85 c0                   test   %eax,%eax
  26:   75 14                   jne    0x3c
  28:   0f 05                   syscall
  2a:*  48 3d 00 f0 ff ff       cmp    $0xfffffffffffff000,%rax         <-- trapping instruction
  30:   77 5a                   ja     0x8c
  32:   c3                      retq
  33:   66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
  3a:   00 00
  3c:   48 83 ec 28             sub    $0x28,%rsp

Code starting with the faulting instruction
===========================================
   0:   48 3d 00 f0 ff ff       cmp    $0xfffffffffffff000,%rax
   6:   77 5a                   ja     0x62
   8:   c3                      retq
   9:   66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
  10:   00 00
  12:   48 83 ec 28             sub    $0x28,%rsp
[ 2470.526833] RSP: 002b:00007ffc0b5c5558 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 2470.526836] RAX: ffffffffffffffda RBX: 0000559216684280 RCX: 00007f5cda1b0dae
[ 2470.526838] RDX: 0000000000040000 RSI: 00005592179d0da0 RDI: 0000000000000003
[ 2470.526841] RBP: 0000000000040000 R08: 0000000000000000 R09: 0000000000000000
[ 2470.526843] R10: 0000000000040030 R11: 0000000000000246 R12: 0000000000000000
[ 2470.526845] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000040000
[ 2470.526848] Modules linked in: btrfs blake2b_generic bcachefs xor raid6_pq f2fs i2c_dev uas amdgpu input_leds intel_rapl_msr intel_rapl_common iosf_mbi kvm_amd iwlmvm kvm btusb igb btintel irqbypass iwlwifi ptp ghash_clmulni_intel bluetooth sp5100_tco gigabyte_wmi usb_storage hid_generic rapl wmi_bmof pps_core aacraid i2c_piix4 drm_ttm_helper ttm backlight gpu_sched tpm_crb tpm_tis pinctrl_amd tpm_tis_core efivarfs
[ 2470.526880] ---[ end trace 7351a3ec7e320e50 ]---
[ 2470.636814] RIP: 0010:btree_iter_set_search_pos (fs/bcachefs/btree_iter.c:1608 (discriminator 1)) bcachefs                                                  [ 2470.636840] Code: ff 89 d9 d3 c7 21 fa 21 f8 66 0f be d2 66 98 c1 e2 08 c1 e0 04 09 d0 41 0f b7 56 5a 66 81 e2 0f f0 09 d0 66 41 89 46 5a eb 0a <0f> 0b 41 0f b6 76 59 83 e6 0f 41 0f b6 46 58 89 c2 c0 ea 04 83 e2
All code
========
   0:   ff 89 d9 d3 c7 21       decl   0x21c7d3d9(%rcx)
   6:   fa                      cli
   7:   21 f8                   and    %edi,%eax
   9:   66 0f be d2             movsbw %dl,%dx
   d:   66 98                   cbtw
   f:   c1 e2 08                shl    $0x8,%edx
  12:   c1 e0 04                shl    $0x4,%eax
  15:   09 d0                   or     %edx,%eax
  17:   41 0f b7 56 5a          movzwl 0x5a(%r14),%edx
  1c:   66 81 e2 0f f0          and    $0xf00f,%dx
  21:   09 d0                   or     %edx,%eax
  23:   66 41 89 46 5a          mov    %ax,0x5a(%r14)
  28:   eb 0a                   jmp    0x34
  2a:*  0f 0b                   ud2             <-- trapping instruction
  2c:   41 0f b6 76 59          movzbl 0x59(%r14),%esi
  31:   83 e6 0f                and    $0xf,%esi
  34:   41 0f b6 46 58          movzbl 0x58(%r14),%eax
  39:   89 c2                   mov    %eax,%edx
  3b:   c0 ea 04                shr    $0x4,%dl
  3e:   83                      .byte 0x83
  3f:   e2                      .byte 0xe2

Code starting with the faulting instruction
===========================================
   0:   0f 0b                   ud2
   2:   41 0f b6 76 59          movzbl 0x59(%r14),%esi
   7:   83 e6 0f                and    $0xf,%esi
   a:   41 0f b6 46 58          movzbl 0x58(%r14),%eax
   f:   89 c2                   mov    %eax,%edx
  11:   c0 ea 04                shr    $0x4,%dl
  14:   83                      .byte 0x83
  15:   e2                      .byte 0xe2
[ 2470.636846] RSP: 0018:ffff888462f7f830 EFLAGS: 00010202
[ 2470.636849] RAX: 0000000000000008 RBX: ffff888482188000 RCX: 0000000000000184
[ 2470.636852] RDX: 0000000000000000 RSI: 0000000000000183 RDI: ffff888462f7fa90
[ 2470.636855] RBP: 0000000000000001 R08: 0000000000000183 R09: 0000000000000001
[ 2470.636857] R10: 0000000000000c00 R11: 0000000000000001 R12: ffff888136949880
[ 2470.636860] R13: 0000000000000000 R14: ffff888482188000 R15: 0000000000000184
[ 2470.636862] FS:  00007f5cda0a4740(0000) GS:ffff88885ec80000(0000) knlGS:0000000000000000
[ 2470.636866] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2470.636868] CR2: 0000559217a10da8 CR3: 000000048a087000 CR4: 00000000001506e0
[ 2470.636872] ------------[ cut here ]------------
koverstreet commented 3 years ago

It was an unhandled transaction restart - we recently (in 32638053e1 bcachefs: trans->restarted) started aggressively checking whether transaction restarts were being handled correctly.

Fixed by 60fbf06f49

jpsollie commented 3 years ago

removing corrupted files doesn't work yet. Quite possible related to same issue:

[49762.145470] CR2: 000055a779f36088 CR3: 000000010bf4f000 CR4: 00000000001506e0                                                                               
[49762.145472] Call Trace:                                                                                                                                     
[49762.145474]  ? bch2_trans_unlink_iters+0xc5/0x200 [bcachefs]                                                                                                
[49762.145487]  bch2_trans_begin+0xf4/0x120 [bcachefs]                                                                                                         
[49762.145500]  bch2_btree_delete_range_trans+0x7a/0x260 [bcachefs]                                                                                            
[49762.145513]  bch2_inode_rm+0xc2/0x330 [bcachefs]                                                                                                            
[49762.145527]  ? truncate_inode_pages_range+0x19a/0x690                                                                                                       
[49762.145531]  ? bch2_inode_rm+0x7a/0x330 [bcachefs]                                                                                                          
[49762.145544]  evict+0xb7/0x160                                                                                                                               
[49762.145547]  do_unlinkat+0x1a8/0x2b0                                                                                                                        
[49762.145551]  do_syscall_64+0x3a/0x70                                                                                                                        
[49762.145555]  entry_SYSCALL_64_after_hwframe+0x44/0xae
jpsollie commented 3 years ago
[49762.145406] ------------[ cut here ]------------
[49762.145409] Kernel BUG at __btree_iter_traverse_all.constprop.0+0x22f/0x2b0 [bcachefs] [verbose debug info unavailable]
[49762.145426] invalid opcode: 0000 [#49 ]
[49762.145431] CPU: 27 PID: 5269 Comm: rm Not tainted 5.13.7release+ #13 
[49762.145435] Hardware name: Gigabyte Technology Co., Ltd. X399 DESIGNARE EX/X399 DESIGNARE EX-CF, BIOS F12 12/11/2019 
[49762.145437] RIP:  0010:__btree_iter_traverse_all.constprop.0+0x22f/0x2b0 [bcachefs] 
[49762.145450] Code: e8 16 df fe ff 80 63 16 fb 44 89 e0 48 83 c4 30 5b 5d 41 5c 41 5d 41 5e 41 5f c3 be 01 00 
00 00 e8 c6 fc ff ff e9 8d fe ff ff <0f> 0b 80 4b 16 02 eb cb 31 c9 31 c0 89 c2 83 c0 20 48 89 0c 14 48
[49762.145454] RSP: 0018:ffff8881339e7b80 EFLAGS: 00010202
[49762.145457] RAX: ffff888785440000 RBX: ffff8881339e7dc0 RCX: 0000000000000000 
[49762.145459] RDX: 0000000000000002 RSI: 000000000000b803 RDI: ffff888785440000 
[49762.145461] RBP: ffff888408380000 R08: ffff88845f12f200 R09: 0000129500000000
[49762.145463] R10: ffff8881339e7a40 R11: 0000000000000000 R12: 0000000000000000
[49762.145464] R13: ffff888785440d98 R14: 0000000000000001 R15: ffffffffffffffff
[49762.145466] FS:  00007f6b06128600(0000) GS:ffff88885eec0000(0000)
[49762.145468] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[49762.145470] CR2: 000055a779f36088 CR3: 000000010bf4f000 CR4: 00000000001506e0
[49762.145472] Call Trace:
[49762.145474]  ? bch2_trans_unlink_iters+0xc5/0x200 [bcachefs]
[49762.145487]  bch2_trans_begin+0xf4/0x120 [bcachefs]
[49762.145500]  bch2_btree_delete_range_trans+0x7a/0x260 [bcachefs]
[49762.145513]  bch2_inode_rm+0xc2/0x330 [bcachefs]
[49762.145527]  ? truncate_inode_pages_range+0x19a/0x690
[49762.145531]  ? bch2_inode_rm+0x7a/0x330 [bcachefs]
[49762.145544]  evict+0xb7/0x160
[49762.145547]  do_unlinkat+0x1a8/0x2b0
[49762.145551]  do_syscall_64+0x3a/0x70
[49762.145555]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 2220.747197] bcachefs (7535597e-5660-4926-a774-87a7725d4de5 inum 39250): no device to read from
[ 2220.747200] bcachefs (7535597e-5660-4926-a774-87a7725d4de5 inum 39250): read error 3 from btree lookup
[ 2220.747949] bcachefs (sdg inum 39250 offset 28160): data checksum error: expected 0:ace7597a got 0:0 (type 5)
[ 2220.747964] bcachefs (7535597e-5660-4926-a774-87a7725d4de5 inum 39250): no device to read from
[ 2220.747966] bcachefs (7535597e-5660-4926-a774-87a7725d4de5 inum 39250): read error 3 from btree lookup
[ 2220.771907] bcachefs (sdg inum 39250 offset 28733): data checksum error: expected 0:b2a6ad98 got 0:c5a5b6b5 (type 5)
[ 2220.771924] bcachefs (7535597e-5660-4926-a774-87a7725d4de5 inum 39250): no device to read from                                                              
...
read errors mapping "/data/securestorage/ellen/20210730_210543.jpg": Input/output error (5)

this is no issue to me, except that removing the corrupted file segfaults ...

koverstreet commented 3 years ago

You still didn't compile your kernel with full debug info

jpsollie commented 3 years ago

I have 2 kernels: a "release" 5.13.7release+ and "debug" kernel 5.13.7debug+. The 2nd one is booted in case it's needed. The faddr2line never worked. Not for release and not for debug

jpsollie commented 3 years ago

regenerated the error: to verify file integrity of every file, I executed "find -type f | while read r; do cp "$r" /dev/null. Unfortunately, the error reappears, but without segfault this time:

[7055.750076] ------------[ cut here ]------------ 
[ 7055.750080] kernel BUG at fs/bcachefs/btree_iter.c:1332!
[ 7055.750086] invalid opcode: 0000 [#1] SMP
[ 7055.750095] CPU: 23 PID: 24846 Comm: kworker/u64:2 Not tainted 5.13.7debug+ #12
[ 7055.750101] Hardware name: Gigabyte Technology Co., Ltd. X399 DESIGNARE EX/X399 DESIGNARE EX-CF, BIOS F12 12/11/2019
[ 7055.750106] Workqueue: events_unbound __bch2_read_endio [bcachefs]
[ 7055.750145] RIP: 0010:__btree_iter_traverse_all.constprop.0 (/usr/src/bcachefs/fs/bcachefs/btree_iter.c:1332) bcachefs 
[ 7055.750176] Code: 89 c3 8b 44 24 28 25 ff ff ff 03 ff c8 74 08 48 89 e7 e8 05 12 d4 e0 85 db 75 da 90 e9 53 fe ff ff 0f 0b b8 fc ff ff ff eb 84 <0f> 0b 48 83 ec 10 48 89 2c 24 4c 89 64 24 08 48 89 fd 65 48 8b 04                                                                                                        All code                                                                                                                                                       ========                                                                                                                                                          0:   89 c3                   mov    %eax,%ebx                                                                                                                  2:   8b 44 24 28             mov    0x28(%rsp),%eax                                                                                                            6:   25 ff ff ff 03          and    $0x3ffffff,%eax                                                                                                            b:   ff c8                   dec    %eax                                                                                                                       d:   74 08                   je     0x17                                                                                                                       f:   48 89 e7                mov    %rsp,%rdi                                                                                                                 12:   e8 05 12 d4 e0          call   0xffffffffe0d4121c                                                                                                        17:   85 db                   test   %ebx,%ebx                                                                                                                 19:   75 da                   jne    0xfffffffffffffff5                                                                                                        1b:   90                      nop                                                                                                                              1c:   e9 53 fe ff ff          jmp    0xfffffffffffffe74                                                                                                        21:   0f 0b                   ud2                                                                                                                              23:   b8 fc ff ff ff          mov    $0xfffffffc,%eax                                                                                                          28:   eb 84                   jmp    0xffffffffffffffae                                                                                                        2a:*  0f 0b                   ud2             <-- trapping instruction                                                                                         2c:   48 83 ec 10             sub    $0x10,%rsp                                                                                                                30:   48 89 2c 24             mov    %rbp,(%rsp)                                                                                                               34:   4c 89 64 24 08          mov    %r12,0x8(%rsp)                                                                                                            39:   48 89 fd                mov    %rdi,%rbp                                                                                                                 3c:   65                      gs                                                                                                                               3d:   48                      rex.W                                                                                                                            3e:   8b                      .byte 0x8b                                                                                                                       3f:   04                      .byte 0x4                                                                                                                                                                                                                                                                                     Code starting with the faulting instruction                                                                                                                    ===========================================                                                                                                                       0:   0f 0b                   ud2                                                                                                                               2:   48 83 ec 10             sub    $0x10,%rsp                                                                                                                 6:   48 89 2c 24             mov    %rbp,(%rsp)                                                                                                                a:   4c 89 64 24 08          mov    %r12,0x8(%rsp)                                                                                                             f:   48 89 fd                mov    %rdi,%rbp                                                                                                                 12:   65                      gs                                                                                                                               13:   48                      rex.W                                                                                                                            14:   8b                      .byte 0x8b                                                                                                                       15:   04                      .byte 0x4 
[ 7055.750183] RSP: 0018:ffff8884c5a6fc10 EFLAGS: 00010202 
[ 7055.750187] RAX: ffff888498f601d0 RBX: 0000000000000000 RCX: 0000000000000002
[ 7055.750191] RDX: 00000000000001bc RSI: ffff888498f60000 RDI: 0000000000000002
[ 7055.750194] RBP: ffff88841fe00000 R08: ffff888498f606d0 R09: 0000000000000001
[ 7055.750197] R10: 0000000000000001 R11: ffff888477c218ca R12: ffff8884c5a6fc88 
[ 7055.750200] R13: ffff888498f60658 R14: 0000000000000001 R15: ffffffffffffffff 
[ 7055.750203] FS:  0000000000000000(0000) GS:ffff88885edc0000(0000) knlGS:0000000000000000
[ 7055.750208] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
[ 7055.750211] CR2: 00007f192322c850 CR3: 00000004c6370000 CR4: 00000000001506e0 
[ 7055.750215] Call Trace: 
[ 7055.750219] ? bch2_trans_iter_put (/usr/src/bcachefs/fs/bcachefs/btree_iter.c:2226 /usr/src/bcachefs/fs/bcachefs/btree_iter.c:2209) bcachefs 
[ 7055.750248] ? bch2_trans_unlink_iters (/usr/src/bcachefs/fs/bcachefs/btree_iter.c:2142 /usr/src/bcachefs/fs/bcachefs/btree_iter.c:2201 /usr/src/bcachefs/fs/bcachefs/btree_iter.c:2513) bcachefs
[ 7055.750276] ? __bch2_trans_commit (/usr/src/bcachefs/./include/linux/percpu-refcount.h:325 /usr/src/bcachefs/./include/linux/percpu-refcount.h:338 /usr/src/bcachefs/fs/bcachefs/btree_update_leaf.c:961) bcachefs
[ 7055.750306] bch2_trans_begin (/usr/src/bcachefs/fs/bcachefs/btree_iter.c:2560) bcachefs
[ 7055.750334] bch2_rbio_narrow_crcs (/usr/src/bcachefs/fs/bcachefs/io.c:1801 (discriminator 6)) bcachefs
[ 7055.750367] ? bch2_rbio_narrow_crcs (/usr/src/bcachefs/fs/bcachefs/io.c:1801) bcachefs
[ 7055.750398] __bch2_read_endio (/usr/src/bcachefs/fs/bcachefs/io.c:1843) bcachefs 
[ 7055.750430] process_one_work (workqueue.c:?) 
[ 7055.750437] worker_thread (workqueue.c:?)
[ 7055.750441] ? process_one_work (workqueue.c:?) 
[ 7055.750445] kthread (kthread.c:?) 
[ 7055.750449] ? set_kthread_struct (kthread.c:?)
[ 7055.750453] ret_from_fork (??:?) 
[ 7055.750458] Modules linked in: btrfs blake2b_generic bcachefs xor raid6_pq i2c_dev f2fs amdgpu uas input_leds intel_rapl_msr intel_rapl_common iosf_mbi kvm_amd kvm iwlmvm irqbypass igb btusb btintel ghash_clmulni_intel bluetooth iwlwifi usb_storage rapl ptp sp5100_tco gigabyte_wmi wmi_bmof hid_generic aacraid drm_ttm_helper pps_core i2c_piix4 ttm backlight gpu_sched tpm_crb tpm_tis pinctrl_amd tpm_tis_core efivarfs 
[ 7055.750496] ---[ end trace ee44cc42b3f20b76 ]---
[ 7055.832403] RIP: 0010:__btree_iter_traverse_all.constprop.0 (/usr/src/bcachefs/fs/bcachefs/btree_iter.c:1332) bcachefs
[ 7055.832440] Code: 89 c3 8b 44 24 28 25 ff ff ff 03 ff c8 74 08 48 89 e7 e8 05 12 d4 e0 85 db 75 da 90 e9 53 fe ff ff 0f 0b b8 fc ff ff ff eb 84 <0f> 0b 48 83 ec 10 48 89 2c 24 4c 89 64 24 08 48 89 fd 65 48 8b 04                                                                                                        All code                                                                                                                                                       ========                                                                                                                                                          0:   89 c3                   mov    %eax,%ebx                                                                                                                  2:   8b 44 24 28             mov    0x28(%rsp),%eax                                                                                                            6:   25 ff ff ff 03          and    $0x3ffffff,%eax                                                                                                            b:   ff c8                   dec    %eax                                                                                                                       d:   74 08                   je     0x17                                                                                                                       f:   48 89 e7                mov    %rsp,%rdi                                                                                                                 12:   e8 05 12 d4 e0          call   0xffffffffe0d4121c                                                                                                        17:   85 db                   test   %ebx,%ebx                                                                                                                 19:   75 da                   jne    0xfffffffffffffff5                                                                                                        1b:   90                      nop                                                                                                                              1c:   e9 53 fe ff ff          jmp    0xfffffffffffffe74                                                                                                        21:   0f 0b                   ud2                                                                                                                              23:   b8 fc ff ff ff          mov    $0xfffffffc,%eax                                                                                                          28:   eb 84                   jmp    0xffffffffffffffae                                                                                                        2a:*  0f 0b                   ud2             <-- trapping instruction                                                                                         2c:   48 83 ec 10             sub    $0x10,%rsp                                                                                                                30:   48 89 2c 24             mov    %rbp,(%rsp)                                                                                                               34:   4c 89 64 24 08          mov    %r12,0x8(%rsp)                                                                                                            39:   48 89 fd                mov    %rdi,%rbp                                                                                                                 3c:   65                      gs                                                                                                                               3d:   48                      rex.W                                                                                                                            3e:   8b                      .byte 0x8b                                                                                                                       3f:   04                      .byte 0x4                                                                                                                                                                                                                                                                                     Code starting with the faulting instruction                                                                                                                    ===========================================                                                                                                                       0:   0f 0b                   ud2                                                                                                                               2:   48 83 ec 10             sub    $0x10,%rsp                                                                                                                 6:   48 89 2c 24             mov    %rbp,(%rsp)                                                                                                                a:   4c 89 64 24 08          mov    %r12,0x8(%rsp)                                                                                                             f:   48 89 fd                mov    %rdi,%rbp                                                                                                                 12:   65                      gs                                                                                                                               13:   48                      rex.W                                                                                                                            14:   8b                      .byte 0x8b                                                                                                                       15:   04                      .byte 0x4                                                                                                                      
[ 7055.832447] RSP: 0018:ffff8884c5a6fc10 EFLAGS: 00010202
[ 7055.832451] RAX: ffff888498f601d0 RBX: 0000000000000000 RCX: 0000000000000002
[ 7055.832455] RDX: 00000000000001bc RSI: ffff888498f60000 RDI: 0000000000000002
[ 7055.832458] RBP: ffff88841fe00000 R08: ffff888498f606d0 R09: 0000000000000001
[ 7055.832461] R10: 0000000000000001 R11: ffff888477c218ca R12: ffff8884c5a6fc88
[ 7055.832464] R13: ffff888498f60658 R14: 0000000000000001 R15: ffffffffffffffff
[ 7055.832467] FS:  0000000000000000(0000) GS:ffff88885edc0000(0000) knlGS:0000000000000000
[ 7055.832472] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7055.832475] CR2: 00007f192322c850 CR3: 00000004c6370000 CR4: 00000000001506e0
marcin-github commented 3 years ago

The faddr2line never worked. Not for release and not for debug

Did you compile bcachefs as module?

jpsollie commented 3 years ago

The faddr2line never worked. Not for release and not for debug

Did you compile bcachefs as module?

Yes. I consider it an advantage to be able to remove / force-remove the code if not functioning for whatever reason

koverstreet commented 3 years ago

Did you compile bcachefs as module?

Yes. I consider it an advantage to be able to remove / force-remove the code if not functioning for whatever reason

That's why faddr2line wasn't working for you. I don't see how removing the module if something isn't working would ever be useful.

But you finally got file and line number, which is what I needed.

Were there any other errors from bcachefs in your dmesg log?

jpsollie commented 3 years ago

The device also hosts dhcp / dns for my home network. I thought "if bcachefs module would somewhere cause a lockup / infinite loop, I could still unload it and schedule the restart" cp operation did not say anything. it simply hangs, in 's' state, but nothing is printed to dmesg. EDIT: dmesg previously made some annoying checksum errors on reading corrupted files with rsync, but I removed those, so not related: https://pastebin.com/FW6eaJyt

koverstreet commented 3 years ago

Module unloading doesn't work that way.

Post your full dmesg log

On Fri, Aug 6, 2021, 22:05 jpsollie @.***> wrote:

The device also hosts dhcp / dns for my home network. I thought "if bcachefs module would somewhere cause a lockup / infinite loop, I could still unload it and schedule the restart" cp operation did not say anything. it simply hangs, in 's' state, but nothing is printed to dmesg.

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/koverstreet/bcachefs/issues/294#issuecomment-894590051, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAPGX3UE3H4LQNQJNPBXTGTT3SIFBANCNFSM5BQKLAJQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&utm_campaign=notification-email .

jpsollie commented 3 years ago

output.log

jpsollie commented 3 years ago

based on your suggestions, I recompiled the kernel with bcachefs built in, and recreated the error condition (new dmesg in attachment, I believe it looks much cleaner now). I also have the faulting filename now, doing a onetime cp after a fresh reboot fixes the issue, so it may be an issue with scheduling? dmesg_full_stacktrace_decoded.txt

marcin-github commented 3 years ago

The faddr2line never worked. Not for release and not for debug

Did you compile bcachefs as module?

Yes. I consider it an advantage to be able to remove / force-remove the code if not functioning for whatever reason

If you have bcachefs as module then invocation of fadd2line should be as below:

scripts/faddr2line fs/bcachefs/bcachefs.o ....

jpsollie commented 3 years ago

The faddr2line never worked. Not for release and not for debug

Did you compile bcachefs as module?

Yes. I consider it an advantage to be able to remove / force-remove the code if not functioning for whatever reason

If you have bcachefs as module then invocation of fadd2line should be as below:

scripts/faddr2line fs/bcachefs/bcachefs.o ....

yes, koverstreet also mentioned that, but unfortunately, that didn't work out either ^^ Anyway, I'll keep the code in-kernel from now on, no problem.

jpsollie commented 3 years ago

Trying with a preemptive kernel (to avoid some scheduling issues) did not work out either:

[ 3040.584815] ------------[ cut here ]------------
[ 3040.584820] kernel BUG at fs/bcachefs/btree_iter.c:1332!
[ 3040.584826] invalid opcode: 0000 [#1] PREEMPT SMP
[ 3040.584835] CPU: 29 PID: 11825 Comm: kworker/29:2 Not tainted 5.13.7debug+ #17
[ 3040.584841] Hardware name: Gigabyte Technology Co., Ltd. X399 DESIGNARE EX/X399 DESIGNARE EX-CF, BIOS F12 12/11/2019
[ 3040.584846] Workqueue: bcachefs bch2_write_index
[ 3040.584855] RIP: 0010:__btree_iter_traverse_all (fs/bcachefs/btree_iter.c:1332 (discriminator 1)) 
[ 3040.584862] Code: 65 ff 0d ec 71 ba 7e 0f 85 37 ff ff ff e8 c0 64 b9 ff e9 2d ff ff ff 41 89 de e9 13 ff ff ff 41 be fc ff ff ff e9 1a ff ff ff <0f> 0b 0f 1f 80 00 00 00 00 0f 1f 44 00 00 48 83 ec 10 48 89 2c 24
All code
========
   0:   65 ff 0d ec 71 ba 7e    decl   %gs:0x7eba71ec(%rip)        # 0x7eba71f3
   7:   0f 85 37 ff ff ff       jne    0xffffffffffffff44
   d:   e8 c0 64 b9 ff          call   0xffffffffffb964d2
  12:   e9 2d ff ff ff          jmp    0xffffffffffffff44
  17:   41 89 de                mov    %ebx,%r14d
  1a:   e9 13 ff ff ff          jmp    0xffffffffffffff32
  1f:   41 be fc ff ff ff       mov    $0xfffffffc,%r14d
  25:   e9 1a ff ff ff          jmp    0xffffffffffffff44
  2a:*  0f 0b                   ud2             <-- trapping instruction
  2c:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)
  33:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  38:   48 83 ec 10             sub    $0x10,%rsp
  3c:   48 89 2c 24             mov    %rbp,(%rsp)

Code starting with the faulting instruction
===========================================
   0:   0f 0b                   ud2    
   2:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)
   9:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
   e:   48 83 ec 10             sub    $0x10,%rsp
  12:   48 89 2c 24             mov    %rbp,(%rsp)
[ 3040.584870] RSP: 0018:ffff8886ba0d7a98 EFLAGS: 00010202
[ 3040.584875] RAX: ffff888016588000 RBX: 0000000000000000 RCX: 0000000000000002
[ 3040.584879] RDX: 00000000000001fb RSI: ffff888016588000 RDI: 0000000000000000
[ 3040.584882] RBP: ffff888467fe0000 R08: ffff888016588500 R09: 0000000000000001
[ 3040.584886] R10: 0000000000000001 R11: ffff888443728aca R12: ffffffffffffffff
[ 3040.584889] R13: ffff8886ba0d7d18 R14: ffff888016588488 R15: 0000000000000001
[ 3040.584893] FS:  0000000000000000(0000) GS:ffff88885ef40000(0000) knlGS:0000000000000000
[ 3040.584897] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3040.584901] CR2: 00007fbcf8ca9000 CR3: 0000000159810000 CR4: 00000000001506e0
[ 3040.584904] Call Trace:
[ 3040.584908] ? bch2_migrate_index_update (fs/bcachefs/move.c:89) 
[ 3040.584915] ? bch2_trans_unlink_iters (fs/bcachefs/btree_iter.c:2142 fs/bcachefs/btree_iter.c:2201 fs/bcachefs/btree_iter.c:2513) 
[ 3040.584920] ? __bch2_trans_commit (fs/bcachefs/btree_update_leaf.c:969) 
[ 3040.584926] bch2_trans_begin (fs/bcachefs/btree_iter.c:2560) 
[ 3040.584931] bch2_migrate_index_update (fs/bcachefs/move.c:89) 
[ 3040.584941] ? bch2_migrate_index_update (fs/bcachefs/keylist.h:51 fs/bcachefs/move.c:74) 
[ 3040.584948] __bch2_write_index (fs/bcachefs/io.c:610) 
[ 3040.584952] bch2_write_index (fs/bcachefs/io.c:641) 
[ 3040.584956] process_one_work (./arch/x86/include/asm/jump_label.h:19 ./include/linux/jump_label.h:200 ./include/trace/events/workqueue.h:108 kernel/workqueue.c:2281) 
[ 3040.584963] worker_thread (./include/linux/list.h:282 kernel/workqueue.c:2423) 
[ 3040.584968] ? rescuer_thread (kernel/workqueue.c:2365) 
[ 3040.584972] kthread (kernel/kthread.c:319) 
[ 3040.584977] ? set_kthread_struct (kernel/kthread.c:272) 
[ 3040.584982] ret_from_fork (arch/x86/entry/entry_64.S:300) 
[ 3040.584989] Modules linked in: btrfs blake2b_generic xfs i2c_dev f2fs amdgpu uas input_leds intel_rapl_msr intel_rapl_common iosf_mbi kvm_amd kvm iwlmvm igb btusb btintel irqbypass bluetooth iwlwifi usb_storage drm_ttm_helper ptp sp5100_tco ghash_clmulni_intel aacraid ttm gigabyte_wmi rapl pps_core wmi_bmof hid_generic i2c_piix4 backlight gpu_sched tpm_crb tpm_tis tpm_tis_core pinctrl_amd efivarfs
[ 3040.585032] ---[ end trace eb7a330d893ae5a7 ]---
[ 3040.586433] ------------[ cut here ]------------
[ 3040.586436] WARNING: CPU: 29 PID: 11825 at kernel/rcu/tree_plugin.h:359 rcu_note_context_switch (kernel/rcu/tree_plugin.h:359) 
[ 3040.586445] Modules linked in: btrfs blake2b_generic xfs i2c_dev f2fs amdgpu uas input_leds intel_rapl_msr intel_rapl_common iosf_mbi kvm_amd kvm iwlmvm igb btusb btintel irqbypass bluetooth iwlwifi usb_storage drm_ttm_helper ptp sp5100_tco ghash_clmulni_intel aacraid ttm gigabyte_wmi rapl pps_core wmi_bmof hid_generic i2c_piix4 backlight gpu_sched tpm_crb tpm_tis tpm_tis_core pinctrl_amd efivarfs
[ 3040.590106] CPU: 29 PID: 11825 Comm: kworker/29:2 Tainted: G      D           5.13.7debug+ #17
[ 3040.590112] Hardware name: Gigabyte Technology Co., Ltd. X399 DESIGNARE EX/X399 DESIGNARE EX-CF, BIOS F12 12/11/2019
[ 3040.590116] Workqueue: bcachefs bch2_write_index
[ 3040.590120] RIP: 0010:rcu_note_context_switch (kernel/rcu/tree_plugin.h:359) 
[ 3040.590125] Code: 25 00 27 01 00 48 03 1c c5 40 a6 40 82 0f 1f 44 00 00 40 84 ed 75 15 65 48 8b 04 25 00 27 01 00 8b 90 cc 02 00 00 85 d2 7e 02 <0f> 0b 65 48 8b 04 25 00 27 01 00 8b 80 cc 02 00 00 85 c0 7e 0f 41
All code
========
   0:   25 00 27 01 00          and    $0x12700,%eax
   5:   48 03 1c c5 40 a6 40    add    -0x7dbf59c0(,%rax,8),%rbx
   c:   82 
   d:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  12:   40 84 ed                test   %bpl,%bpl
  15:   75 15                   jne    0x2c
  17:   65 48 8b 04 25 00 27    mov    %gs:0x12700,%rax
  1e:   01 00 
  20:   8b 90 cc 02 00 00       mov    0x2cc(%rax),%edx
  26:   85 d2                   test   %edx,%edx
  28:   7e 02                   jle    0x2c
  2a:*  0f 0b                   ud2             <-- trapping instruction
  2c:   65 48 8b 04 25 00 27    mov    %gs:0x12700,%rax
  33:   01 00 
  35:   8b 80 cc 02 00 00       mov    0x2cc(%rax),%eax
  3b:   85 c0                   test   %eax,%eax
  3d:   7e 0f                   jle    0x4e
  3f:   41                      rex.B

Code starting with the faulting instruction
===========================================
   0:   0f 0b                   ud2    
   2:   65 48 8b 04 25 00 27    mov    %gs:0x12700,%rax
   9:   01 00 
   b:   8b 80 cc 02 00 00       mov    0x2cc(%rax),%eax
  11:   85 c0                   test   %eax,%eax
  13:   7e 0f                   jle    0x24
  15:   41                      rex.B
[ 3040.590132] RSP: 0018:ffff8886ba0d7548 EFLAGS: 00010002
[ 3040.590136] RAX: ffff88811bea2a00 RBX: ffff88885ef64680 RCX: 0000000000000000
[ 3040.590139] RDX: 0000000000000001 RSI: ffffffff8235db4a RDI: ffffffff8235df7f
[ 3040.590142] RBP: 0000000000000000 R08: ffff88885ef63a80 R09: 0000000000000000
[ 3040.590146] R10: 0000000000000009 R11: 0000000000000010 R12: ffff88811bea2a00
[ 3040.590149] R13: ffff88811bea2a00 R14: 0000000000000007 R15: ffff88885ef63a80
[ 3040.590153] FS:  0000000000000000(0000) GS:ffff88885ef40000(0000) knlGS:0000000000000000
[ 3040.590157] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3040.590160] CR2: 00007fbcf8ca9000 CR3: 0000000159810000 CR4: 00000000001506e0
[ 3040.590164] Call Trace:
[ 3040.590167] __schedule (kernel/sched/sched.h:1334 kernel/sched/core.c:5065) 
[ 3040.590173] schedule (kernel/sched/core.c:5231 (discriminator 1)) 
[ 3040.590176] schedule_timeout (kernel/time/timer.c:1870) 
[ 3040.590181] ? preempt_count_add (./include/linux/ftrace.h:844 kernel/sched/core.c:4784 kernel/sched/core.c:4781 kernel/sched/core.c:4809) 
[ 3040.590185] wait_for_completion (kernel/sched/completion.c:86 kernel/sched/completion.c:106 kernel/sched/completion.c:117 kernel/sched/completion.c:138) 
[ 3040.590190] virt_efi_query_variable_info (drivers/firmware/efi/runtime-wrappers.c:374 (discriminator 7)) 
[ 3040.590197] efi_query_variable_store.part.0 (arch/x86/platform/efi/quirks.c:165) 
[ 3040.590203] ? _raw_spin_lock_irqsave (./arch/x86/include/asm/atomic.h:202 ./include/asm-generic/atomic-instrumented.h:707 ./include/asm-generic/qspinlock.h:82 ./include/linux/spinlock.h:195 ./include/linux/spinlock_api_smp.h:119 kernel/locking/spinlock.c:159) 
[ 3040.590208] efivar_entry_set_safe (drivers/firmware/efi/vars.c:776) 
[ 3040.590213] efi_pstore_write (drivers/firmware/efi/efi-pstore.c:271 (discriminator 6)) 
[ 3040.590220] pstore_dump (fs/pstore/platform.c:462) 
[ 3040.590227] kmsg_dump (kernel/printk/printk.c:3345 (discriminator 10)) 
[ 3040.590232] oops_end (arch/x86/kernel/dumpstack.c:372) 
[ 3040.590237] do_trap (arch/x86/kernel/traps.c:120 arch/x86/kernel/traps.c:161) 
[ 3040.590241] ? __btree_iter_traverse_all (fs/bcachefs/btree_iter.c:1332 (discriminator 1)) 
[ 3040.590246] do_error_trap (arch/x86/kernel/traps.c:84 arch/x86/kernel/traps.c:182) 
[ 3040.590249] ? __btree_iter_traverse_all (fs/bcachefs/btree_iter.c:1332 (discriminator 1)) 
[ 3040.590254] exc_invalid_op (arch/x86/kernel/traps.c:267) 
[ 3040.590259] ? __btree_iter_traverse_all (fs/bcachefs/btree_iter.c:1332 (discriminator 1)) 
[ 3040.590264] asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:566) 
[ 3040.590268] RIP: 0010:__btree_iter_traverse_all (fs/bcachefs/btree_iter.c:1332 (discriminator 1)) 
[ 3040.590273] Code: 65 ff 0d ec 71 ba 7e 0f 85 37 ff ff ff e8 c0 64 b9 ff e9 2d ff ff ff 41 89 de e9 13 ff ff ff 41 be fc ff ff ff e9 1a ff ff ff <0f> 0b 0f 1f 80 00 00 00 00 0f 1f 44 00 00 48 83 ec 10 48 89 2c 24
All code
========
   0:   65 ff 0d ec 71 ba 7e    decl   %gs:0x7eba71ec(%rip)        # 0x7eba71f3
   7:   0f 85 37 ff ff ff       jne    0xffffffffffffff44
   d:   e8 c0 64 b9 ff          call   0xffffffffffb964d2
  12:   e9 2d ff ff ff          jmp    0xffffffffffffff44
  17:   41 89 de                mov    %ebx,%r14d
  1a:   e9 13 ff ff ff          jmp    0xffffffffffffff32
  1f:   41 be fc ff ff ff       mov    $0xfffffffc,%r14d
  25:   e9 1a ff ff ff          jmp    0xffffffffffffff44
  2a:*  0f 0b                   ud2             <-- trapping instruction
  2c:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)
  33:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  38:   48 83 ec 10             sub    $0x10,%rsp
  3c:   48 89 2c 24             mov    %rbp,(%rsp)

Code starting with the faulting instruction
===========================================
   0:   0f 0b                   ud2    
   2:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)
   9:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
   e:   48 83 ec 10             sub    $0x10,%rsp
  12:   48 89 2c 24             mov    %rbp,(%rsp)
[ 3040.590280] RSP: 0018:ffff8886ba0d7a98 EFLAGS: 00010202
[ 3040.590284] RAX: ffff888016588000 RBX: 0000000000000000 RCX: 0000000000000002
[ 3040.590287] RDX: 00000000000001fb RSI: ffff888016588000 RDI: 0000000000000000
[ 3040.590290] RBP: ffff888467fe0000 R08: ffff888016588500 R09: 0000000000000001
[ 3040.590293] R10: 0000000000000001 R11: ffff888443728aca R12: ffffffffffffffff
[ 3040.590296] R13: ffff8886ba0d7d18 R14: ffff888016588488 R15: 0000000000000001
[ 3040.590301] ? __btree_iter_traverse_all (fs/bcachefs/btree_iter.c:1325) 
[ 3040.590306] ? bch2_migrate_index_update (fs/bcachefs/move.c:89) 
[ 3040.590311] ? bch2_trans_unlink_iters (fs/bcachefs/btree_iter.c:2142 fs/bcachefs/btree_iter.c:2201 fs/bcachefs/btree_iter.c:2513) 
[ 3040.590315] ? __bch2_trans_commit (fs/bcachefs/btree_update_leaf.c:969) 
[ 3040.590319] bch2_trans_begin (fs/bcachefs/btree_iter.c:2560) 
[ 3040.590324] bch2_migrate_index_update (fs/bcachefs/move.c:89) 
[ 3040.590333] ? bch2_migrate_index_update (fs/bcachefs/keylist.h:51 fs/bcachefs/move.c:74) 
[ 3040.590340] __bch2_write_index (fs/bcachefs/io.c:610) 
[ 3040.590344] bch2_write_index (fs/bcachefs/io.c:641) 
[ 3040.590348] process_one_work (./arch/x86/include/asm/jump_label.h:19 ./include/linux/jump_label.h:200 ./include/trace/events/workqueue.h:108 kernel/workqueue.c:2281) 
[ 3040.590353] worker_thread (./include/linux/list.h:282 kernel/workqueue.c:2423) 
[ 3040.590357] ? rescuer_thread (kernel/workqueue.c:2365) 
[ 3040.590362] kthread (kernel/kthread.c:319) 
[ 3040.590366] ? set_kthread_struct (kernel/kthread.c:272) 
[ 3040.590370] ret_from_fork (arch/x86/entry/entry_64.S:300) 
[ 3040.590375] ---[ end trace eb7a330d893ae5a8 ]---
[ 3041.950494] RIP: 0010:__btree_iter_traverse_all (fs/bcachefs/btree_iter.c:1332 (discriminator 1)) 
[ 3041.950509] Code: 65 ff 0d ec 71 ba 7e 0f 85 37 ff ff ff e8 c0 64 b9 ff e9 2d ff ff ff 41 89 de e9 13 ff ff ff 41 be fc ff ff ff e9 1a ff ff ff <0f> 0b 0f 1f 80 00 00 00 00 0f 1f 44 00 00 48 83 ec 10 48 89 2c 24
derlaft commented 3 years ago

Not sure if that's the same problem: kernel BUG at fs/bcachefs/btree_iter.c:678!

Full dmesg: https://gist.github.com/derlaft/07c7a6a2ad9bfa19c049d4cea2088cb5 fline2addr's included (I hope that's what was needed)

Basically how I got there:

It mounts with ro,nochanges,norecovery; reboot does not help anymore

jpsollie commented 3 years ago

fixed by commit fd3fdd74af3d682b780eae5b087062f91780f8ed