koverstreet / bcachefs

Other
694 stars 72 forks source link

removed device readded during journal replay - cannot mount [e2cd10f254d] #615

Closed Dikay900 closed 11 months ago

Dikay900 commented 11 months ago

Version

bcachefs master e2cd10f254d

Generic info

bcachefs show-super /dev/sdb
External UUID:                              f0605888-9e17-4fec-bef6-92e885adc371
Internal UUID:                              3edf84f6-82fa-4eeb-9376-47005224497e
Device index:                               1
Label:                                      
Version:                                    1.3: rebalance_work
Version upgrade complete:                   1.3: rebalance_work
Oldest version on disk:                     1.1: snapshot_skiplists
Created:                                    Thu Aug 10 09:03:52 2023
Sequence number:                            1213
Superblock size:                            13424
Clean:                                      0
Devices:                                    8
Sections:                                   members_v1,replicas_v0,disk_groups,clean,journal_seq_blacklist,journal_v2,counters,members_v2,errors
Features:                                   zstd,journal_seq_blacklist_v3,reflink,new_siphash,inline_data,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,reflink_inline_data,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
Compat features:                            alloc_info,alloc_metadata,extents_above_btree_updates_done,bformat_overflow_done

Options:
  block_size:                               4.00 KiB
  btree_node_size:                          256 KiB
  errors:                                   continue [ro] panic 
  metadata_replicas:                        2
  data_replicas:                            2
  metadata_replicas_required:               1
  data_replicas_required:                   1
  encoded_extent_max:                       64.0 KiB
  metadata_checksum:                        none [crc32c] crc64 xxhash 
  data_checksum:                            none [crc32c] crc64 xxhash 
  compression:                              none
  background_compression:                   none
  str_hash:                                 crc32c crc64 [siphash] 
  metadata_target:                          fast
  foreground_target:                        fast
  background_target:                        hdd
  promote_target:                           fast
  erasure_code:                             0
  inodes_32bit:                             1
  shard_inode_numbers:                      1
  inodes_use_key_cache:                     1
  gc_reserve_percent:                       8
  gc_reserve_bytes:                         0 B
  root_reserve_percent:                     0
  wide_macs:                                0
  acl:                                      1
  usrquota:                                 0
  grpquota:                                 0
  prjquota:                                 0
  journal_flush_delay:                      1000
  journal_flush_disabled:                   0
  journal_reclaim_delay:                    100
  journal_transaction_names:                1
  version_upgrade:                          [compatible] incompatible none 
  nocow:                                    0

replicas_v0 (size 192):
  user: 2 [2 11] user: 2 [0 2] cached: 1 [0] user: 1 [1] user: 2 [1 4] user: 2 [4 11] cached: 1 [4] btree: 2 [0 1] user: 1 [8] user: 2 [0 9] user: 2 [2 3] user: 2 [3 11] user: 2 [8 11] cached: 1 [2] cached: 1 [9] journal: 2 [4 9] btree: 2 [0 3] user: 1 [3] user: 1 [11] user: 2 [0 4] user: 2 [1 2] user: 2 [1 9] user: 2 [2 8] user: 2 [3 8] user: 2 [4 9] user: 2 [8 9] user: 2 [9 11] cached: 1 [1] cached: 1 [3] cached: 1 [8] cached: 1 [11] journal: 2 [0 2] journal: 2 [8 9] btree: 2 [0 2] user: 1 [0] user: 1 [2] user: 1 [4] user: 1 [9] user: 2 [0 1] user: 2 [0 3] user: 2 [0 8] user: 2 [0 11] user: 2 [1 3] user: 2 [1 8] user: 2 [1 11] user: 2 [2 4] user: 2 [2 9] user: 2 [3 4] user: 2 [3 9] user: 2 [4 8]

members_v2 (size 1576):
  Device:                                   0
    Label:                                  fast1 (20)
    UUID:                                   5b51f194-c644-494a-abb9-1af84e838eed
    Size:                                   932 GiB
    read errors:                            0
    write errors:                           0
    checksum errors:                        0
    seqread iops:                           0
    seqwrite iops:                          0
    randread iops:                          0
    randwrite iops:                         0
    Bucket size:                            512 KiB
    First bucket:                           0
    Buckets:                                1907739
    Last mount:                             Fri Nov 24 12:09:18 2023
    State:                                  rw
    Data allowed:                           journal,btree,user
    Has data:                               journal,btree,user,cached
    Discard:                                1
    Freespace initialized:                  1
  Device:                                   1
    Label:                                  hdd2 (2)
    UUID:                                   c9b36005-d887-4967-a98f-bf0a74303f88
    Size:                                   3.64 TiB
    read errors:                            0
    write errors:                           0
    checksum errors:                        0
    seqread iops:                           0
    seqwrite iops:                          0
    randread iops:                          0
    randwrite iops:                         0
    Bucket size:                            512 KiB
    First bucket:                           0
    Buckets:                                7630895
    Last mount:                             Fri Nov 24 12:09:18 2023
    State:                                  rw
    Data allowed:                           journal,btree,user
    Has data:                               btree,user,cached
    Discard:                                0
    Freespace initialized:                  1
  Device:                                   2
    Label:                                  fast2 (21)
    UUID:                                   e94e00bb-9e5c-43ef-bd96-161c559c0c85
    Size:                                   466 GiB
    read errors:                            0
    write errors:                           0
    checksum errors:                        0
    seqread iops:                           0
    seqwrite iops:                          0
    randread iops:                          0
    randwrite iops:                         0
    Bucket size:                            512 KiB
    First bucket:                           0
    Buckets:                                953880
    Last mount:                             Fri Nov 24 12:09:18 2023
    State:                                  rw
    Data allowed:                           journal,btree,user
    Has data:                               journal,btree,user,cached
    Discard:                                1
    Freespace initialized:                  1
  Device:                                   3
    Label:                                  hdd4 (4)
    UUID:                                   7da308d8-deaf-4f17-babf-5c8fa4dca33d
    Size:                                   3.64 TiB
    read errors:                            0
    write errors:                           0
    checksum errors:                        0
    seqread iops:                           0
    seqwrite iops:                          0
    randread iops:                          0
    randwrite iops:                         0
    Bucket size:                            512 KiB
    First bucket:                           0
    Buckets:                                7630895
    Last mount:                             Fri Nov 24 12:09:18 2023
    State:                                  rw
    Data allowed:                           journal,btree,user
    Has data:                               btree,user,cached
    Discard:                                0
    Freespace initialized:                  1
  Device:                                   4
    Label:                                  hdd5 (5)
    UUID:                                   c1f38d5c-d588-49d8-9713-7234d09c2be7
    Size:                                   7.28 TiB
    read errors:                            0
    write errors:                           0
    checksum errors:                        0
    seqread iops:                           0
    seqwrite iops:                          0
    randread iops:                          0
    randwrite iops:                         0
    Bucket size:                            1.00 MiB
    First bucket:                           0
    Buckets:                                7630855
    Last mount:                             Fri Nov 24 12:09:18 2023
    State:                                  rw
    Data allowed:                           journal,btree,user
    Has data:                               journal,user,cached
    Discard:                                0
    Freespace initialized:                  1
  Device:                                   8
    Label:                                  hdd6 (6)
    UUID:                                   4b2ef818-7a27-421d-a99f-7192139e7a70
    Size:                                   5.46 TiB
    read errors:                            0
    write errors:                           0
    checksum errors:                        0
    seqread iops:                           0
    seqwrite iops:                          0
    randread iops:                          0
    randwrite iops:                         0
    Bucket size:                            1.00 MiB
    First bucket:                           0
    Buckets:                                5723136
    Last mount:                             Fri Nov 24 12:09:18 2023
    State:                                  rw
    Data allowed:                           journal,btree,user
    Has data:                               journal,user,cached
    Discard:                                0
    Freespace initialized:                  1
  Device:                                   9
    Label:                                  hdd7 (7)
    UUID:                                   ef6b86f4-d94e-4b34-8eb8-4c1ad2ce5edb
    Size:                                   7.28 TiB
    read errors:                            0
    write errors:                           0
    checksum errors:                        0
    seqread iops:                           0
    seqwrite iops:                          0
    randread iops:                          0
    randwrite iops:                         0
    Bucket size:                            1.00 MiB
    First bucket:                           0
    Buckets:                                7630855
    Last mount:                             Fri Nov 24 12:09:18 2023
    State:                                  rw
    Data allowed:                           journal,btree,user
    Has data:                               journal,user,cached
    Discard:                                0
    Freespace initialized:                  1
  Device:                                   11
    Label:                                  ssd3 (11)
    UUID:                                   53bb3640-5f0f-4d24-9c4a-77e4963e17b9
    Size:                                   932 GiB
    read errors:                            298
    write errors:                           3215
    checksum errors:                        0
    seqread iops:                           0
    seqwrite iops:                          0
    randread iops:                          0
    randwrite iops:                         0
    Bucket size:                            512 KiB
    First bucket:                           0
    Buckets:                                1907739
    Last mount:                             Fri Nov 24 12:09:18 2023
    State:                                  rw
    Data allowed:                           journal,btree,user
    Has data:                               user,cached
    Discard:                                0
    Freespace initialized:                  1

I am currently tearing down my old array and moving the data to a new one, removing some devices from the existing array. After the latest removal I cannot mount the array anymore. Trying to fsck results into a crash see below: Are there more options available i can get into read only mode of this array without replaying the journal or dropping the journal keys? Since I am reading only from this array anyway I do not need write access anymore.

[Fri Nov 24 11:56:18 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): mounting version 1.3: rebalance_work opts=errors=continue,metadata_replicas=2,data_replicas=2,metadata_target=fast,foreground_target=fast,background_target=hdd,promote_target=fast,nochanges
[Fri Nov 24 11:56:18 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): recovering from unclean shutdown
[Fri Nov 24 11:58:20 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): journal read done, replaying entries 25734944-25735261
[Fri Nov 24 11:58:20 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734944 offset=2009/2151: bad dev, exiting
[Fri Nov 24 11:58:20 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): Unable to continue, halting
[Fri Nov 24 11:58:20 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): bch2_fs_recovery(): error fsck_errors_not_fixed
[Fri Nov 24 11:58:20 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): bch2_fs_start(): error starting filesystem fsck_errors_not_fixed
[Fri Nov 24 11:58:51 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): mounting version 1.3: rebalance_work opts=errors=continue,metadata_replicas=2,data_replicas=2,metadata_target=fast,foreground_target=fast,background_target=hdd,promote_target=fast,nochanges,norecovery
[Fri Nov 24 11:58:51 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): recovering from unclean shutdown
[Fri Nov 24 12:00:53 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): journal read done, replaying entries 25734944-25735261
[Fri Nov 24 12:00:53 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734944 offset=2009/2151: bad dev, exiting
[Fri Nov 24 12:00:53 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): Unable to continue, halting
[Fri Nov 24 12:00:53 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): bch2_fs_recovery(): error fsck_errors_not_fixed
[Fri Nov 24 12:00:53 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): bch2_fs_start(): error starting filesystem fsck_errors_not_fixed
[Fri Nov 24 12:03:15 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): mounting version 1.3: rebalance_work opts=metadata_replicas=2,data_replicas=2,metadata_target=fast,foreground_target=fast,background_target=hdd,promote_target=fast,nochanges,norecovery,reconstruct_alloc
[Fri Nov 24 12:03:15 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): recovering from unclean shutdown
[Fri Nov 24 12:05:17 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): journal read done, replaying entries 25734944-25735261
[Fri Nov 24 12:05:17 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734944 offset=2009/2151: bad dev, exiting
[Fri Nov 24 12:05:17 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): Unable to continue, halting
[Fri Nov 24 12:05:17 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): bch2_fs_recovery(): error fsck_errors_not_fixed
[Fri Nov 24 12:05:17 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): bch2_fs_start(): error starting filesystem fsck_errors_not_fixed
[Fri Nov 24 12:06:04 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): mounting version 1.3: rebalance_work opts=ro,metadata_replicas=2,data_replicas=2,metadata_target=fast,foreground_target=fast,background_target=hdd,promote_target=fast,fsck,nochanges,norecovery,reconstruct_alloc
[Fri Nov 24 12:06:04 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): recovering from unclean shutdown
[Fri Nov 24 12:06:04 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): cannot select both norecovery and fsck
[Fri Nov 24 12:06:04 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): bch2_fs_recovery(): error EINVAL
[Fri Nov 24 12:06:04 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): bch2_fs_start(): error starting filesystem EINVAL
[Fri Nov 24 12:06:48 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): mounting version 1.3: rebalance_work opts=ro,errors=continue,metadata_replicas=2,data_replicas=2,metadata_target=fast,foreground_target=fast,background_target=hdd,promote_target=fast,nochanges,norecovery,reconstruct_alloc
[Fri Nov 24 12:06:48 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): recovering from unclean shutdown
[Fri Nov 24 12:08:50 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): journal read done, replaying entries 25734944-25735261
[Fri Nov 24 12:08:50 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734944 offset=2009/2151: bad dev, exiting
[Fri Nov 24 12:08:50 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): Unable to continue, halting
[Fri Nov 24 12:08:50 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): bch2_fs_recovery(): error fsck_errors_not_fixed
[Fri Nov 24 12:08:50 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): bch2_fs_start(): error starting filesystem fsck_errors_not_fixed
[Fri Nov 24 12:09:18 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): mounting version 1.3: rebalance_work opts=metadata_replicas=2,data_replicas=2,metadata_target=fast,foreground_target=fast,background_target=hdd,promote_target=fast,fsck,fix_errors=yes,reconstruct_alloc
[Fri Nov 24 12:09:18 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): recovering from unclean shutdown
[Fri Nov 24 12:11:20 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): journal read done, replaying entries 25734944-25735261
[Fri Nov 24 12:11:20 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734944 offset=2009/2151: bad dev, fixing
[Fri Nov 24 12:11:20 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734945 offset=923/1065: bad dev, fixing
[Fri Nov 24 12:11:20 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734946 offset=1216/1358: bad dev, fixing
[Fri Nov 24 12:11:20 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734947 offset=956/1098: bad dev, fixing
[Fri Nov 24 12:11:20 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734948 offset=714/856: bad dev, fixing
[Fri Nov 24 12:11:20 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734949 offset=808/950: bad dev, fixing
[Fri Nov 24 12:11:20 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734950 offset=783/925: bad dev, fixing
[Fri Nov 24 12:11:20 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734951 offset=1229/1371: bad dev, fixing
[Fri Nov 24 12:11:20 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734952 offset=1165/1307: bad dev, fixing
[Fri Nov 24 12:11:20 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734953 offset=712/854: bad dev, fixing
[Fri Nov 24 12:11:20 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734954 offset=803/945: bad dev, fixing
[Fri Nov 24 12:11:20 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): Ratelimiting new instances of previous error
[Fri Nov 24 12:11:20 2023] BUG: kernel NULL pointer dereference, address: 00000000000000be
[Fri Nov 24 12:11:20 2023] #PF: supervisor read access in kernel mode
[Fri Nov 24 12:11:20 2023] #PF: error_code(0x0000) - not-present page
[Fri Nov 24 12:11:20 2023] PGD 0 P4D 0 
[Fri Nov 24 12:11:20 2023] Oops: 0000 [#1] PREEMPT SMP NOPTI
[Fri Nov 24 12:11:20 2023] CPU: 1 PID: 2812 Comm: bcachefs Not tainted 6.6.0+ #5
[Fri Nov 24 12:11:20 2023] Hardware name: System manufacturer System Product Name/PRIME B450M-A, BIOS 2409 12/02/2020
[Fri Nov 24 12:11:20 2023] RIP: 0010:bch2_have_enough_devs+0xdb/0x2a0
[Fri Nov 24 12:11:20 2023] Code: 74 7c 31 ff 31 ed 31 d2 89 d0 0f b6 4c 06 03 48 89 c8 48 83 c1 44 49 8b 4c cc 08 48 0f a3 04 24 0f 92 c0 0f b6 c0 01 c5 31 c0 <80> b9 be 00 00 00 02 0f 94 c0 83 c2 01 01 c7 0f b6 46 01 39 c2 72
[Fri Nov 24 12:11:20 2023] RSP: 0018:ffffa663c38bb728 EFLAGS: 00010246
[Fri Nov 24 12:11:20 2023] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[Fri Nov 24 12:11:20 2023] RDX: 0000000000000001 RSI: ffff9903cbf2644b RDI: 0000000000000000
[Fri Nov 24 12:11:20 2023] RBP: 0000000000000001 R08: 0000000000000002 R09: 00000000fffffff3
[Fri Nov 24 12:11:20 2023] R10: 0000000000000001 R11: ffff9903cdc64000 R12: ffff990445680000
[Fri Nov 24 12:11:20 2023] R13: 0000000000000008 R14: 000000000000000c R15: 0000000000000004
[Fri Nov 24 12:11:20 2023] FS:  00007f0832d41c80(0000) GS:ffff990ebea40000(0000) knlGS:0000000000000000
[Fri Nov 24 12:11:20 2023] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Fri Nov 24 12:11:20 2023] CR2: 00000000000000be CR3: 00000001859d6000 CR4: 00000000003506e0
[Fri Nov 24 12:11:20 2023] Call Trace:
[Fri Nov 24 12:11:20 2023]  <TASK>
[Fri Nov 24 12:11:20 2023]  ? __die+0x23/0x70
[Fri Nov 24 12:11:20 2023]  ? page_fault_oops+0x17d/0x4c0
[Fri Nov 24 12:11:20 2023]  ? __cgroup_account_cputime+0x50/0x80
[Fri Nov 24 12:11:20 2023]  ? exc_page_fault+0x77/0x170
[Fri Nov 24 12:11:20 2023]  ? asm_exc_page_fault+0x26/0x30
[Fri Nov 24 12:11:20 2023]  ? bch2_have_enough_devs+0xdb/0x2a0
[Fri Nov 24 12:11:20 2023]  ? bch2_have_enough_devs+0x60/0x2a0
[Fri Nov 24 12:11:20 2023]  ? schedule+0x3f/0xe0
[Fri Nov 24 12:11:20 2023]  ? __closure_sync+0x82/0x160
[Fri Nov 24 12:11:20 2023]  bch2_write_super+0x49a/0xd40
[Fri Nov 24 12:11:20 2023]  bch2_journal_seq_blacklist_add+0xb7/0x170
[Fri Nov 24 12:11:20 2023]  bch2_fs_recovery+0xa9e/0x1560
[Fri Nov 24 12:11:20 2023]  ? bch2_printbuf_exit+0x20/0x30
[Fri Nov 24 12:11:20 2023]  ? print_mount_opts+0x2b2/0x3e0
[Fri Nov 24 12:11:20 2023]  bch2_fs_start+0x32f/0x3b0
[Fri Nov 24 12:11:20 2023]  bch2_fs_open+0xfab/0x14f0
[Fri Nov 24 12:11:20 2023]  ? __kmem_cache_alloc_node+0x176/0x2e0
[Fri Nov 24 12:11:20 2023]  ? bch2_mount+0x4da/0x6f0
[Fri Nov 24 12:11:20 2023]  bch2_mount+0x4da/0x6f0
[Fri Nov 24 12:11:20 2023]  legacy_get_tree+0x2b/0x50
[Fri Nov 24 12:11:20 2023]  vfs_get_tree+0x29/0xf0
[Fri Nov 24 12:11:20 2023]  path_mount+0x483/0xaa0
[Fri Nov 24 12:11:20 2023]  __x64_sys_mount+0x11a/0x150
[Fri Nov 24 12:11:20 2023]  do_syscall_64+0x5f/0xc0
[Fri Nov 24 12:11:20 2023]  ? __count_memcg_events+0x73/0xc0
[Fri Nov 24 12:11:20 2023]  ? count_memcg_events.constprop.0+0x1a/0x30
[Fri Nov 24 12:11:20 2023]  ? handle_mm_fault+0xa2/0x370
[Fri Nov 24 12:11:20 2023]  ? preempt_count_add+0x4b/0xa0
[Fri Nov 24 12:11:20 2023]  ? up_read+0x3b/0x80
[Fri Nov 24 12:11:20 2023]  ? do_user_addr_fault+0x31c/0x650
[Fri Nov 24 12:11:20 2023]  ? fpregs_assert_state_consistent+0x26/0x50
[Fri Nov 24 12:11:20 2023]  ? exit_to_user_mode_prepare+0x40/0x1d0
[Fri Nov 24 12:11:20 2023]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[Fri Nov 24 12:11:20 2023] RIP: 0033:0x7f0832e54b7a
[Fri Nov 24 12:11:20 2023] Code: 48 8b 0d 89 82 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 56 82 0c 00 f7 d8 64 89 01 48
[Fri Nov 24 12:11:20 2023] RSP: 002b:00007ffc2df72da8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[Fri Nov 24 12:11:20 2023] RAX: ffffffffffffffda RBX: 0000558c5d91be40 RCX: 00007f0832e54b7a
[Fri Nov 24 12:11:20 2023] RDX: 0000558c5d91ad40 RSI: 0000558c5d955720 RDI: 0000558c5d91be40
[Fri Nov 24 12:11:20 2023] RBP: 0000558c5d91ad40 R08: 0000558c5d917470 R09: 8080808080808080
[Fri Nov 24 12:11:20 2023] R10: 0000000000000000 R11: 0000000000000246 R12: 0000558c5d917470
[Fri Nov 24 12:11:20 2023] R13: 0000000000000000 R14: 0000558c5d9231f0 R15: 0000000000000008
[Fri Nov 24 12:11:20 2023]  </TASK>
[Fri Nov 24 12:11:20 2023] Modules linked in: softdog xt_conntrack xt_CHECKSUM nft_chain_nat nf_conntrack_netlink xt_MASQUERADE xfrm_user nf_nat xfrm_algo nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_tcpudp xt_addrtype nft_compat br_netfilter bridge stp llc nf_tables nfnetlink sunrpc overlay zstd zram zsmalloc binfmt_misc nls_ascii nls_cp437 vfat fat intel_rapl_msr intel_rapl_common edac_mce_amd kvm_amd kvm irqbypass evdev ghash_clmulni_intel sha512_ssse3 sha512_generic eeepc_wmi snd_pcm asus_wmi aesni_intel battery crypto_simd snd_timer cryptd ledtrig_audio sparse_keymap platform_profile rapl snd soundcore acpi_cpufreq video pcspkr sp5100_tco wmi_bmof k10temp watchdog ccp rng_core cfg80211 sg button rfkill bcache drm dm_mod loop efi_pstore fuse configfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 ses enclosure scsi_transport_sas btrfs blake2b_generic efivarfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear md_mod sd_mod t10_pi uas crc64_rocksoft usb_storage crc_t10dif
[Fri Nov 24 12:11:20 2023]  crct10dif_generic r8169 ahci libahci xhci_pci realtek libata crct10dif_pclmul crct10dif_common crc32_pclmul mdio_devres xhci_hcd crc32c_intel atlantic libphy i2c_piix4 usbcore scsi_mod macsec scsi_common usb_common wmi gpio_amdpt gpio_generic
[Fri Nov 24 12:11:20 2023] CR2: 00000000000000be
[Fri Nov 24 12:11:20 2023] ---[ end trace 0000000000000000 ]---
[Fri Nov 24 12:11:20 2023] RIP: 0010:bch2_have_enough_devs+0xdb/0x2a0
[Fri Nov 24 12:11:20 2023] Code: 74 7c 31 ff 31 ed 31 d2 89 d0 0f b6 4c 06 03 48 89 c8 48 83 c1 44 49 8b 4c cc 08 48 0f a3 04 24 0f 92 c0 0f b6 c0 01 c5 31 c0 <80> b9 be 00 00 00 02 0f 94 c0 83 c2 01 01 c7 0f b6 46 01 39 c2 72
[Fri Nov 24 12:11:20 2023] RSP: 0018:ffffa663c38bb728 EFLAGS: 00010246
[Fri Nov 24 12:11:20 2023] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[Fri Nov 24 12:11:20 2023] RDX: 0000000000000001 RSI: ffff9903cbf2644b RDI: 0000000000000000
[Fri Nov 24 12:11:20 2023] RBP: 0000000000000001 R08: 0000000000000002 R09: 00000000fffffff3
[Fri Nov 24 12:11:20 2023] R10: 0000000000000001 R11: ffff9903cdc64000 R12: ffff990445680000
[Fri Nov 24 12:11:20 2023] R13: 0000000000000008 R14: 000000000000000c R15: 0000000000000004
[Fri Nov 24 12:11:20 2023] FS:  00007f0832d41c80(0000) GS:ffff990ebea40000(0000) knlGS:0000000000000000
[Fri Nov 24 12:11:20 2023] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Fri Nov 24 12:11:20 2023] CR2: 00000000000000be CR3: 00000001859d6000 CR4: 00000000003506e0
[Fri Nov 24 12:11:20 2023] note: bcachefs[2812] exited with irqs disabled

fsck with debug usertools gives me

bcachefs: libbcachefs/sb-members.h:160: bch_dev_bkey_exists: Assertion '!(idx >= c->sb.nr_devices || !c->devs[idx])' failed.

static inline struct bch_dev *bch_dev_bkey_exists(const struct bch_fs *c, unsigned idx)
{
        EBUG_ON(idx >= c->sb.nr_devices || !c->devs[idx]);

        return rcu_dereference_check(c->devs[idx], 1);
}
Dikay900 commented 11 months ago

It seems i have data on non existing devices now based on the metadata.

available devices are currently ids: 0 1 2 3 4 8 9 11

It seems as device 5 is getting checked although I do not see any data on this id.

replicas_v0:
user: 2 [2 11] user: 2 [0 2] cached: 1 [0] user: 1 [1] user: 2 [1 4] user: 2 [4 11] cached: 1 [4] btree: 2 [0 1] user: 1 [8] user: 2 [0 9] user: 2 [2 3] user: 2 [3 11] user: 2 [8 11] cached: 1 [2] cached: 1 [9] journal: 2 [4 9] btree: 2 [0 3] user: 1 [3] user: 1 [11] user: 2 [0 4] user: 2 [1 2] user: 2 [1 9] user: 2 [2 8] user: 2 [3 8] user: 2 [4 9] user: 2 [8 9] user: 2 [9 11] cached: 1 [1] cached: 1 [3] cached: 1 [8] cached: 1 [11] journal: 2 [0 2] journal: 2 [8 9] btree: 2 [0 2] user: 1 [0] user: 1 [2] user: 1 [4] user: 1 [9] user: 2 [0 1] user: 2 [0 3] user: 2 [0 8] user: 2 [0 11] user: 2 [1 3] user: 2 [1 8] user: 2 [1 11] user: 2 [2 4] user: 2 [2 9] user: 2 [3 4] user: 2 [3 9] user: 2 [4 8]

With the following trace output added in bch2_have_enough_devs: replicas.c:932

bool bch2_have_enough_devs(struct bch_fs *c, struct bch_devs_mask devs,
                           unsigned flags, bool print)
{
        struct bch_replicas_entry *e;
        bool ret = true;

        percpu_down_read(&c->mark_lock);
        for_each_cpu_replicas_entry(&c->replicas, e) {
                unsigned i, nr_online = 0, nr_failed = 0, dflags = 0;
                bool metadata = e->data_type < BCH_DATA_user;

                if (e->data_type == BCH_DATA_cached)
                        continue;

                for (i = 0; i < e->nr_devs; i++) {
+                        bch_err(c, "checking dev index (%u) of replicas (%u) with type (%u)",
+                                        e->devs[i], e->nr_devs, e->data_type);
                        struct bch_dev *ca = bch_dev_bkey_exists(c, e->devs[i]);

                        nr_online += test_bit(e->devs[i], devs.d);
                        nr_failed += ca->mi.state == BCH_MEMBER_STATE_failed;
                }

I am getting the following output:

checking dev index (2) of replicas (2) with type (4)
checking dev index (8) of replicas (2) with type (4)
checking dev index (11) of replicas (1) with type (4)
checking dev index (0) of replicas (2) with type (3)
checking dev index (3) of replicas (2) with type (3)
checking dev index (1) of replicas (2) with type (4)
checking dev index (2) of replicas (2) with type (4)
checking dev index (4) of replicas (2) with type (4)
checking dev index (9) of replicas (2) with type (4)
checking dev index (4) of replicas (2) with type (2)
checking dev index (9) of replicas (2) with type (2)
checking dev index (3) of replicas (1) with type (4)
checking dev index (0) of replicas (2) with type (4)
checking dev index (4) of replicas (2) with type (4)
checking dev index (1) of replicas (2) with type (4)
checking dev index (9) of replicas (2) with type (4)
checking dev index (3) of replicas (2) with type (4)
checking dev index (8) of replicas (2) with type (4)
checking dev index (8) of replicas (2) with type (4)
checking dev index (11) of replicas (2) with type (4)
checking dev index (4) of replicas (2) with type (2)
checking dev index (5) of replicas (2) with type (2)
bcachefs: libbcachefs/sb-members.h:160: bch_dev_bkey_exists: Assertion `!(idx >= c->sb.nr_devices || !c->devs[idx])' failed.

looking into bch2_dev_remove I see the following comment, which could actually be the case here

        /*
         * At this point the device object has been removed in-core, but the
         * on-disk journal might still refer to the device index via sb device
         * usage entries. Recovery fails if it sees usage information for an
         * invalid device. Flush journal pins to push the back of the journal
         * past now invalid device index references before we update the
         * superblock, but after the device object has been removed so any
         * further journal writes elide usage info for the device.
         */
        bch2_journal_flush_all_pins(&c->journal);

actually I see that the device 5 was added during the journal replay. Since the checking dev index trace is printed at the beginning of the fsck where no dev 5 is there:

~/bcachefs-tools# bcachefs fsck -pv /dev/sdl /dev/sda /dev/sdd /dev/sdb /dev/sdg /dev/sdh /dev/sdj /dev/sdf
checking dev index (2) of replicas (2) with type (4)
checking dev index (11) of replicas (2) with type (4)
checking dev index (0) of replicas (2) with type (4)
checking dev index (2) of replicas (2) with type (4)
checking dev index (1) of replicas (1) with type (4)
checking dev index (1) of replicas (2) with type (4)
checking dev index (4) of replicas (2) with type (4)
checking dev index (4) of replicas (2) with type (4)
checking dev index (11) of replicas (2) with type (4)
checking dev index (0) of replicas (2) with type (3)
checking dev index (1) of replicas (2) with type (3)
checking dev index (8) of replicas (1) with type (4)
checking dev index (0) of replicas (2) with type (4)
checking dev index (9) of replicas (2) with type (4)
checking dev index (2) of replicas (2) with type (4)
checking dev index (3) of replicas (2) with type (4)
checking dev index (3) of replicas (2) with type (4)
checking dev index (11) of replicas (2) with type (4)
checking dev index (8) of replicas (2) with type (4)
checking dev index (11) of replicas (2) with type (4)
checking dev index (4) of replicas (2) with type (2)
checking dev index (9) of replicas (2) with type (2)
checking dev index (0) of replicas (2) with type (3)
checking dev index (3) of replicas (2) with type (3)
checking dev index (3) of replicas (1) with type (4)
checking dev index (11) of replicas (1) with type (4)
checking dev index (0) of replicas (2) with type (4)
checking dev index (4) of replicas (2) with type (4)
checking dev index (1) of replicas (2) with type (4)
checking dev index (2) of replicas (2) with type (4)
checking dev index (1) of replicas (2) with type (4)
checking dev index (9) of replicas (2) with type (4)
checking dev index (2) of replicas (2) with type (4)
checking dev index (8) of replicas (2) with type (4)
checking dev index (3) of replicas (2) with type (4)
checking dev index (8) of replicas (2) with type (4)
checking dev index (4) of replicas (2) with type (4)
checking dev index (9) of replicas (2) with type (4)
checking dev index (8) of replicas (2) with type (4)
checking dev index (9) of replicas (2) with type (4)
checking dev index (9) of replicas (2) with type (4)
checking dev index (11) of replicas (2) with type (4)
checking dev index (0) of replicas (2) with type (2)
checking dev index (2) of replicas (2) with type (2)
checking dev index (8) of replicas (2) with type (2)
checking dev index (9) of replicas (2) with type (2)
checking dev index (0) of replicas (2) with type (3)
checking dev index (2) of replicas (2) with type (3)
checking dev index (0) of replicas (1) with type (4)
checking dev index (2) of replicas (1) with type (4)
checking dev index (4) of replicas (1) with type (4)
checking dev index (9) of replicas (1) with type (4)
checking dev index (0) of replicas (2) with type (4)
checking dev index (1) of replicas (2) with type (4)
checking dev index (0) of replicas (2) with type (4)
checking dev index (3) of replicas (2) with type (4)
checking dev index (0) of replicas (2) with type (4)
checking dev index (8) of replicas (2) with type (4)
checking dev index (0) of replicas (2) with type (4)
checking dev index (11) of replicas (2) with type (4)
checking dev index (1) of replicas (2) with type (4)
checking dev index (3) of replicas (2) with type (4)
checking dev index (1) of replicas (2) with type (4)
checking dev index (8) of replicas (2) with type (4)
checking dev index (1) of replicas (2) with type (4)
checking dev index (11) of replicas (2) with type (4)
checking dev index (2) of replicas (2) with type (4)
checking dev index (4) of replicas (2) with type (4)
checking dev index (2) of replicas (2) with type (4)
checking dev index (9) of replicas (2) with type (4)
checking dev index (3) of replicas (2) with type (4)
checking dev index (4) of replicas (2) with type (4)
checking dev index (3) of replicas (2) with type (4)
checking dev index (9) of replicas (2) with type (4)
checking dev index (4) of replicas (2) with type (4)
checking dev index (8) of replicas (2) with type (4)
mounting version 1.3: rebalance_work opts=metadata_replicas=2,data_replicas=2,metadata_target=fast,foreground_target=fast,background_target=hdd,promote_target=fast,degraded,fsck,fix_errors=yes
recovering from unclean shutdown
starting journal read
journal read done on device 0x55fc71bd15c0g, ret 0
journal read done on device 0x55fc71bca5e0g, ret 0
journal read done on device 0x55fc71bc27f0g, ret 0
journal read done on device 0x55fc71bc3e60g, ret 0
journal read done on device 0x55fc71bd1770g, ret 0
journal read done on device 0x55fc71bdd400g, ret 0
journal read done on device 0x55fc71bca770g, ret 0
journal read done on device 0x55fc71bdd770g, ret 0
journal read done, replaying entries 25734944-25735261
invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734944 offset=2009/2151: bad dev, fixing
[..]
invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25735261 offset=67279/67421: bad dev, fixing
Journal keys: 27066 read, 18675 after sorting and compacting
checking dev index (2) of replicas (2) with type (4)
checking dev index (8) of replicas (2) with type (4)
checking dev index (11) of replicas (1) with type (4)
checking dev index (0) of replicas (2) with type (3)
checking dev index (3) of replicas (2) with type (3)
checking dev index (1) of replicas (2) with type (4)
checking dev index (2) of replicas (2) with type (4)
checking dev index (4) of replicas (2) with type (4)
checking dev index (9) of replicas (2) with type (4)
checking dev index (4) of replicas (2) with type (2)
checking dev index (9) of replicas (2) with type (2)
checking dev index (3) of replicas (1) with type (4)
checking dev index (0) of replicas (2) with type (4)
checking dev index (4) of replicas (2) with type (4)
checking dev index (1) of replicas (2) with type (4)
checking dev index (9) of replicas (2) with type (4)
checking dev index (3) of replicas (2) with type (4)
checking dev index (8) of replicas (2) with type (4)
checking dev index (8) of replicas (2) with type (4)
checking dev index (11) of replicas (2) with type (4)
checking dev index (4) of replicas (2) with type (2)
checking dev index (5) of replicas (2) with type (2)
bcachefs: libbcachefs/sb-members.h:160: bch_dev_bkey_exists: Assertion `!(idx >= c->sb.nr_devices || !c->devs[idx])' failed.

data type 2 is journal so in the beginning there was no journal entry for dev 5 but after the replay there is. Probably the scenario described in the comment above during dev remove.

So now the question would be how can i just ignore the readded device or properly remove it after journal replay or how to skip the journal replay here.

koverstreet commented 11 months ago

Can you pipe that through scripts/decode_stacktrace.sh?

Dikay900 commented 11 months ago

Ah forgot the stacktrace yesterday while trying to find a way, thats from the userspace tool but the same crash:

(gdb) run fsck -pv /dev/sdl /dev/sda /dev/sdd /dev/sdb /dev/sdg /dev/sdh /dev/sdj /dev/sdf
Starting program: /root/bcachefs-tools/bcachefs fsck -pv /dev/sdl /dev/sda /dev/sdd /dev/sdb /dev/sdg /dev/sdh /dev/sdj /dev/sdf
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7aa4680 (LWP 56759)]
[New Thread 0x7ffff7a9b680 (LWP 56760)]
[New Thread 0x7ffff7a92680 (LWP 56761)]
[New Thread 0x7ffff7a89680 (LWP 56762)]
[New Thread 0x7ffff7a80680 (LWP 56763)]
[New Thread 0x7ffff7a77680 (LWP 56764)]
[New Thread 0x7ffff7a6e680 (LWP 56765)]
[New Thread 0x7ffff7a65680 (LWP 56766)]
[New Thread 0x7ffff79da680 (LWP 56768)]
[New Thread 0x7ffff79d1680 (LWP 56769)]
[New Thread 0x7ffff79c8680 (LWP 56770)]
[New Thread 0x7ffff79bf680 (LWP 56771)]
[New Thread 0x7ffff79b6680 (LWP 56772)]
[New Thread 0x7ffff6f0f680 (LWP 56773)]
mounting version 1.3: rebalance_work opts=metadata_replicas=2,data_replicas=2,metadata_target=fast,foreground_target=fast,background_target=hdd,promote_target=fast,degraded,fsck,fix_errors=yes
recovering from unclean shutdown
starting journal read
journal read done on device 0x555555b145e0g, ret 0
journal read done on device 0x555555b0d5c0g, ret 0
journal read done on device 0x555555b057f0g, ret 0
journal read done on device 0x555555b06e60g, ret 0
journal read done on device 0x555555b14e60g, ret 0
journal read done on device 0x555555b205c0g, ret 0
journal read done on device 0x555555b1b720g, ret 0
journal read done on device 0x555555b14f20g, ret 0
journal read done, replaying entries 25734944-25735261
invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734944 offset=2009/2151: bad dev, fixing
[...]
invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25735261 offset=67279/67421: bad dev, fixing
Journal keys: 27066 read, 18675 after sorting and compacting
bcachefs: libbcachefs/sb-members.h:160: bch_dev_bkey_exists: Assertion `!(idx >= c->sb.nr_devices || !c->devs[idx])' failed.

Thread 1 "bcachefs" received signal SIGABRT, Aborted.
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
44  ./nptl/pthread_kill.c: No such file or directory.
(gdb) backtrace
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1  0x00007ffff7b39d9f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  0x00007ffff7aeaf32 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007ffff7ad5472 in __GI_abort () at ./stdlib/abort.c:79
#4  0x00007ffff7ad5395 in __assert_fail_base (fmt=0x7ffff7c49a90 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x5555558988b8 "!(idx >= c->sb.nr_devices || !c->devs[idx])", 
    file=file@entry=0x55555589869f "libbcachefs/sb-members.h", line=line@entry=160, function=function@entry=0x5555558bb2f0 <__PRETTY_FUNCTION__.0> "bch_dev_bkey_exists") at ./assert/assert.c:92
#5  0x00007ffff7ae3e32 in __GI___assert_fail (assertion=assertion@entry=0x5555558988b8 "!(idx >= c->sb.nr_devices || !c->devs[idx])", file=file@entry=0x55555589869f "libbcachefs/sb-members.h", line=line@entry=160, 
    function=function@entry=0x5555558bb2f0 <__PRETTY_FUNCTION__.0> "bch_dev_bkey_exists") at ./assert/assert.c:101
#6  0x00005555556749e5 in bch_dev_bkey_exists (idx=<optimized out>, c=0x7ffff79dc000) at libbcachefs/sb-members.h:160
#7  bch2_have_enough_devs (c=c@entry=0x7ffff79dc000, devs=..., flags=flags@entry=12, print=print@entry=false) at libbcachefs/replicas.c:949
#8  0x0000555555685fb2 in bch2_write_super (c=c@entry=0x7ffff79dc000) at libbcachefs/super-io.c:985
#9  0x00005555556607d2 in bch2_journal_seq_blacklist_add (c=c@entry=0x7ffff79dc000, start=25735262, end=25735270) at libbcachefs/journal_seq_blacklist.c:115
#10 0x000055555566fc2d in bch2_fs_recovery (c=0x7ffff79dc000) at libbcachefs/recovery.c:839
#11 0x000055555568aa5d in bch2_fs_start (c=c@entry=0x7ffff79dc000) at libbcachefs/super.c:966
#12 0x000055555568d9e3 in bch2_fs_open (devices=devices@entry=0x7fffffffe998, nr_devices=nr_devices@entry=8, opts=...) at libbcachefs/super.c:1956
#13 0x00005555555b1941 in cmd_fsck (argc=8, argv=0x7fffffffe998, argv@entry=0x7fffffffe988) at cmd_fsck.c:94
#14 0x00005555555aa97d in main (argc=<optimized out>, argv=0x7fffffffe988) at bcachefs.c:232
Dikay900 commented 11 months ago

adding output to journal_replay_entry_early

case BCH_JSET_ENTRY_data_usage: {
                struct jset_entry_data_usage *u =
                        container_of(entry, struct jset_entry_data_usage, entry);

                ret = bch2_replicas_set_usage(c, &u->r,
                                              le64_to_cpu(u->v));
                bch_err(c, "adding data usage");
                for (unsigned i = 0; i < u->r.nr_devs; i++) {
                        bch_err(c, "for device %u", u->r.devs[i]);
                }

                break;
        }

I see that this is adding dev 5 during the replay which then triggers the assert afterwards when journal_replay_early is done.

No dev usage is added when adding a similiar output to the case BCH_JSET_ENTRY_dev_usage

Dikay900 commented 11 months ago

updated to bcachefs-for-upstream b20b156ea4331e541ca9268d410e8dc48d1f2d13 and tools to 138397d89212cd0b5abdbfdd644dc7702ea00f4a

cat trace.decoded 

[Sat Nov 25 11:36:34 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): mounting version 1.3: rebalance_work opts=metadata_replicas=2,data_replicas=2,metadata_target=fast,foreground_target=fast,background_target=hdd,promote_target=fast,fsck,fix_errors=yes
[Sat Nov 25 11:36:34 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): recovering from unclean shutdown
[Sat Nov 25 11:38:34 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): journal read done, replaying entries 25734944-25735261
[Sat Nov 25 11:38:34 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734944 offset=2009/2151: bad dev, fixing
[Sat Nov 25 11:38:34 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734945 offset=923/1065: bad dev, fixing
[Sat Nov 25 11:38:34 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734946 offset=1216/1358: bad dev, fixing
[Sat Nov 25 11:38:34 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734947 offset=956/1098: bad dev, fixing
[Sat Nov 25 11:38:34 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734948 offset=714/856: bad dev, fixing
[Sat Nov 25 11:38:34 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734949 offset=808/950: bad dev, fixing
[Sat Nov 25 11:38:34 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734950 offset=783/925: bad dev, fixing
[Sat Nov 25 11:38:34 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734951 offset=1229/1371: bad dev, fixing
[Sat Nov 25 11:38:34 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734952 offset=1165/1307: bad dev, fixing
[Sat Nov 25 11:38:34 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734953 offset=712/854: bad dev, fixing
[Sat Nov 25 11:38:34 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734954 offset=803/945: bad dev, fixing
[Sat Nov 25 11:38:34 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): Ratelimiting new instances of previous error
[Sat Nov 25 11:38:34 2023] BUG: kernel NULL pointer dereference, address: 00000000000000be
[Sat Nov 25 11:38:34 2023] #PF: supervisor read access in kernel mode
[Sat Nov 25 11:38:34 2023] #PF: error_code(0x0000) - not-present page
[Sat Nov 25 11:38:34 2023] PGD 0 P4D 0
[Sat Nov 25 11:38:34 2023] Oops: 0000 [#1] PREEMPT SMP NOPTI
[Sat Nov 25 11:38:34 2023] CPU: 0 PID: 2404 Comm: bcachefs Not tainted 6.7.0-rc2+ #3
[Sat Nov 25 11:38:34 2023] Hardware name: System manufacturer System Product Name/PRIME B450M-A, BIOS 2409 12/02/2020
[Sat Nov 25 11:38:34 2023] RIP: 0010:bch2_have_enough_devs (/mnt/bcachefs/fs/bcachefs/replicas.c:952) 
[Sat Nov 25 11:38:34 2023] Code: 74 7c 31 ff 31 ed 31 d2 89 d0 0f b6 4c 06 03 48 89 c8 48 83 c1 44 49 8b 4c cc 08 48 0f a3 04 24 0f 92 c0 0f b6 c0 01 c5 31 c0 <80> b9 be 00 00 00 02 0f 94 c0 83 c2 01 01 c7 0f b6 46 01 39 c2 72
All code
========
   0:   74 7c                   je     0x7e
   2:   31 ff                   xor    %edi,%edi
   4:   31 ed                   xor    %ebp,%ebp
   6:   31 d2                   xor    %edx,%edx
   8:   89 d0                   mov    %edx,%eax
   a:   0f b6 4c 06 03          movzbl 0x3(%rsi,%rax,1),%ecx
   f:   48 89 c8                mov    %rcx,%rax
  12:   48 83 c1 44             add    $0x44,%rcx
  16:   49 8b 4c cc 08          mov    0x8(%r12,%rcx,8),%rcx
  1b:   48 0f a3 04 24          bt     %rax,(%rsp)
  20:   0f 92 c0                setb   %al
  23:   0f b6 c0                movzbl %al,%eax
  26:   01 c5                   add    %eax,%ebp
  28:   31 c0                   xor    %eax,%eax
  2a:*  80 b9 be 00 00 00 02    cmpb   $0x2,0xbe(%rcx)      <-- trapping instruction
  31:   0f 94 c0                sete   %al
  34:   83 c2 01                add    $0x1,%edx
  37:   01 c7                   add    %eax,%edi
  39:   0f b6 46 01             movzbl 0x1(%rsi),%eax
  3d:   39 c2                   cmp    %eax,%edx
  3f:   72                      .byte 0x72

Code starting with the faulting instruction
===========================================
   0:   80 b9 be 00 00 00 02    cmpb   $0x2,0xbe(%rcx)
   7:   0f 94 c0                sete   %al
   a:   83 c2 01                add    $0x1,%edx
   d:   01 c7                   add    %eax,%edi
   f:   0f b6 46 01             movzbl 0x1(%rsi),%eax
  13:   39 c2                   cmp    %eax,%edx
  15:   72                      .byte 0x72
[Sat Nov 25 11:38:34 2023] RSP: 0018:ffffacfac9bff758 EFLAGS: 00010246
[Sat Nov 25 11:38:34 2023] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[Sat Nov 25 11:38:34 2023] RDX: 0000000000000001 RSI: ffff8c3d4acdc04b RDI: 0000000000000000
[Sat Nov 25 11:38:34 2023] RBP: 0000000000000001 R08: 0000000000000002 R09: 00000000fffffff3
[Sat Nov 25 11:38:34 2023] R10: 0000000000000001 R11: ffff8c3e2a254000 R12: ffff8c3d98680000
[Sat Nov 25 11:38:34 2023] R13: 0000000000000008 R14: 000000000000000c R15: 0000000000000004
[Sat Nov 25 11:38:34 2023] FS:  00007fd7475e2c80(0000) GS:ffff8c483ea00000(0000) knlGS:0000000000000000
[Sat Nov 25 11:38:34 2023] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sat Nov 25 11:38:34 2023] CR2: 00000000000000be CR3: 000000010acf0000 CR4: 00000000003506f0
[Sat Nov 25 11:38:34 2023] Call Trace:
[Sat Nov 25 11:38:34 2023]  <TASK>
[Sat Nov 25 11:38:34 2023] ? __die (/mnt/bcachefs/arch/x86/kernel/dumpstack.c:421 /mnt/bcachefs/arch/x86/kernel/dumpstack.c:434) 
[Sat Nov 25 11:38:34 2023] ? page_fault_oops (/mnt/bcachefs/arch/x86/mm/fault.c:707) 
[Sat Nov 25 11:38:34 2023] ? __cgroup_account_cputime (/mnt/bcachefs/./arch/x86/include/asm/preempt.h:104 /mnt/bcachefs/kernel/cgroup/rstat.c:395 /mnt/bcachefs/kernel/cgroup/rstat.c:405) 
[Sat Nov 25 11:38:34 2023] ? exc_page_fault (/mnt/bcachefs/./arch/x86/include/asm/paravirt.h:695 /mnt/bcachefs/arch/x86/mm/fault.c:1513 /mnt/bcachefs/arch/x86/mm/fault.c:1561) 
[Sat Nov 25 11:38:34 2023] ? asm_exc_page_fault (/mnt/bcachefs/./arch/x86/include/asm/idtentry.h:570) 
[Sat Nov 25 11:38:34 2023] ? bch2_have_enough_devs (/mnt/bcachefs/fs/bcachefs/replicas.c:952) 
[Sat Nov 25 11:38:34 2023] ? bch2_have_enough_devs (/mnt/bcachefs/./arch/x86/include/asm/preempt.h:104 /mnt/bcachefs/./include/linux/percpu-rwsem.h:70 /mnt/bcachefs/fs/bcachefs/replicas.c:940) 
[Sat Nov 25 11:38:34 2023] ? schedule (/mnt/bcachefs/./arch/x86/include/asm/bitops.h:206 /mnt/bcachefs/./arch/x86/include/asm/bitops.h:238 /mnt/bcachefs/./include/linux/thread_info.h:184 /mnt/bcachefs/./include/linux/sched.h:2263 /mnt/bcachefs/kernel/sched/core.c:6765 /mnt/bcachefs/kernel/sched/core.c:6778) 
[Sat Nov 25 11:38:34 2023] ? __closure_sync (/mnt/bcachefs/lib/closure.c:132) 
[Sat Nov 25 11:38:34 2023] bch2_write_super (/mnt/bcachefs/fs/bcachefs/super-io.c:993) 
[Sat Nov 25 11:38:34 2023] bch2_journal_seq_blacklist_add (/mnt/bcachefs/fs/bcachefs/journal_seq_blacklist.c:117) 
[Sat Nov 25 11:38:34 2023] bch2_fs_recovery (/mnt/bcachefs/fs/bcachefs/recovery.c:802) 
[Sat Nov 25 11:38:34 2023] ? bch2_printbuf_exit (/mnt/bcachefs/fs/bcachefs/printbuf.c:110) 
[Sat Nov 25 11:38:34 2023] ? print_mount_opts (/mnt/bcachefs/fs/bcachefs/super.c:938) 
[Sat Nov 25 11:38:34 2023] bch2_fs_start (/mnt/bcachefs/fs/bcachefs/super.c:971) 
[Sat Nov 25 11:38:34 2023] bch2_fs_open (/mnt/bcachefs/fs/bcachefs/super.c:1961) 
[Sat Nov 25 11:38:34 2023] ? __kmem_cache_alloc_node (/mnt/bcachefs/mm/slub.c:3480 /mnt/bcachefs/mm/slub.c:3517) 
[Sat Nov 25 11:38:34 2023] ? bch2_mount (/mnt/bcachefs/fs/bcachefs/fs.c:1823) 
[Sat Nov 25 11:38:34 2023] bch2_mount (/mnt/bcachefs/fs/bcachefs/fs.c:1823) 
[Sat Nov 25 11:38:34 2023] legacy_get_tree (/mnt/bcachefs/fs/fs_context.c:664) 
[Sat Nov 25 11:38:34 2023] vfs_get_tree (/mnt/bcachefs/fs/super.c:1772) 
[Sat Nov 25 11:38:34 2023] path_mount (/mnt/bcachefs/fs/namespace.c:3337 /mnt/bcachefs/fs/namespace.c:3664) 
[Sat Nov 25 11:38:34 2023] __x64_sys_mount (/mnt/bcachefs/fs/namespace.c:3678 /mnt/bcachefs/fs/namespace.c:3886 /mnt/bcachefs/fs/namespace.c:3863 /mnt/bcachefs/fs/namespace.c:3863) 
[Sat Nov 25 11:38:34 2023] do_syscall_64 (/mnt/bcachefs/arch/x86/entry/common.c:51 /mnt/bcachefs/arch/x86/entry/common.c:82) 
[Sat Nov 25 11:38:34 2023] ? vfs_write (/mnt/bcachefs/./include/linux/fs.h:2020 /mnt/bcachefs/fs/read_write.c:491 /mnt/bcachefs/fs/read_write.c:584) 
[Sat Nov 25 11:38:34 2023] ? fpregs_assert_state_consistent (/mnt/bcachefs/arch/x86/kernel/fpu/context.h:38 /mnt/bcachefs/arch/x86/kernel/fpu/core.c:822) 
[Sat Nov 25 11:38:34 2023] ? exit_to_user_mode_prepare (/mnt/bcachefs/./arch/x86/include/asm/entry-common.h:57 /mnt/bcachefs/kernel/entry/common.c:206) 
[Sat Nov 25 11:38:34 2023] ? syscall_exit_to_user_mode (/mnt/bcachefs/kernel/entry/common.c:299) 
[Sat Nov 25 11:38:34 2023] ? do_syscall_64 (/mnt/bcachefs/./arch/x86/include/asm/cpufeature.h:171 /mnt/bcachefs/arch/x86/entry/common.c:97) 
[Sat Nov 25 11:38:34 2023] entry_SYSCALL_64_after_hwframe (/mnt/bcachefs/arch/x86/entry/entry_64.S:129) 
[Sat Nov 25 11:38:34 2023] RIP: 0033:0x7fd7476f5b7a
[Sat Nov 25 11:38:34 2023] Code: 48 8b 0d 89 82 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 56 82 0c 00 f7 d8 64 89 01 48
All code
========
   0:   48 8b 0d 89 82 0c 00    mov    0xc8289(%rip),%rcx        # 0xc8290
   7:   f7 d8                   neg    %eax
   9:   64 89 01                mov    %eax,%fs:(%rcx)
   c:   48 83 c8 ff             or     $0xffffffffffffffff,%rax
  10:   c3                      ret
  11:   66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
  18:   00 00 00 
  1b:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  20:   49 89 ca                mov    %rcx,%r10
  23:   b8 a5 00 00 00          mov    $0xa5,%eax
  28:   0f 05                   syscall
  2a:*  48 3d 01 f0 ff ff       cmp    $0xfffffffffffff001,%rax     <-- trapping instruction
  30:   73 01                   jae    0x33
  32:   c3                      ret
  33:   48 8b 0d 56 82 0c 00    mov    0xc8256(%rip),%rcx        # 0xc8290
  3a:   f7 d8                   neg    %eax
  3c:   64 89 01                mov    %eax,%fs:(%rcx)
  3f:   48                      rex.W

Code starting with the faulting instruction
===========================================
   0:   48 3d 01 f0 ff ff       cmp    $0xfffffffffffff001,%rax
   6:   73 01                   jae    0x9
   8:   c3                      ret
   9:   48 8b 0d 56 82 0c 00    mov    0xc8256(%rip),%rcx        # 0xc8266
  10:   f7 d8                   neg    %eax
  12:   64 89 01                mov    %eax,%fs:(%rcx)
  15:   48                      rex.W
[Sat Nov 25 11:38:34 2023] RSP: 002b:00007ffec9372ed8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
[Sat Nov 25 11:38:34 2023] RAX: ffffffffffffffda RBX: 000055bd2ddc1000 RCX: 00007fd7476f5b7a
[Sat Nov 25 11:38:34 2023] RDX: 000055bd2ddcbe00 RSI: 000055bd2ddcb680 RDI: 000055bd2ddc1000
[Sat Nov 25 11:38:34 2023] RBP: 000055bd2ddcbe00 R08: 000055bd2ddd0b00 R09: 000055bd2ddc1fe0
[Sat Nov 25 11:38:34 2023] R10: 0000000000000000 R11: 0000000000000206 R12: 000055bd2ddd0b00
[Sat Nov 25 11:38:34 2023] R13: 0000000000000000 R14: 000055bd2ddc91f0 R15: 0000000000000008
[Sat Nov 25 11:38:34 2023]  </TASK>
[Sat Nov 25 11:38:34 2023] Modules linked in: softdog xt_conntrack xt_CHECKSUM nft_chain_nat nf_conntrack_netlink xt_MASQUERADE nf_nat xfrm_user xfrm_algo nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype xt_tcpudp nft_compat br_netfilter bridge stp llc nf_tables nfnetlink overlay zstd zram zsmalloc binfmt_misc nls_ascii nls_cp437 vfat fat intel_rapl_msr intel_rapl_common edac_mce_amd kvm_amd kvm irqbypass ghash_clmulni_intel evdev sha512_ssse3 sha512_generic sha256_ssse3 sha1_ssse3 eeepc_wmi aesni_intel asus_wmi snd_pcm crypto_simd battery snd_timer ledtrig_audio cryptd sparse_keymap rapl platform_profile snd video sp5100_tco wmi_bmof soundcore acpi_cpufreq watchdog pcspkr k10temp ccp rng_core cfg80211 button rfkill sg nfsd auth_rpcgss nfs_acl lockd grace bcache sunrpc drm dm_mod efi_pstore fuse loop configfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 ses enclosure scsi_transport_sas btrfs blake2b_generic efivarfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear
[Sat Nov 25 11:38:34 2023]  md_mod sd_mod uas t10_pi crc64_rocksoft usb_storage crc_t10dif crct10dif_generic ahci xhci_pci libahci crct10dif_pclmul crct10dif_common libata xhci_hcd r8169 atlantic crc32_pclmul realtek mdio_devres crc32c_intel usbcore libphy macsec scsi_mod i2c_piix4 usb_common scsi_common wmi gpio_amdpt gpio_generic
[Sat Nov 25 11:38:34 2023] CR2: 00000000000000be
[Sat Nov 25 11:38:34 2023] ---[ end trace 0000000000000000 ]---
[Sat Nov 25 11:38:34 2023] RIP: 0010:bch2_have_enough_devs (/mnt/bcachefs/fs/bcachefs/replicas.c:952) 
[Sat Nov 25 11:38:34 2023] Code: 74 7c 31 ff 31 ed 31 d2 89 d0 0f b6 4c 06 03 48 89 c8 48 83 c1 44 49 8b 4c cc 08 48 0f a3 04 24 0f 92 c0 0f b6 c0 01 c5 31 c0 <80> b9 be 00 00 00 02 0f 94 c0 83 c2 01 01 c7 0f b6 46 01 39 c2 72
All code
========
   0:   74 7c                   je     0x7e
   2:   31 ff                   xor    %edi,%edi
   4:   31 ed                   xor    %ebp,%ebp
   6:   31 d2                   xor    %edx,%edx
   8:   89 d0                   mov    %edx,%eax
   a:   0f b6 4c 06 03          movzbl 0x3(%rsi,%rax,1),%ecx
   f:   48 89 c8                mov    %rcx,%rax
  12:   48 83 c1 44             add    $0x44,%rcx
  16:   49 8b 4c cc 08          mov    0x8(%r12,%rcx,8),%rcx
  1b:   48 0f a3 04 24          bt     %rax,(%rsp)
  20:   0f 92 c0                setb   %al
  23:   0f b6 c0                movzbl %al,%eax
  26:   01 c5                   add    %eax,%ebp
  28:   31 c0                   xor    %eax,%eax
  2a:*  80 b9 be 00 00 00 02    cmpb   $0x2,0xbe(%rcx)      <-- trapping instruction
  31:   0f 94 c0                sete   %al
  34:   83 c2 01                add    $0x1,%edx
  37:   01 c7                   add    %eax,%edi
  39:   0f b6 46 01             movzbl 0x1(%rsi),%eax
  3d:   39 c2                   cmp    %eax,%edx
  3f:   72                      .byte 0x72

Code starting with the faulting instruction
===========================================
   0:   80 b9 be 00 00 00 02    cmpb   $0x2,0xbe(%rcx)
   7:   0f 94 c0                sete   %al
   a:   83 c2 01                add    $0x1,%edx
   d:   01 c7                   add    %eax,%edi
   f:   0f b6 46 01             movzbl 0x1(%rsi),%eax
  13:   39 c2                   cmp    %eax,%edx
  15:   72                      .byte 0x72
[Sat Nov 25 11:38:34 2023] RSP: 0018:ffffacfac9bff758 EFLAGS: 00010246
[Sat Nov 25 11:38:34 2023] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[Sat Nov 25 11:38:34 2023] RDX: 0000000000000001 RSI: ffff8c3d4acdc04b RDI: 0000000000000000
[Sat Nov 25 11:38:34 2023] RBP: 0000000000000001 R08: 0000000000000002 R09: 00000000fffffff3
[Sat Nov 25 11:38:34 2023] R10: 0000000000000001 R11: ffff8c3e2a254000 R12: ffff8c3d98680000
[Sat Nov 25 11:38:34 2023] R13: 0000000000000008 R14: 000000000000000c R15: 0000000000000004
[Sat Nov 25 11:38:34 2023] FS:  00007fd7475e2c80(0000) GS:ffff8c483ea00000(0000) knlGS:0000000000000000
[Sat Nov 25 11:38:34 2023] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sat Nov 25 11:38:34 2023] CR2: 00000000000000be CR3: 000000010acf0000 CR4: 00000000003506f0
[Sat Nov 25 11:38:34 2023] note: bcachefs[2404] exited with irqs disabled
koverstreet commented 11 months ago

So we seem to have a replicas entry that points to an invalid device. Which is odd - it doesn't look like the bad entry was in the superblock.

Could you try with the replicas_debug branch? I've added an extra assertion in the path that adds new cpu replicas entries.

koverstreet commented 11 months ago

When you run that branch you'll need to run it under gdb to get a backtrace.

Dikay900 commented 11 months ago

There you go:

bcachefs: libbcachefs/replicas.c:178: cpu_replicas_add_entry: Assertion `!(!bch2_dev_exists2(c, new_entry->devs[i]))' failed.

Thread 1 "bcachefs" received signal SIGABRT, Aborted.
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
44  ./nptl/pthread_kill.c: No such file or directory.
(gdb) backtrace
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1  0x00007ffff7b39d9f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  0x00007ffff7aeaf32 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007ffff7ad5472 in __GI_abort () at ./stdlib/abort.c:79
#4  0x00007ffff7ad5395 in __assert_fail_base (fmt=0x7ffff7c49a90 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    assertion=assertion@entry=0x5555558bf440 "!(!bch2_dev_exists2(c, new_entry->devs[i]))", file=file@entry=0x5555558bf006 "libbcachefs/replicas.c", line=line@entry=178, 
    function=function@entry=0x5555558bf710 <__PRETTY_FUNCTION__.7> "cpu_replicas_add_entry") at ./assert/assert.c:92
#5  0x00007ffff7ae3e32 in __GI___assert_fail (assertion=assertion@entry=0x5555558bf440 "!(!bch2_dev_exists2(c, new_entry->devs[i]))", 
    file=file@entry=0x5555558bf006 "libbcachefs/replicas.c", line=line@entry=178, function=function@entry=0x5555558bf710 <__PRETTY_FUNCTION__.7> "cpu_replicas_add_entry")
    at ./assert/assert.c:101
#6  0x000055555567594f in cpu_replicas_add_entry (c=c@entry=0x7ffff79dc000, old=old@entry=0x7ffff79dc420, new_entry=new_entry@entry=0x7fffec0cf6b0) at libbcachefs/replicas.c:178
#7  0x00005555556773fa in bch2_replicas_set_usage (c=c@entry=0x7ffff79dc000, r=r@entry=0x7fffec0cf6b0, sectors=0) at libbcachefs/replicas.c:605
#8  0x000055555567119b in journal_replay_entry_early (c=c@entry=0x7ffff79dc000, entry=entry@entry=0x7fffec0cf6a0) at libbcachefs/recovery.c:306
#9  0x0000555555672c5d in journal_replay_early (clean=<optimized out>, c=0x7ffff79dc000) at libbcachefs/recovery.c:383
#10 bch2_fs_recovery (c=0x7ffff79dc000) at libbcachefs/recovery.c:839
#11 0x000055555568e10d in bch2_fs_start (c=c@entry=0x7ffff79dc000) at libbcachefs/super.c:969
#12 0x000055555569104b in bch2_fs_open (devices=devices@entry=0x7fffffffe9d8, nr_devices=nr_devices@entry=8, opts=...) at libbcachefs/super.c:1959
#13 0x00005555555b1a31 in cmd_fsck (argc=8, argv=0x7fffffffe9d8, argv@entry=0x7fffffffe9c8) at cmd_fsck.c:94
#14 0x00005555555aa9a7 in main (argc=<optimized out>, argv=0x7fffffffe9c8) at bcachefs.c:234
koverstreet commented 11 months ago

yep, that found the missing validation

the replicas_debug branch has been updated, tell me if that works

Dikay900 commented 11 months ago

Thank you that works:

[Sun Nov 26 09:08:10 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): mounting version 1.3: rebalance_work opts=metadata_replicas=2,data_replicas=2,metadata_target=fast,foreground_target=fast,background_target=hdd,promote_target=fast,fsck,fix_errors=yes
[Sun Nov 26 09:08:10 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): recovering from unclean shutdown
[Sun Nov 26 09:10:10 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): journal read done, replaying entries 25734944-25735261
[Sun Nov 26 09:10:10 2023] invalid journal entry, version=1.3: rebalance_work type=data_usage seq=25734944 offset=1728/2151: invalid journal entry usage: invalid device 5 in entry journal: 1/2 [4 5], fixing
[Sun Nov 26 09:10:10 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734944 offset=2009/2151: bad dev, fixing
[Sun Nov 26 09:10:10 2023] invalid journal entry, version=1.3: rebalance_work type=data_usage seq=25734945 offset=642/1065: invalid journal entry usage: invalid device 5 in entry journal: 1/2 [4 5], fixing
[Sun Nov 26 09:10:10 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734945 offset=923/1065: bad dev, fixing
[Sun Nov 26 09:10:10 2023] invalid journal entry, version=1.3: rebalance_work type=data_usage seq=25734946 offset=935/1358: invalid journal entry usage: invalid device 5 in entry journal: 1/2 [4 5], fixing
[Sun Nov 26 09:10:10 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734946 offset=1216/1358: bad dev, fixing
[Sun Nov 26 09:10:10 2023] invalid journal entry, version=1.3: rebalance_work type=data_usage seq=25734947 offset=675/1098: invalid journal entry usage: invalid device 5 in entry journal: 1/2 [4 5], fixing
[Sun Nov 26 09:10:10 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734947 offset=956/1098: bad dev, fixing
[Sun Nov 26 09:10:10 2023] invalid journal entry, version=1.3: rebalance_work type=data_usage seq=25734948 offset=433/856: invalid journal entry usage: invalid device 5 in entry journal: 1/2 [4 5], fixing
[Sun Nov 26 09:10:10 2023] invalid journal entry, version=1.3: rebalance_work type=dev_usage seq=25734948 offset=714/856: bad dev, fixing
[Sun Nov 26 09:10:10 2023] invalid journal entry, version=1.3: rebalance_work type=data_usage seq=25734949 offset=527/950: invalid journal entry usage: invalid device 5 in entry journal: 1/2 [4 5], fixing
[Sun Nov 26 09:10:10 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): Ratelimiting new instances of previous error
[Sun Nov 26 09:10:10 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): alloc_read... done
[Sun Nov 26 09:10:10 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): stripes_read... done
[Sun Nov 26 09:10:10 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): snapshots_read... done
[Sun Nov 26 09:10:10 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): check_allocations...  done
[Sun Nov 26 09:24:48 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): journal_replay...
[Sun Nov 26 09:24:48 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): going read-write  done
[Sun Nov 26 09:24:50 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): check_alloc_info... done
[Sun Nov 26 09:25:16 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): check_lrus... done
[Sun Nov 26 09:25:22 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): check_btree_backpointers...
[Sun Nov 26 09:36:04 2023]  done
[Sun Nov 26 09:36:04 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): check_backpointers_to_extents...  done
[Sun Nov 26 10:05:24 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): check_extents_to_backpointers... done
[Sun Nov 26 10:17:30 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): check_alloc_to_lru_refs... done
[Sun Nov 26 10:17:51 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): check_snapshot_trees... done
[Sun Nov 26 10:17:51 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): check_snapshots... done
[Sun Nov 26 10:17:51 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): check_subvols... done
[Sun Nov 26 10:17:51 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): delete_dead_snapshots... done
[Sun Nov 26 10:17:51 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): resume_logged_ops... done
[Sun Nov 26 10:17:51 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): check_inodes... done
[Sun Nov 26 10:18:06 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): check_extents...
[Sun Nov 26 10:18:55 2023] inode 141010656:4294967295 has incorrect i_sectors: got 2597304, should be 2597200, fixing
[Sun Nov 26 10:18:55 2023] inode 141011051:4294967295 has incorrect i_sectors: got 2534976, should be 2534920, fixing
[Sun Nov 26 10:18:55 2023] inode 141011067:4294967295 has incorrect i_sectors: got 4375512, should be 4375464, fixing
[Sun Nov 26 10:18:56 2023] inode 141011319:4294967295 has incorrect i_sectors: got 3558888, should be 3558840, fixing
[Sun Nov 26 10:18:56 2023] inode 141011524:4294967295 has incorrect i_sectors: got 22248072, should be 22247960, fixing
[Sun Nov 26 10:18:57 2023] inode 141011550:4294967295 has incorrect i_sectors: got 26808424, should be 26808416, fixing
[Sun Nov 26 10:18:57 2023] inode 141011594:4294967295 has incorrect i_sectors: got 753640, should be 753600, fixing
[Sun Nov 26 10:18:57 2023] inode 141011630:4294967295 has incorrect i_sectors: got 3679112, should be 3679008, fixing
[Sun Nov 26 10:18:57 2023] inode 141011631:4294967295 has incorrect i_sectors: got 3679168, should be 3679128, fixing
[Sun Nov 26 10:18:57 2023] inode 141011661:4294967295 has incorrect i_sectors: got 3889336, should be 3889296, fixing
[Sun Nov 26 10:18:57 2023] inode 141011837:4294967295 has incorrect i_sectors: got 3793104, should be 3792992, fixing
[Sun Nov 26 10:18:57 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): Ratelimiting new instances of previous error
[Sun Nov 26 10:19:45 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): check_indirect_extents... done
[Sun Nov 26 10:19:46 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): check_dirents... done
[Sun Nov 26 10:19:55 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): check_xattrs... done
[Sun Nov 26 10:19:55 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): check_root... done
[Sun Nov 26 10:19:55 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): check_directory_structure... done
[Sun Nov 26 10:20:12 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): check_nlinks... done
[Sun Nov 26 10:20:28 2023] bcachefs (f0605888-9e17-4fec-bef6-92e885adc371): delete_dead_inodes... done

Thank you very much for your work!