koverstreet / bcachefs

Other
686 stars 71 forks source link

hit (probably) snapshot related error [59180b370b79] #593

Closed noradtux closed 1 year ago

noradtux commented 1 year ago

My backup script tends to trigger an error(=panic). It creates (ro) snapshots of all relevant subvolumes, then bind-mounts those into a temporary tree and uses restic to backup into off-site storage.

Kernel is bcachefs 59180b370b79 + linux-stable 6.5.3.

bcachefs fs usage /media/LXSTORE
Filesystem: c8a781ec-2b78-41ea-a188-326954513c0e
Size:                 65950295145984
Used:                 19518939275264
Online reserved:                   0

Data type       Required/total  Devices
reserved:       1/0                    [] 24858624
reserved:       1/1                 [] 76873654272
btree:          1/2             [nvme0n1p1 nvme1n1p1]   143929114624
user:           1/2             [sda1 sdc1]            3936472475648
user:           1/1             [sdd1]                 1044538811904
user:           1/2             [sdc1 sdd1]             503250580480
user:           1/2             [sdb1 sdc1]            3106769455104
user:           1/1             [sdb1]                 1044538515456
user:           1/1             [nvme0n1p1]                   147456
user:           1/2             [sda1 sdb1]             503246327808
user:           1/2             [sda1 sdd1]            3106764368896
user:           1/2             [sdb1 sdd1]            3936461952000
user:           1/2             [nvme0n1p1 nvme1n1p1]        1125376
user:           1/1             [sda1]                 1044538113024
user:           1/1             [sdc1]                 1044536411136
cached:         1/1             [nvme0n1p1]            2813321213440
cached:         1/1             [nvme1n1p1]            2813479301120

hdd.sda (device 0):             sda1              rw
                                data         buckets    fragmented
  free:                            0        21321831
  sb:                        3149824               7        520192
  journal:                4294967296            8192
  btree:                           0               0
  user:                4817779699200         9189262      40177664
  cached:                          0               0
  parity:                          0               0
  stripe:                          0               0
  need_gc_gens:                    0               0
  need_discard:                    0               0
  erasure coded:                   0               0
  capacity:           16000898564096        30519292

hdd.sdb (device 1):             sdb1              rw
                                data         buckets    fragmented
  free:                            0        21321832
  sb:                        3149824               7        520192
  journal:                4294967296            8192
  btree:                           0               0
  user:                4817777382912         9189261      41965568
  cached:                          0               0
  parity:                          0               0
  stripe:                          0               0
  need_gc_gens:                    0               0
  need_discard:                    0               0
  erasure coded:                   0               0
  capacity:           16000898564096        30519292

hdd.sdc (device 2):             sdc1              rw
                                data         buckets    fragmented
  free:                            0        21321837
  sb:                        3149824               7        520192
  journal:                4294967296            8192
  btree:                           0               0
  user:                4817782666752         9189256      34061312
  cached:                          0               0
  parity:                          0               0
  stripe:                          0               0
  need_gc_gens:                    0               0
  need_discard:                    0               0
  erasure coded:                   0               0
  capacity:           16000898564096        30519292

hdd.sdd (device 3):             sdd1              rw
                                data         buckets    fragmented
  free:                            0        21321835
  sb:                        3149824               7        520192
  journal:                4294967296            8192
  btree:                           0               0
  user:                4817777262592         9189258      40516608
  cached:                          0               0
  parity:                          0               0
  stripe:                          0               0
  need_gc_gens:                    0               0
  need_discard:                    0               0
  erasure coded:                   0               0
  capacity:           16000898564096        30519292

ssd.nvme0n1 (device 4):    nvme0n1p1              rw
                                data         buckets    fragmented
  free:                            0         1747330
  sb:                        3149824               7        520192
  journal:                4294967296            8192
  btree:                 71964557312          202701   34309144576
  user:                       710144              16       7678464
  cached:              2813321213440         5367407
  parity:                          0               0
  stripe:                          0               0
  need_gc_gens:                    0               0
  need_discard:                    0               5
  erasure coded:                   0               0
  capacity:            3840754581504         7325658

ssd.nvme1n1 (device 5):    nvme1n1p1              rw
                                data         buckets    fragmented
  free:                            0         1747366
  sb:                        3149824               7        520192
  journal:                4294967296            8192
  btree:                 71964557312          202701   34309144576
  user:                       562688              15       7301632
  cached:              2813479301120         5367372
  parity:                          0               0
  stripe:                          0               0
  need_gc_gens:                    0               0
  need_discard:                    0               5
  erasure coded:                   0               0
  capacity:            3840754581504         7325658
bcachefs show-super /dev/sda1
External UUID:                              c8a781ec-2b78-41ea-a188-326954513c0e
Internal UUID:                              7aceb175-72f0-44d6-8e38-e937b458f90b
Device index:                               0
Label:                                      LXSTORE
Version:                                    1.2: deleted_inodes
Version upgrade complete:                   1.2: deleted_inodes
Oldest version on disk:                     1.2: deleted_inodes
Created:                                    Sat Sep 16 11:45:33 2023
Sequence number:                            46
Superblock size:                            6688
Clean:                                      0
Devices:                                    6
Sections:                                   members,replicas_v0,disk_groups,clean,journal_seq_blacklist,journal_v2,counters
Features:                                   lz4,zstd,journal_seq_blacklist_v3,reflink,new_siphash,inline_data,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,reflink_inline_data,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
Compat features:                            alloc_info,alloc_metadata,extents_above_btree_updates_done,bformat_overflow_done

Options:
  block_size:                               4.00 KiB
  btree_node_size:                          256 KiB
  errors:                                   continue [ro] panic
  metadata_replicas:                        2
  data_replicas:                            2
  metadata_replicas_required:               1
  data_replicas_required:                   1
  encoded_extent_max:                       64.0 KiB
  metadata_checksum:                        none [crc32c] crc64 xxhash
  data_checksum:                            none crc32c crc64 [xxhash]
  compression:                              lz4:1
  background_compression:                   zstd
  str_hash:                                 crc32c crc64 [siphash]
  metadata_target:                          none
  foreground_target:                        ssd
  background_target:                        hdd
  promote_target:                           ssd
  erasure_code:                             0
  inodes_32bit:                             1
  shard_inode_numbers:                      1
  inodes_use_key_cache:                     1
  gc_reserve_percent:                       8
  gc_reserve_bytes:                         0 B
  root_reserve_percent:                     0
  wide_macs:                                0
  acl:                                      1
  usrquota:                                 0
  grpquota:                                 0
  prjquota:                                 0
  journal_flush_delay:                      1000
  journal_flush_disabled:                   0
  journal_reclaim_delay:                    100
  journal_transaction_names:                1
  version_upgrade:                          [compatible] incompatible none
  nocow:                                    0

members (size 344):
  Device:                                   0
    UUID:                                   517ce52a-6c4f-4b43-b5c5-052b5ad749a1
    Size:                                   14.6 TiB
    Bucket size:                            512 KiB
    First bucket:                           0
    Buckets:                                30519292
    Last mount:                             Tue Sep 19 11:19:12 2023
    State:                                  rw
    Label:                                  sda (1)
    Data allowed:                           journal,btree,user
    Has data:                               journal,user
    Discard:                                0
    Freespace initialized:                  1
  Device:                                   1
    UUID:                                   b762e0de-d6f0-41dc-a605-74f63ebc52d4
    Size:                                   14.6 TiB
    Bucket size:                            512 KiB
    First bucket:                           0
    Buckets:                                30519292
    Last mount:                             Tue Sep 19 11:19:12 2023
    State:                                  rw
    Label:                                  sdb (2)
    Data allowed:                           journal,btree,user
    Has data:                               journal,user
    Discard:                                0
    Freespace initialized:                  1
  Device:                                   2
    UUID:                                   962ec0a9-5b58-4ce9-a2cc-588cebd4fc5d
    Size:                                   14.6 TiB
    Bucket size:                            512 KiB
    First bucket:                           0
    Buckets:                                30519292
    Last mount:                             Tue Sep 19 11:19:12 2023
    State:                                  rw
    Label:                                  sdc (3)
    Data allowed:                           journal,btree,user
    Has data:                               user
    Discard:                                0
    Freespace initialized:                  1
  Device:                                   3
    UUID:                                   d8c35b75-b3d3-471a-a782-8fe77f7a9ddb
    Size:                                   14.6 TiB
    Bucket size:                            512 KiB
    First bucket:                           0
    Buckets:                                30519292
    Last mount:                             Tue Sep 19 11:19:12 2023
    State:                                  rw
    Label:                                  sdd (4)
    Data allowed:                           journal,btree,user
    Has data:                               user
    Discard:                                0
    Freespace initialized:                  1
  Device:                                   4
    UUID:                                   7bd857aa-f6ea-4e30-b86b-c93c6ca82938
    Size:                                   3.49 TiB
    Bucket size:                            512 KiB
    First bucket:                           0
    Buckets:                                7325658
    Last mount:                             Tue Sep 19 11:19:12 2023
    State:                                  rw
    Label:                                  nvme0n1 (6)
    Data allowed:                           journal,btree,user
    Has data:                               journal,btree,user,cached
    Discard:                                1
    Freespace initialized:                  1
  Device:                                   5
    UUID:                                   c94503f0-f068-481b-a413-47cfe12f9186
    Size:                                   3.49 TiB
    Bucket size:                            512 KiB
    First bucket:                           0
    Buckets:                                7325658
    Last mount:                             Tue Sep 19 11:19:12 2023
    State:                                  rw
    Label:                                  nvme1n1 (7)
    Data allowed:                           journal,btree,user
    Has data:                               journal,btree,user,cached
    Discard:                                1
    Freespace initialized:                  1

From pstore:

<3>[12615.247548] invalid bkey on insert from bch2_delete_dead_snapshots -> move_key_to_correct_snapshot
<3>[12615.247550]   u64s 5 type whiteout 1342559904:203320:0 len 0 ver 0
<3>[12615.247551]   snapshot == 0
<0>[12615.247623] Kernel panic - not syncing: bcachefs (c8a781ec-2b78-41ea-a188-326954513c0e): panic after error
<4>[12615.247662] CPU: 3 PID: 47433 Comm: kworker/3:1 Not tainted 6.5.3-gateway #1
<4>[12615.247690] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570 Phantom Gaming 4, BIOS P4.20 08/02/2021
<4>[12615.247734] Workqueue: bcachefs_write_ref bch2_delete_dead_snapshots_work
<4>[12615.247762] Call Trace:
<4>[12615.247773]  <TASK>
<4>[12615.247782]  dump_stack_lvl+0x5a/0x90
<4>[12615.247795]  panic+0x10c/0x300
<4>[12615.247809]  bch2_inconsistent_error+0x6f/0x80
<4>[12615.247826]  bch2_trans_commit_bkey_invalid+0xb3/0xd0
<4>[12615.247847]  __bch2_trans_commit+0x32e/0x1f80
<4>[12615.247867]  bch2_delete_dead_snapshots+0xcd1/0x19a0
<4>[12615.247886]  ? srso_return_thunk+0x5/0x10
<4>[12615.247905]  ? bch2_delete_dead_snapshots+0xb5e/0x19a0
<4>[12615.247924]  bch2_delete_dead_snapshots_work+0x21/0xf0
<4>[12615.247944]  process_one_work+0x20f/0x3f0
<4>[12615.247961]  worker_thread+0x25d/0x460
<4>[12615.247976]  ? __pfx_worker_thread+0x10/0x10
<4>[12615.247994]  kthread+0xe6/0x100
<4>[12615.248007]  ? __pfx_kthread+0x10/0x10
<4>[12615.248022]  ret_from_fork+0x35/0x40
<4>[12615.248037]  ? __pfx_kthread+0x10/0x10
<4>[12615.248052]  ret_from_fork_asm+0x1b/0x30
<4>[12615.248070]  </TASK>
<0>[12615.248365] Kernel Offset: 0x1a200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
bcachefs fsck -n /dev/sd{a,b,c,d}1 /dev/nvme{0,1}n1p1    :(
mounting version 1.2: deleted_inodes opts=metadata_replicas=2,data_replicas=2,data_checksum=xxhash,compression=lz4:1,background_compression=zstd,foreground_target=ssd,background_target=hdd,promote_target=ssd,degraded,fsck,fix_errors=no,nochanges
recovering from clean shutdown, journal seq 2000770
journal read done, replaying entries 2000770-2000770
alloc_read... done
stripes_read... done
snapshots_read... done
check_allocations...error validating btree node on 0x560861e76b10g at btree lru level 0/1
  u64s 12 type btree_ptr_v2 18446462599806582784:1407374883656835:0 len 0 ver 0: seq b3c6d01eac962a2e written 496 min_key 18446462599806582784:1407374883586297:1 durability: 2 ptr: 4:201535:512 gen 8 ptr: 5:199537:512 gen 9
  node offset 0: bad magic: want 4657ac0acc2fc98f, got 0
retrying read
error validating btree node on 0x560861e76570g at btree lru level 0/1
  u64s 12 type btree_ptr_v2 18446462599806582784:1407374883656835:0 len 0 ver 0: seq b3c6d01eac962a2e written 496 min_key 18446462599806582784:1407374883586297:1 durability: 2 ptr: 4:201535:512 gen 8 ptr: 5:199537:512 gen 9
  node offset 0: bad magic: want 4657ac0acc2fc98f, got 0
running explicit recovery pass check_topology (4), currently at check_allocations (5)
retry success
Unreadable btree node at btree lru level 0:
  u64s 12 type btree_ptr_v2 18446462599806582784:1407374883656835:0 len 0 ver 0: seq b3c6d01eac962a2e written 496 min_key 18446462599806582784:1407374883586297:1 durability: 2 ptr: 4:201535:512 gen 8 ptr: 5:199537:512 gen 9, not fixing
error validating btree node on 0x560861e76570g at btree backpointers level 0/2
  u64s 12 type btree_ptr_v2 4:8719302656:0 len 0 ver 0: seq d2795a4a0044b445 written 336 min_key 3:9643912962048:1 durability: 2 ptr: 4:187848:0 gen 9 ptr: 5:189251:0 gen 12
  node offset 0: got wrong btree node (seq fe8b3aa895c88517 want d2795a4a0044b445)
retrying read
error validating btree node on 0x560861e76b10g at btree backpointers level 0/2
  u64s 12 type btree_ptr_v2 4:8719302656:0 len 0 ver 0: seq d2795a4a0044b445 written 336 min_key 3:9643912962048:1 durability: 2 ptr: 4:187848:0 gen 9 ptr: 5:189251:0 gen 12
  node offset 0: got wrong btree node (seq fe8b3aa895c88517 want d2795a4a0044b445)
running explicit recovery pass check_topology (4), currently at check_topology (4)
retry success
Unreadable btree node at btree backpointers level 0:
  u64s 12 type btree_ptr_v2 4:8719302656:0 len 0 ver 0: seq d2795a4a0044b445 written 336 min_key 3:9643912962048:1 durability: 2 ptr: 4:187848:0 gen 9 ptr: 5:189251:0 gen 12, not fixing
error validating btree node on 0x560861e76b10g at btree backpointers level 0/2
  u64s 12 type btree_ptr_v2 5:373726994432:0 len 0 ver 0: seq ce11d43023c5df65 written 504 min_key 5:359491829760:1 durability: 2 ptr: 4:199625:0 gen 15 ptr: 5:199204:0 gen 11
  node offset 0: got wrong btree node (seq f0fe91d340786526 want ce11d43023c5df65)
retrying read
error validating btree node on 0x560861e76570g at btree backpointers level 0/2
  u64s 12 type btree_ptr_v2 5:373726994432:0 len 0 ver 0: seq ce11d43023c5df65 written 504 min_key 5:359491829760:1 durability: 2 ptr: 4:199625:0 gen 15 ptr: 5:199204:0 gen 11
  node offset 0: bad magic: want 4657ac0acc2fc98f, got 0
running explicit recovery pass check_topology (4), currently at check_topology (4)
retry success
Unreadable btree node at btree backpointers level 0:
  u64s 12 type btree_ptr_v2 5:373726994432:0 len 0 ver 0: seq ce11d43023c5df65 written 504 min_key 5:359491829760:1 durability: 2 ptr: 4:199625:0 gen 15 ptr: 5:199204:0 gen 11, not fixing
error validating btree node on 0x560861e76570g at btree backpointers level 0/2
  u64s 12 type btree_ptr_v2 5:3848611823616:0 len 0 ver 0: seq 7a8c09a2fb41565b written 496 min_key 5:3823239692288:1 durability: 2 ptr: 5:3684822:512 gen 0 ptr: 4:3684739:512 gen 0
  node offset 0: bad magic: want 4657ac0acc2fc98f, got 0
retrying read
error validating btree node on 0x560861e76b10g at btree backpointers level 0/2
  u64s 12 type btree_ptr_v2 5:3848611823616:0 len 0 ver 0: seq 7a8c09a2fb41565b written 496 min_key 5:3823239692288:1 durability: 2 ptr: 5:3684822:512 gen 0 ptr: 4:3684739:512 gen 0
  node offset 0: bad magic: want 4657ac0acc2fc98f, got 0
running explicit recovery pass check_topology (4), currently at check_topology (4)
retry success
Unreadable btree node at btree backpointers level 0:
  u64s 12 type btree_ptr_v2 5:3848611823616:0 len 0 ver 0: seq 7a8c09a2fb41565b written 496 min_key 5:3823239692288:1 durability: 2 ptr: 5:3684822:512 gen 0 ptr: 4:3684739:512 gen 0, not fixing
error validating btree node on 0x560861e76570g at btree bucket_gens level 0/1
  u64s 12 type btree_ptr_v2 4:905:0 len 0 ver 0: seq 730ea75d4a49bd85 written 480 min_key 4:465:1 durability: 2 ptr: 4:326947:512 gen 12 ptr: 5:326888:512 gen 12
  node offset 0: got wrong btree node (seq 26fe5c68fd632d0c want 730ea75d4a49bd85)
retrying read
error validating btree node on 0x560861e76b10g at btree bucket_gens level 0/1
  u64s 12 type btree_ptr_v2 4:905:0 len 0 ver 0: seq 730ea75d4a49bd85 written 480 min_key 4:465:1 durability: 2 ptr: 4:326947:512 gen 12 ptr: 5:326888:512 gen 12
  node offset 0: bad magic: want 4657ac0acc2fc98f, got 0
running explicit recovery pass check_topology (4), currently at check_topology (4)
retry success
Unreadable btree node at btree bucket_gens level 0:
  u64s 12 type btree_ptr_v2 4:905:0 len 0 ver 0: seq 730ea75d4a49bd85 written 480 min_key 4:465:1 durability: 2 ptr: 4:326947:512 gen 12 ptr: 5:326888:512 gen 12, not fixing
error validating btree node on 0x560861e76570g at btree alloc level 0/2
  u64s 12 type btree_ptr_v2 1:531:0 len 0 ver 0: seq 68c5338ede7eb3b5 written 392 min_key 0:9196399:1 durability: 2 ptr: 5:188779:512 gen 13 ptr: 4:189690:512 gen 11
  node offset 0: bad magic: want 4657ac0acc2fc98f, got 0
retrying read
error validating btree node on 0x560861e76b10g at btree alloc level 0/2
  u64s 12 type btree_ptr_v2 1:531:0 len 0 ver 0: seq 68c5338ede7eb3b5 written 392 min_key 0:9196399:1 durability: 2 ptr: 5:188779:512 gen 13 ptr: 4:189690:512 gen 11
  node offset 0: bad magic: want 4657ac0acc2fc98f, got 0
running explicit recovery pass check_topology (4), currently at check_topology (4)
retry success
error writing alloc info: Input/output error
bch2_gc(): error Input/output error
bch2_fs_recovery(): error Input/output error
error starting filesystem: Input/output error
error opening /dev/sda1: Input/output error
noradtux commented 1 year ago

Removing all snapshots seems to have resolved the immediate effect of this issue.

noradtux commented 1 year ago

And there the issue hits again. Made it to 4 daily snapshots before the error appeared again.

koverstreet commented 1 year ago

Fixed by c0909db6c3fd bcachefs: Make sure to initialize equiv when creating new snapshots, currently in the testing branch.