koverstreet / bcachefs

Large (100-200GB) files reliably get corrupted #619

Open chayleaf opened 7 months ago

chayleaf commented 7 months ago

I've tried:

In both cases, the files got corrupted. Monero reported wrong blockchain hashes (it worked fine on btrfs), and Windows failed to boot and dropped into recovery, from which I couldn't proceed any further.

The VM state diverged as soon as I launched it so I can't really compare the files without deep digging into the image structure, which is a shame.

However, I have a backup of the image which I haven't touched, and its hash seems to be the same. Maybe it's related to the technique these programs use? (e.g. mmap)

koverstreet commented 7 months ago

Interesting - I think we should isolate as many variables as we can first; I would be surprised if it's related to mmap'd IO.

If you can pin it down to an exact inode:offset we can search through the journal to see what transaction last updated that data.

Can you try without zstd? Also, post your show-super output, and describe anything else about your system configuration that might be relevant.

chayleaf commented 7 months ago
# uname -a
Linux nixmsi 6.7.0-rc2 #1-NixOS SMP PREEMPT_DYNAMIC Tue Jan  1 00:00:00 UTC 1980 x86_64 GNU/Linux

# bcachefs show-super /dev/mapper/vg0-root
External UUID:                              dc669123-d6d3-447f-9ce3-c22587e5fa6a
Internal UUID:                              93ebea48-bf34-4077-92a8-4a08e3648d67
Device index:                               0
Label:                                      
Version:                                    1.3: rebalance_work
Version upgrade complete:                   1.3: rebalance_work
Oldest version on disk:                     1.3: rebalance_work
Created:                                    Thu Dec  7 16:17:56 2023

Sequence number:                            56
Superblock size:                            4448
Clean:                                      0
Devices:                                    1
Sections:                                   members_v1,replicas_v0,clean,journal_seq_blacklist,journal_v2,counters,members_v2,errors
Features:                                   zstd,journal_seq_blacklist_v3,reflink,new_siphash,inline_data,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,reflink_inline_data,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
Compat features:                            alloc_info,alloc_metadata,extents_above_btree_updates_done,bformat_overflow_done

Options:
  block_size:                               512 B
  btree_node_size:                          256 KiB
  errors:                                   continue [ro] panic 
  metadata_replicas:                        1
  data_replicas:                            1
  metadata_replicas_required:               1
  data_replicas_required:                   1
  encoded_extent_max:                       64.0 KiB
  metadata_checksum:                        none [crc32c] crc64 xxhash 
  data_checksum:                            none [crc32c] crc64 xxhash 
  compression:                              zstd:15
  background_compression:                   none
  str_hash:                                 crc32c crc64 [siphash] 
  metadata_target:                          none
  foreground_target:                        none
  background_target:                        none
  promote_target:                           none
  erasure_code:                             0
  inodes_32bit:                             1
  shard_inode_numbers:                      1
  inodes_use_key_cache:                     1
  gc_reserve_percent:                       8
  gc_reserve_bytes:                         0 B
  root_reserve_percent:                     0
  wide_macs:                                0
  acl:                                      1
  usrquota:                                 0
  grpquota:                                 0
  prjquota:                                 0
  journal_flush_delay:                      1000
  journal_flush_disabled:                   0
  journal_reclaim_delay:                    100
  journal_transaction_names:                1
  version_upgrade:                          [compatible] incompatible none 
  nocow:                                    0

members_v2 (size 136):
  Device:                                   0
    Label:                                  (none)
    UUID:                                   67da1c81-5818-4a92-bbbd-9a8ca1234129
    Size:                                   829 GiB
    read errors:                            0
    write errors:                           0
    checksum errors:                        0
    seqread iops:                           0
    seqwrite iops:                          0
    randread iops:                          0
    randwrite iops:                         0
    Bucket size:                            512 KiB
    First bucket:                           0
    Buckets:                                1697312
    Last mount:                             Mon Dec 11 05:54:20 2023

    State:                                  rw
    Data allowed:                           journal,btree,user
    Has data:                               journal,btree,user
    Discard:                                0
    Freespace initialized:                  1

replicas_v0 (size 24):
  btree: 1 [0] journal: 1 [0] user: 1 [0]
# uname -a
Linux server 6.7.0-rc2 #1-NixOS SMP Tue Jan  1 00:00:00 UTC 1980 aarch64 GNU/Linux
# bcachefs show-super /dev/mapper/bch0
External UUID:                              088a3d70-b54c-4437-8e01-feda6bfb7236
Internal UUID:                              90608ffd-3a27-4e50-98ab-e852ceb23aa0
Device index:                               1
Label:                                      
Version:                                    1.3: rebalance_work
Version upgrade complete:                   1.3: rebalance_work
Oldest version on disk:                     1.3: rebalance_work
Created:                                    Fri Nov 24 18:58:28 2023

Sequence number:                            53
Superblock size:                            5880
Clean:                                      0
Devices:                                    3
Sections:                                   members_v1,replicas_v0,disk_groups,clean,journal_seq_blacklist,journal_v2,counters,members_v2,errors
Features:                                   zstd,journal_seq_blacklist_v3,reflink,new_siphash,inline_data,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,reflink_inline_data,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
Compat features:                            alloc_info,alloc_metadata,extents_above_btree_updates_done,bformat_overflow_done

Options:
  block_size:                               512 B
  btree_node_size:                          256 KiB
  errors:                                   continue [ro] panic 
  metadata_replicas:                        3
  data_replicas:                            3
  metadata_replicas_required:               1
  data_replicas_required:                   1
  encoded_extent_max:                       64.0 KiB
  metadata_checksum:                        none [crc32c] crc64 xxhash 
  data_checksum:                            none [crc32c] crc64 xxhash 
  compression:                              zstd
  background_compression:                   none
  str_hash:                                 crc32c crc64 [siphash] 
  metadata_target:                          none
  foreground_target:                        ssd
  background_target:                        none
  promote_target:                           ssd
  erasure_code:                             0
  inodes_32bit:                             1
  shard_inode_numbers:                      1
  inodes_use_key_cache:                     1
  gc_reserve_percent:                       8
  gc_reserve_bytes:                         0 B
  root_reserve_percent:                     0
  wide_macs:                                0
  acl:                                      1
  usrquota:                                 0
  grpquota:                                 0
  prjquota:                                 0
  journal_flush_delay:                      1000
  journal_flush_disabled:                   0
  journal_reclaim_delay:                    100
  journal_transaction_names:                1
  version_upgrade:                          [compatible] incompatible none 
  nocow:                                    0

members_v2 (size 376):
  Device:                                   0
    Label:                                  ssd1 (1)
    UUID:                                   f894a6be-bb75-456b-baa8-412752f458ad
    Size:                                   1.86 TiB
    read errors:                            0
    write errors:                           0
    checksum errors:                        0
    seqread iops:                           0
    seqwrite iops:                          0
    randread iops:                          0
    randwrite iops:                         0
    Bucket size:                            512 KiB
    First bucket:                           0
    Buckets:                                3906996
    Last mount:                             Sun Sep 17 02:39:37 2023

    State:                                  rw
    Data allowed:                           journal,btree,user
    Has data:                               journal,btree,user
    Discard:                                1
    Freespace initialized:                  1
  Device:                                   1
    Label:                                  ssd2 (2)
    UUID:                                   20a92e95-78ea-472d-aa96-f1cf6719476c
    Size:                                   1.86 TiB
    read errors:                            0
    write errors:                           0
    checksum errors:                        0
    seqread iops:                           0
    seqwrite iops:                          0
    randread iops:                          0
    randwrite iops:                         0
    Bucket size:                            512 KiB
    First bucket:                           0
    Buckets:                                3906996
    Last mount:                             Sun Sep 17 02:39:37 2023

    State:                                  rw
    Data allowed:                           journal,btree,user
    Has data:                               journal,btree,user
    Discard:                                1
    Freespace initialized:                  1
  Device:                                   2
    Label:                                  ssd3 (3)
    UUID:                                   85892d1f-940b-4dfb-ad29-070d56545e20
    Size:                                   1.86 TiB
    read errors:                            0
    write errors:                           0
    checksum errors:                        0
    seqread iops:                           0
    seqwrite iops:                          0
    randread iops:                          0
    randwrite iops:                         0
    Bucket size:                            512 KiB
    First bucket:                           0
    Buckets:                                3906996
    Last mount:                             Sun Sep 17 02:39:37 2023

    State:                                  rw
    Data allowed:                           journal,btree,user
    Has data:                               journal,btree,user
    Discard:                                1
    Freespace initialized:                  1

replicas_v0 (size 72):
  btree: 3 [0 1 2] journal: 3 [0 1 2] user: 2 [0 1] journal: 2 [0 2] btree: 2 [0 2] user: 1 [1] user: 2 [1 2] journal: 2 [0 1] journal: 2 [1 2] btree: 2 [0 1] btree: 2 [1 2] user: 1 [0] user: 1 [2] user: 2 [0 2] user: 3 [0 1 2]

I will try booting the VM with no image compression and report back.

chayleaf commented 7 months ago

I've now got a PostgreSQL "xlog flush request ... is not satisfied" error on the arm64 machine. In this case PostgreSQL did print a file path and offset. How do I collect the debug info for you? (PostgreSQL did have compression enabled.)

Additionally, the VM booted fine off an uncompressed image, which means it may be related to compression (though it could still be a fluke).

koverstreet commented 7 months ago

Can you run for a while longer with compression off, so we get more confirmation?

For the PostgreSQL log: grab the inode number with ls -li.

Then search through the extents btree; you'll probably want to do this with the filesystem unmounted, so that you can use the 'bcachefs list' command and give it a start position.

The offset in the start position is in units of 512 byte sectors, so divide your byte offset by 512.

bcachefs list -b extents -s inode_nr:offset /dev/sda /dev/sdb ...

Grab the first 10 or so extents from that and post them, that will tell us a little bit.
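
The start position described above (inode number, then the byte offset divided by 512) can be sketched in Python. This is only an illustration under stated assumptions: the helper names `start_position` and `start_position_for_path` are hypothetical, and the inode number in the example is the one that ls -li reports for base/60826/61326_vm elsewhere in this thread.

```python
import os

# Extent offsets are in 512-byte sectors, per the explanation above.
SECTOR_SIZE = 512


def start_position(inode_nr: int, byte_offset: int) -> str:
    """Format an inode number and a byte offset as 'inode:sector'."""
    if byte_offset < 0:
        raise ValueError("byte offset must be non-negative")
    return f"{inode_nr}:{byte_offset // SECTOR_SIZE}"


def start_position_for_path(path: str, byte_offset: int) -> str:
    """Same as above, but read the inode number from the file itself.

    os.stat().st_ino is the same number that `ls -li` prints in its
    first column. Hypothetical convenience helper, not part of bcachefs.
    """
    return start_position(os.stat(path).st_ino, byte_offset)


# Offset 0 of the file, and byte 8192 (sector 16) of the same file.
print(start_position(1612853681, 0))     # 1612853681:0
print(start_position(1612853681, 8192))  # 1612853681:16
```

The resulting string is what goes after -s in the bcachefs list command above.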

Given that it seems to be compression related, we can probably hold off on searching through the journal for now, so let's try to confirm that. Checking whether it reproduces with lz4 would also be very useful.

chayleaf commented 7 months ago

Relevant PostgreSQL logs:

FATAL:  the database system is not yet accepting connections
DETAIL:  Consistent recovery state has not been yet reached.
LOG:  request to flush past end of generated WAL; request 6/100E7DE8, current position 5/A9B34140
CONTEXT:  writing block 0 of relation base/60826/61326_vm
ERROR:  xlog flush request 6/100E7DE8 is not satisfied --- flushed only to 5/A9B34140
CONTEXT:  writing block 0 of relation base/60826/61326_vm
FATAL:  checkpoint request failed
HINT:  Consult recent messages in the server log for details.
LOG:  startup process (PID 3598085) exited with exit code 1
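
To get a feel for how far apart the two WAL positions in this log are, here is a minimal sketch. It assumes the usual PostgreSQL LSN notation, where the two hexadecimal numbers around the slash are the high and low 32 bits of a 64-bit byte position; the function name `parse_lsn` is hypothetical.

```python
def parse_lsn(lsn: str) -> int:
    """Convert an LSN such as '6/100E7DE8' to a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


requested = parse_lsn("6/100E7DE8")  # position the flush request asked for
flushed = parse_lsn("5/A9B34140")    # position the WAL actually reaches

gap = requested - flushed
print(f"request is {gap} bytes ({gap / 2**30:.2f} GiB) past the end of the WAL")
```

A gap of this size would be consistent with the page carrying a bogus LSN rather than with a few lost WAL records, though that reading is an interpretation and not something the log states.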

Unfortunately, I don't know enough about PostgreSQL internals to tell whether this file (base/60826/61326_vm) is indeed what I'm looking for, so just in case I'm also recording base/60826/61326_fsm (and I don't know if it's relevant here either):

1612853633 -rw-r----- 1 postgres postgres 163840 Dec 15 05:27 base/60826/61326_fsm
1612853681 -rw-r----- 1 postgres postgres  24576 Dec 14 15:48 base/60826/61326_vm
# bcachefs list -b extents -s 1612853681:0 /dev/mapper/bch* | head -n50
mounting version 1.3: rebalance_work opts=errors=continue,metadata_replicas=3,data_replicas=3,compression=zstd,foreground_target=ssd,promote_target=ssd,degraded,nochanges,norecovery
recovering from clean shutdown, journal seq 20119310
alloc_read... done
stripes_read... done
snapshots_read... done
u64s 9 type extent 1612853681:16:U32_MAX len 16 ver 0: durability: 3 crc: c_size 1 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 2:694963:493 gen 5 ptr: 0:694909:449 gen 1 ptr: 1:671372:87 gen 2
u64s 9 type extent 1612853681:32:U32_MAX len 16 ver 0: durability: 3 crc: c_size 1 size 32 offset 16 nonce 0 csum crc32c compress zstd ptr: 2:694868:917 gen 1 ptr: 0:694852:873 gen 1 ptr: 1:694440:511 gen 1
u64s 9 type extent 1612853681:48:U32_MAX len 16 ver 0: durability: 3 crc: c_size 1 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 0:693622:638 gen 1 ptr: 2:693680:638 gen 1 ptr: 1:693290:332 gen 0
u64s 9 type extent 1612853686:16:U32_MAX len 16 ver 0: durability: 3 crc: c_size 1 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 1:474700:642 gen 4 ptr: 0:483518:642 gen 10 ptr: 2:483265:85 gen 10
u64s 9 type extent 1612853687:16:U32_MAX len 16 ver 0: durability: 3 crc: c_size 1 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 1:150280:656 gen 10 ptr: 0:232006:584 gen 11 ptr: 2:154573:249 gen 2
u64s 9 type extent 1612853694:16:U32_MAX len 16 ver 0: durability: 3 crc: c_size 1 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 0:61650:556 gen 5 ptr: 2:27136:556 gen 4 ptr: 1:18438:381 gen 13
u64s 9 type extent 1612853718:32:U32_MAX len 32 ver 0: durability: 3 crc: c_size 1 size 32 offset 0 nonce 0 csum crc32c compress zstd ptr: 1:237321:525 gen 6 ptr: 2:235411:401 gen 6 ptr: 0:234936:401 gen 6
u64s 9 type extent 1612853728:32:U32_MAX len 32 ver 0: durability: 3 crc: c_size 1 size 32 offset 0 nonce 0 csum crc32c compress zstd ptr: 1:238109:236 gen 7 ptr: 2:237274:112 gen 6 ptr: 0:235459:112 gen 9
u64s 9 type extent 1612853729:16:U32_MAX len 16 ver 0: durability: 3 crc: c_size 16 size 16 offset 0 nonce 0 csum crc32c compress incompressible ptr: 1:138705:224 gen 11 ptr: 0:119822:527 gen 8 ptr: 2:122974:527 gen 5
u64s 9 type extent 1612853729:32:U32_MAX len 16 ver 0: durability: 3 crc: c_size 7 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 0:68506:762 gen 8 ptr: 1:621040:560 gen 3 ptr: 2:63335:90 gen 10
u64s 9 type extent 1612853729:48:U32_MAX len 16 ver 0: durability: 3 crc: c_size 7 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 2:662458:993 gen 4 ptr: 0:665496:263 gen 4 ptr: 1:506611:263 gen 6
u64s 9 type extent 1612853729:80:U32_MAX len 32 ver 0: durability: 3 crc: c_size 32 size 32 offset 0 nonce 0 csum crc32c compress incompressible ptr: 0:235459:164 gen 9 ptr: 2:237274:164 gen 6 ptr: 1:238109:288 gen 7
u64s 9 type extent 1612853730:32:U32_MAX len 32 ver 0: durability: 3 crc: c_size 2 size 32 offset 0 nonce 0 csum crc32c compress zstd ptr: 1:238109:393 gen 7 ptr: 2:237274:269 gen 6 ptr: 0:235459:269 gen 9
u64s 9 type extent 1612853731:16:U32_MAX len 16 ver 0: durability: 3 crc: c_size 7 size 32 offset 0 nonce 0 csum crc32c compress zstd ptr: 0:61650:242 gen 5 ptr: 2:27136:242 gen 4 ptr: 1:18438:67 gen 13
u64s 9 type extent 1612853731:32:U32_MAX len 16 ver 0: durability: 3 crc: c_size 7 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 2:681536:759 gen 1 ptr: 1:681047:179 gen 1 ptr: 0:681473:179 gen 1
u64s 9 type extent 1612853731:48:U32_MAX len 16 ver 0: durability: 3 crc: c_size 7 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 0:664716:954 gen 5 ptr: 1:664311:921 gen 2 ptr: 2:656796:22 gen 5
u64s 9 type extent 1612853731:64:U32_MAX len 16 ver 0: durability: 3 crc: c_size 16 size 16 offset 0 nonce 0 csum crc32c compress incompressible ptr: 0:236208:745 gen 9 ptr: 2:237288:745 gen 6 ptr: 1:238114:869 gen 7
u64s 9 type extent 1612853731:80:U32_MAX len 16 ver 0: durability: 3 crc: c_size 7 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 1:671134:635 gen 0 ptr: 0:175291:294 gen 13 ptr: 2:184267:293 gen 11
u64s 9 type extent 1612853731:96:U32_MAX len 16 ver 0: durability: 3 crc: c_size 7 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 2:672154:774 gen 5 ptr: 1:671799:774 gen 4 ptr: 0:672632:103 gen 2
u64s 9 type extent 1612853731:112:U32_MAX len 16 ver 0: durability: 3 crc: c_size 7 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 2:694763:669 gen 0 ptr: 0:694734:625 gen 1 ptr: 1:536449:263 gen 10
u64s 9 type extent 1612853731:128:U32_MAX len 16 ver 0: durability: 3 crc: c_size 7 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 1:687298:376 gen 1 ptr: 0:678934:376 gen 4 ptr: 2:679004:197 gen 4
u64s 9 type extent 1612853731:144:U32_MAX len 16 ver 0: durability: 3 crc: c_size 7 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 2:655228:680 gen 2 ptr: 0:672920:159 gen 3 ptr: 1:672083:544 gen 2
u64s 9 type extent 1612853731:160:U32_MAX len 16 ver 0: durability: 3 crc: c_size 7 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 1:679597:315 gen 3 ptr: 0:679835:315 gen 4 ptr: 2:680060:290 gen 3
u64s 9 type extent 1612853731:176:U32_MAX len 16 ver 0: durability: 3 crc: c_size 7 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 0:653211:94 gen 9 ptr: 1:657043:94 gen 6 ptr: 2:630079:93 gen 5
u64s 9 type extent 1612853731:192:U32_MAX len 16 ver 0: durability: 3 crc: c_size 7 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 2:681466:811 gen 1 ptr: 1:681021:231 gen 1 ptr: 0:681421:231 gen 2
u64s 9 type extent 1612853731:208:U32_MAX len 16 ver 0: durability: 3 crc: c_size 7 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 2:681584:590 gen 2 ptr: 1:681151:10 gen 2 ptr: 0:681569:10 gen 2
u64s 9 type extent 1612853731:224:U32_MAX len 16 ver 0: durability: 3 crc: c_size 7 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 0:694754:797 gen 1 ptr: 2:694703:455 gen 2 ptr: 1:694312:436 gen 1
u64s 9 type extent 1612853731:240:U32_MAX len 16 ver 0: durability: 3 crc: c_size 7 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 2:673727:1010 gen 1 ptr: 0:678090:733 gen 0 ptr: 1:677691:225 gen 0
u64s 9 type extent 1612853731:256:U32_MAX len 16 ver 0: durability: 3 crc: c_size 7 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 1:665086:883 gen 1 ptr: 0:682040:795 gen 2 ptr: 2:682184:136 gen 1
u64s 9 type extent 1612853731:272:U32_MAX len 16 ver 0: durability: 3 crc: c_size 7 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 1:677598:760 gen 1 ptr: 2:678111:502 gen 1 ptr: 0:677764:487 gen 2
u64s 9 type extent 1612853731:288:U32_MAX len 16 ver 0: durability: 3 crc: c_size 7 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 0:679823:990 gen 2 ptr: 1:678466:279 gen 3 ptr: 2:679954:279 gen 1
u64s 9 type extent 1612853731:304:U32_MAX len 16 ver 0: durability: 3 crc: c_size 7 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 2:678722:598 gen 1 ptr: 1:678492:300 gen 0 ptr: 0:678880:221 gen 0
u64s 9 type extent 1612853731:320:U32_MAX len 16 ver 0: durability: 3 crc: c_size 7 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 0:561478:100 gen 4 ptr: 2:560677:100 gen 8 ptr: 1:562688:90 gen 13
u64s 9 type extent 1612853731:336:U32_MAX len 16 ver 0: durability: 3 crc: c_size 7 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 2:681253:629 gen 2 ptr: 1:680774:49 gen 2 ptr: 0:681197:49 gen 2
u64s 9 type extent 1612853731:352:U32_MAX len 16 ver 0: durability: 3 crc: c_size 7 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 1:681426:676 gen 2 ptr: 0:681878:676 gen 1 ptr: 2:681946:232 gen 1
u64s 9 type extent 1612853731:368:U32_MAX len 16 ver 0: durability: 3 crc: c_size 7 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 1:694397:1009 gen 1 ptr: 2:694868:391 gen 1 ptr: 0:694852:347 gen 1
u64s 9 type extent 1612853731:384:U32_MAX len 16 ver 0: durability: 3 crc: c_size 7 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 0:653827:251 gen 1 ptr: 1:653385:251 gen 2 ptr: 2:178325:21 gen 8
u64s 9 type extent 1612853731:400:U32_MAX len 16 ver 0: durability: 3 crc: c_size 7 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 2:679256:943 gen 0 ptr: 0:672249:879 gen 3 ptr: 1:671994:771 gen 3
u64s 9 type extent 1612853731:416:U32_MAX len 16 ver 0: durability: 3 crc: c_size 7 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 0:672971:810 gen 2 ptr: 2:672578:307 gen 3 ptr: 1:672525:171 gen 2
u64s 9 type extent 1612853731:432:U32_MAX len 16 ver 0: durability: 3 crc: c_size 7 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 0:688183:979 gen 1 ptr: 1:504420:205 gen 7 ptr: 2:688231:173 gen 1
u64s 9 type extent 1612853731:448:U32_MAX len 16 ver 0: durability: 3 crc: c_size 16 size 16 offset 0 nonce 0 csum crc32c compress incompressible ptr: 0:236209:105 gen 9 ptr: 2:237291:105 gen 9 ptr: 1:238169:229 gen 7
u64s 9 type extent 1612853732:16:U32_MAX len 16 ver 0: durability: 3 crc: c_size 1 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 1:238169:629 gen 7 ptr: 2:237291:505 gen 9 ptr: 0:236209:505 gen 9
u64s 9 type extent 1612853733:32:U32_MAX len 32 ver 0: durability: 3 crc: c_size 32 size 32 offset 0 nonce 0 csum crc32c compress incompressible ptr: 0:240075:723 gen 7 ptr: 2:237792:723 gen 6 ptr: 1:242425:847 gen 5
u64s 9 type extent 1612853733:48:U32_MAX len 16 ver 0: durability: 3 crc: c_size 10 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 1:491614:977 gen 7 ptr: 2:489932:555 gen 7 ptr: 0:494043:162 gen 3
u64s 9 type extent 1612853733:128:U32_MAX len 80 ver 0: durability: 3 crc: c_size 80 size 80 offset 0 nonce 0 csum crc32c compress incompressible ptr: 0:240075:771 gen 7 ptr: 2:237792:771 gen 6 ptr: 1:242425:895 gen 5
# bcachefs list -b extents -s 1612853633:0 /dev/mapper/bch* | head -n50
mounting version 1.3: rebalance_work opts=errors=continue,metadata_replicas=3,data_replicas=3,compression=zstd,foreground_target=ssd,promote_target=ssd,degraded,nochanges,norecovery
recovering from clean shutdown, journal seq 20119310
alloc_read... done
stripes_read... done
snapshots_read... done
u64s 9 type extent 1612853633:32:U32_MAX len 32 ver 0: durability: 3 crc: c_size 32 size 32 offset 0 nonce 0 csum crc32c compress incompressible ptr: 2:183708:19 gen 6 ptr: 0:183662:813 gen 6 ptr: 1:135539:865 gen 5
u64s 9 type extent 1612853633:48:U32_MAX len 16 ver 0: durability: 3 crc: c_size 1 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 0:147148:504 gen 3 ptr: 1:200251:257 gen 7 ptr: 2:282726:257 gen 1
u64s 9 type extent 1612853633:64:U32_MAX len 16 ver 0: durability: 3 crc: c_size 16 size 16 offset 0 nonce 0 csum crc32c compress incompressible ptr: 2:183708:67 gen 6 ptr: 0:183662:861 gen 6 ptr: 1:135539:913 gen 5
u64s 9 type extent 1612853633:128:U32_MAX len 64 ver 0: durability: 3 crc: c_size 64 size 64 offset 0 nonce 0 csum crc32c compress incompressible ptr: 0:246438:827 gen 3 ptr: 1:246063:827 gen 3 ptr: 2:233362:691 gen 5
u64s 9 type extent 1612853633:256:U32_MAX len 128 ver 0: durability: 3 crc: c_size 128 size 128 offset 0 nonce 0 csum crc32c compress incompressible ptr: 2:233362:755 gen 5 ptr: 1:246063:891 gen 3 ptr: 0:246438:891 gen 3
u64s 9 type extent 1612853633:272:U32_MAX len 16 ver 0: durability: 3 crc: c_size 4 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 1:470194:969 gen 11 ptr: 2:472919:779 gen 8 ptr: 0:174568:17 gen 7
u64s 9 type extent 1612853633:304:U32_MAX len 32 ver 0: durability: 3 crc: c_size 32 size 32 offset 0 nonce 0 csum crc32c compress incompressible ptr: 0:137014:361 gen 7 ptr: 1:70020:413 gen 6 ptr: 2:45306:591 gen 12
u64s 9 type extent 1612853633:320:U32_MAX len 16 ver 0: durability: 3 crc: c_size 3 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 0:147148:505 gen 3 ptr: 1:200251:258 gen 7 ptr: 2:282726:258 gen 1
u64s 9 type extent 1612853645:32:U32_MAX len 32 ver 0: durability: 3 crc: c_size 1 size 32 offset 0 nonce 0 csum crc32c compress zstd ptr: 0:513246:988 gen 13 ptr: 2:512770:548 gen 14 ptr: 1:486048:173 gen 6
u64s 9 type extent 1612853645:48:U32_MAX len 16 ver 0: durability: 3 crc: c_size 1 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 1:653875:800 gen 3 ptr: 2:654399:94 gen 2 ptr: 0:654242:94 gen 5
u64s 9 type extent 1612853648:32:U32_MAX len 32 ver 0: durability: 3 crc: c_size 1 size 32 offset 0 nonce 0 csum crc32c compress zstd ptr: 2:201121:814 gen 9 ptr: 0:617554:808 gen 6 ptr: 1:200964:741 gen 8
u64s 9 type extent 1612853648:64:U32_MAX len 32 ver 0: durability: 3 crc: c_size 64 size 64 offset 32 nonce 0 csum crc32c compress incompressible ptr: 1:234349:67 gen 7 ptr: 0:241236:67 gen 3 ptr: 2:241157:955 gen 3
u64s 9 type extent 1612853648:80:U32_MAX len 16 ver 0: durability: 3 crc: c_size 3 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 1:677374:798 gen 4 ptr: 2:677960:738 gen 2 ptr: 0:677846:738 gen 2
u64s 9 type extent 1612853650:31:U32_MAX len 31 ver 0: durability: 3 crc: c_size 1 size 31 offset 0 nonce 0 csum crc32c compress zstd ptr: 1:28293:995 gen 10 ptr: 0:30427:359 gen 12 ptr: 2:29585:359 gen 9
u64s 9 type extent 1612853650:48:U32_MAX len 17 ver 0: durability: 3 crc: c_size 1 size 17 offset 0 nonce 0 csum crc32c compress zstd ptr: 1:28293:996 gen 10 ptr: 0:30427:360 gen 12 ptr: 2:29585:360 gen 9
u64s 9 type extent 1612853652:22:U32_MAX len 22 ver 0: durability: 3 crc: c_size 1 size 22 offset 0 nonce 0 csum crc32c compress zstd ptr: 2:231886:1021 gen 4 ptr: 0:239252:133 gen 4 ptr: 1:238307:133 gen 4
u64s 9 type extent 1612853652:38:U32_MAX len 16 ver 0: durability: 3 crc: c_size 1 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 2:231886:1022 gen 4 ptr: 0:239252:134 gen 4 ptr: 1:238307:134 gen 4
u64s 9 type extent 1612853652:44:U32_MAX len 6 ver 0: durability: 3 crc: c_size 1 size 6 offset 0 nonce 0 csum crc32c compress zstd ptr: 2:231886:1023 gen 4 ptr: 0:239252:135 gen 4 ptr: 1:238307:135 gen 4
u64s 9 type extent 1612853652:64:U32_MAX len 20 ver 0: durability: 3 crc: c_size 20 size 20 offset 0 nonce 0 csum crc32c compress incompressible ptr: 2:247057:0 gen 3 ptr: 1:238307:136 gen 4 ptr: 0:239252:136 gen 4
u64s 9 type extent 1612853652:80:U32_MAX len 16 ver 0: durability: 3 crc: c_size 2 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 1:693570:497 gen 0 ptr: 2:681982:497 gen 8 ptr: 0:693973:230 gen 1
u64s 9 type extent 1612853654:16:U32_MAX len 16 ver 0: durability: 3 crc: c_size 1 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 0:142461:731 gen 10 ptr: 1:141326:499 gen 3 ptr: 2:141599:39 gen 4
u64s 9 type extent 1612853654:32:U32_MAX len 16 ver 0: durability: 3 crc: c_size 1 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 0:141995:1006 gen 6 ptr: 1:141227:774 gen 6 ptr: 2:141593:314 gen 4
u64s 9 type extent 1612853654:128:U32_MAX len 96 ver 0: durability: 3 crc: c_size 128 size 128 offset 32 nonce 0 csum crc32c compress incompressible ptr: 0:239252:172 gen 4 ptr: 1:238307:172 gen 4 ptr: 2:247057:36 gen 3
u64s 9 type extent 1612853654:192:U32_MAX len 64 ver 0: durability: 3 crc: c_size 64 size 64 offset 0 nonce 0 csum crc32c compress incompressible ptr: 0:239252:300 gen 4 ptr: 1:238307:300 gen 4 ptr: 2:247057:164 gen 3
u64s 9 type extent 1612853654:256:U32_MAX len 64 ver 0: durability: 3 crc: c_size 64 size 64 offset 0 nonce 0 csum crc32c compress incompressible ptr: 0:245190:440 gen 4 ptr: 1:242451:440 gen 5 ptr: 2:246747:304 gen 3
u64s 9 type extent 1612853654:288:U32_MAX len 32 ver 0: durability: 3 crc: c_size 128 size 128 offset 0 nonce 0 csum crc32c compress incompressible ptr: 0:245190:504 gen 4 ptr: 1:242451:504 gen 5 ptr: 2:246747:368 gen 3
u64s 9 type extent 1612853654:336:U32_MAX len 48 ver 0: durability: 3 crc: c_size 128 size 128 offset 32 nonce 0 csum crc32c compress incompressible ptr: 0:245190:504 gen 4 ptr: 1:242451:504 gen 5 ptr: 2:246747:368 gen 3
u64s 9 type extent 1612853654:368:U32_MAX len 32 ver 0: durability: 3 crc: c_size 128 size 128 offset 80 nonce 0 csum crc32c compress incompressible ptr: 0:245190:504 gen 4 ptr: 1:242451:504 gen 5 ptr: 2:246747:368 gen 3
u64s 9 type extent 1612853654:384:U32_MAX len 16 ver 0: durability: 3 crc: c_size 128 size 128 offset 112 nonce 0 csum crc32c compress incompressible ptr: 0:245190:504 gen 4 ptr: 1:242451:504 gen 5 ptr: 2:246747:368 gen 3
u64s 9 type extent 1612853654:416:U32_MAX len 32 ver 0: durability: 3 crc: c_size 128 size 128 offset 0 nonce 0 csum crc32c compress incompressible ptr: 0:245190:632 gen 4 ptr: 1:242451:632 gen 5 ptr: 2:246747:496 gen 3
u64s 9 type extent 1612853654:464:U32_MAX len 48 ver 0: durability: 3 crc: c_size 128 size 128 offset 32 nonce 0 csum crc32c compress incompressible ptr: 0:245190:632 gen 4 ptr: 1:242451:632 gen 5 ptr: 2:246747:496 gen 3
u64s 9 type extent 1612853654:512:U32_MAX len 48 ver 0: durability: 3 crc: c_size 128 size 128 offset 80 nonce 0 csum crc32c compress incompressible ptr: 0:245190:632 gen 4 ptr: 1:242451:632 gen 5 ptr: 2:246747:496 gen 3
u64s 9 type extent 1612853654:560:U32_MAX len 48 ver 0: durability: 3 crc: c_size 128 size 128 offset 0 nonce 0 csum crc32c compress incompressible ptr: 0:245190:760 gen 4 ptr: 1:242451:760 gen 5 ptr: 2:246747:624 gen 3
u64s 9 type extent 1612853654:592:U32_MAX len 32 ver 0: durability: 3 crc: c_size 128 size 128 offset 48 nonce 0 csum crc32c compress incompressible ptr: 0:245190:760 gen 4 ptr: 1:242451:760 gen 5 ptr: 2:246747:624 gen 3
u64s 9 type extent 1612853654:640:U32_MAX len 48 ver 0: durability: 3 crc: c_size 128 size 128 offset 80 nonce 0 csum crc32c compress incompressible ptr: 0:245190:760 gen 4 ptr: 1:242451:760 gen 5 ptr: 2:246747:624 gen 3
u64s 9 type extent 1612853654:688:U32_MAX len 48 ver 0: durability: 3 crc: c_size 64 size 64 offset 0 nonce 0 csum crc32c compress incompressible ptr: 0:245190:888 gen 4 ptr: 1:242451:888 gen 5 ptr: 2:246747:752 gen 3
u64s 9 type extent 1612853654:704:U32_MAX len 16 ver 0: durability: 3 crc: c_size 4 size 16 offset 0 nonce 0 csum crc32c compress zstd ptr: 1:476181:786 gen 10 ptr: 0:513810:241 gen 7 ptr: 2:513353:241 gen 5
u64s 9 type extent 1612853657:48:U32_MAX len 48 ver 0: durability: 3 crc: c_size 48 size 48 offset 0 nonce 0 csum crc32c compress incompressible ptr: 1:652662:236 gen 12 ptr: 2:672904:430 gen 0 ptr: 0:672802:430 gen 0
u64s 9 type extent 1612853657:80:U32_MAX len 32 ver 0: durability: 3 crc: c_size 96 size 96 offset 48 nonce 0 csum crc32c compress incompressible ptr: 0:672802:430 gen 0 ptr: 2:672904:430 gen 0 ptr: 1:652662:236 gen 12
u64s 9 type extent 1612853657:96:U32_MAX len 16 ver 0: durability: 3 crc: c_size 7 size 32 offset 0 nonce 0 csum crc32c compress zstd ptr: 1:517396:580 gen 12 ptr: 0:672791:342 gen 0 ptr: 2:672896:342 gen 0
u64s 9 type extent 1612853657:112:U32_MAX len 16 ver 0: durability: 3 crc: c_size 5 size 32 offset 0 nonce 0 csum crc32c compress zstd ptr: 1:672372:505 gen 0 ptr: 2:672897:505 gen 0 ptr: 0:672796:440 gen 0
u64s 9 type extent 1612853657:128:U32_MAX len 16 ver 0: durability: 3 crc: c_size 9 size 32 offset 0 nonce 0 csum crc32c compress zstd ptr: 1:517396:594 gen 12 ptr: 0:672791:356 gen 0 ptr: 2:672896:356 gen 0
u64s 9 type extent 1612853657:176:U32_MAX len 48 ver 0: durability: 3 crc: c_size 96 size 96 offset 0 nonce 0 csum crc32c compress incompressible ptr: 1:672372:525 gen 0 ptr: 2:672897:525 gen 0 ptr: 0:672796:460 gen 0
u64s 9 type extent 1612853657:192:U32_MAX len 16 ver 0: durability: 3 crc: c_size 96 size 96 offset 48 nonce 0 csum crc32c compress incompressible ptr: 1:672372:525 gen 0 ptr: 2:672897:525 gen 0 ptr: 0:672796:460 gen 0
u64s 9 type extent 1612853657:208:U32_MAX len 16 ver 0: durability: 3 crc: c_size 96 size 96 offset 64 nonce 0 csum crc32c compress incompressible ptr: 1:672372:525 gen 0 ptr: 2:672897:525 gen 0 ptr: 0:672796:460 gen 0
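
To get a quick overview of a listing like the one above, a small script can tally extents per compression type. This is only a sketch: the regular expression is inferred from the lines pasted in this thread, not from any official description of the output format, and the name summarize_extents is made up for illustration.

```python
import re
from collections import Counter

# Pattern for one line of the extent listing pasted above. The field layout
# is an assumption inferred from that output only.
EXTENT_RE = re.compile(
    r"type extent (?P<inode>\d+):(?P<offset>\d+):\S+ len (?P<len>\d+) .*?"
    r"c_size (?P<c_size>\d+) size (?P<size>\d+) .*?"
    r"compress (?P<compress>\S+)"
)


def summarize_extents(lines):
    """Return (extent count, summed len field) per compression type."""
    extents = Counter()
    lengths = Counter()
    for line in lines:
        m = EXTENT_RE.search(line)
        if m is None:
            continue  # skip anything that does not look like an extent line
        kind = m.group("compress")
        extents[kind] += 1
        lengths[kind] += int(m.group("len"))
    return extents, lengths


if __name__ == "__main__":
    import sys

    counts, totals = summarize_extents(sys.stdin)
    for kind in sorted(counts):
        print(kind, counts[kind], totals[kind])
```

Piping the listing into the script on stdin prints one line per compression type, which makes it easier to see how much of a file actually went through zstd.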

I'll launch two parallel instances of Monero - one on lz4, one with no compression. That should be long-running (and failure-sensitive) enough to rule out false positives.

YellowOnion commented 7 months ago

@koverstreet I was getting corrupted VMs on Windows back with v5.9, circa 2021, I also had that strip ERO bug back in Jan-March related to path level leak or something (memory is vague) and that part of the code hasn't been touched in years, and test xfstests.generic.064 is still failing. I do wonder if there's an elusive bug that's been around for a while.

@chayleaf can you test at some point with debugging enabled? It might catch the bug in the act.

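# NixOS config
{ lib, ... }:
{
  # boot.kernelPatches is a list; an entry with patch = null only changes kernel config
  boot.kernelPatches = [ {
    name = "bcachefs_debug";
    patch = null;
    extraStructuredConfig = with lib.kernel; {
      BCACHEFS_DEBUG_TRANSACTIONS = yes;
      BCACHEFS_DEBUG = yes;
    };
  } ];
}
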
koverstreet commented 7 months ago

Any word on whether it's zstd specific yet?

If it's not zstd specific, we're going to need to characterize the corruption - if you're running Monero, does that give you a way to diff against a good copy? That would be the easiest way of tracking down exactly what is getting corrupted; if we can zero in on the block we can hexdump it (is it all zeroes or something else?) and dig through the journal.
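
For anyone who does have a known-good copy, the comparison could be sketched roughly like this. It is a minimal example, not a bcachefs tool: the 4096-byte block size is an assumed comparison granularity, and find_differing_blocks is a hypothetical helper name.

```python
BLOCK_SIZE = 4096  # assumed comparison granularity, not a bcachefs constant


def find_differing_blocks(good_path, bad_path, block_size=BLOCK_SIZE, limit=16):
    """Return up to `limit` (byte_offset, is_all_zero) tuples for differing blocks.

    is_all_zero is True when the block read from bad_path is non-empty and
    consists only of zero bytes.
    """
    diffs = []
    with open(good_path, "rb") as good, open(bad_path, "rb") as bad:
        offset = 0
        while len(diffs) < limit:
            a = good.read(block_size)
            b = bad.read(block_size)
            if not a and not b:
                break  # reached the end of both files
            if a != b:
                diffs.append((offset, bool(b) and not any(b)))
            offset += block_size
    return diffs


if __name__ == "__main__":
    import sys

    for offset, zeroed in find_differing_blocks(sys.argv[1], sys.argv[2]):
        print(f"offset {offset}: {'all zeroes' if zeroed else 'other data'}")
```

The reported byte offsets, together with the file's inode number, give the inode:offset pair to look for in the journal, and the all-zeroes flag answers the hexdump question for the first few bad blocks.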

Also relevant is the fact that you're not using tiering; I just fixed a bug in the allocation path that resulted in writes (mostly) silently getting dropped, but that should have only affected configurations using the foreground_target option.

koverstreet commented 7 months ago

And were you using nocow mode?

chayleaf commented 7 months ago

I couldn't reproduce this without zstd for monero (I still haven't synced it on the arm machine with no compression, it's only 80% in, but I did fully sync it on the x86_64 one with lz4). Additionally, there's another corruption issue that could be related: after running yt-dlp -k --keep-fragments --merge-output-format mkv --downloader aria2c https://www.twitch.tv/videos/2011924885 (the command didn't even finish successfully, it just hung), which downloads a Twitch stream VOD with 17266 8MB chunks to the current directory, ~1-5 chunks disappeared on each ls, with stat saying "No such file or directory" for them, and on each execution the directory kept losing more and more files until there were 15788 left. Also, when I tried to delete the directory with rm -rf, it said "Directory not empty" the first time and only succeeded on the second try. I'll report back tomorrow on whether this is reproducible without zstd.

dmesg.log of the aarch64 machine (this is just the tail of course, i.e. as much as the kernel stores, even the systemd log has reset since boot, or maybe some journal files just got lost)

koverstreet commented 7 months ago

The files going missing is pretty strange - transient corruption makes it sound like a dcache issue, except if they're disappearing from readdir() the dcache shouldn't be involved. If you can reproduce that one on a box I can ssh into, we can start instrumenting that with tracepoints and see where we get.
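
To tell the two cases apart from userspace, something like the following could be run against the affected directory. It is a rough sketch with hypothetical function names: the first check looks for names that the directory listing returns but that cannot be stat'ed, the second for names that drop out of repeated listings.

```python
import os


def find_unstatable_entries(directory):
    """Return names that the directory listing reports but lstat() cannot find."""
    missing = []
    for name in os.listdir(directory):
        try:
            os.lstat(os.path.join(directory, name))
        except FileNotFoundError:
            missing.append(name)
    return sorted(missing)


def vanished_between_listings(directory, rounds=5):
    """List the directory repeatedly; return names missing from a later listing."""
    seen = set(os.listdir(directory))
    vanished = set()
    for _ in range(rounds):
        current = set(os.listdir(directory))
        vanished |= seen - current  # seen before, absent from this listing
        seen |= current
    return sorted(vanished)


if __name__ == "__main__":
    import sys

    target = sys.argv[1]
    print("listed but not stat-able:", find_unstatable_entries(target))
    print("dropped out of listings:", vanished_between_listings(target))
```

On a healthy, idle directory both lists should come back empty; anything else is worth capturing along with the dmesg output from the same moment.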

I don't think zstd will have anything to do with the twitch streaming bug; although since zstd might be the culprit for the Monero bug we should probably do all testing without zstd until we get to a known good state, then re-enable it.

koverstreet commented 7 months ago

I couldn't repro - I tried your exact yt-dlp command, fsck'd, then deleted, and everything looks fine (except that btree node merging doesn't seem to be working on the backpointers btree; write buffer flush doesn't trigger that, whoops).

It sounds more likely to be an issue with your system stability; I'd try running memtest/prime95/etc. and seeing what that turns up - reopen if you find something that points more directly to bcachefs.

chayleaf commented 6 months ago

@koverstreet Obviously, memtest was the very first thing I've tried, and it showed no issues. Also, btrfs didn't have any problems before bcachefs, with the exception of low metadata space issues. Everything, including monero and postgresql, consistently worked on btrfs, monero worked the 2 times I tried to sync it with no compression and lz4 on bcachefs, but it hasn't once consistently worked on two machines with zstd. I've only reproduced the small files issue with zstd too. Just in case, I've tried syncing monero with zstd again after your comment (which is why it took a while for me to reply), and it failed again.

If zstd alone isn't the culprit, it might be caused by the comparatively high load on my machines, maybe you could try zstd:15 on a slower CPU?

I'll try to compile my kernels with bcachefs debug options enabled and report back later.

koverstreet commented 6 months ago

Ok, good to hear on the memtest.

Thanks for narrowing it down to zstd; we'll need to get the zstd devs involved.

Conan-Kudo commented 6 months ago

cc: @terrelln

(also FYI: @josefbacik)