koverstreet / bcachefs-tools

http://bcachefs.org
GNU General Public License v2.0

bcachefs device add seems to scale default bucket size by total volume size #212

Open kode54 opened 8 months ago

kode54 commented 8 months ago

I created two qcow2 volumes, each sized to roughly match one of my two large hard drives:

qemu-img create -f qcow2 bcache-1.qcow2 18000G
qemu-img create -f qcow2 bcache-2.qcow2 18000G

and attached them as network block devices with qemu-nbd:

sudo modprobe nbd
sudo qemu-nbd -c /dev/nbd0 bcache-1.qcow2
sudo qemu-nbd -c /dev/nbd1 bcache-2.qcow2

If I format them together:

sudo bcachefs format /dev/nbd0 /dev/nbd1

then both end up with a 512 KiB bucket size, the same as my first formatted drive:

/dev/nbd0 contains a bcachefs filesystem
Proceed anyway? (y,n) y
/dev/nbd1 contains a bcachefs filesystem
Proceed anyway? (y,n) y
External UUID:                              da5b2aa7-0c5c-4aa7-a4c2-240f43d87932
Internal UUID:                              64cda1f8-319d-4682-a608-3a9a7928c729
Magic number:                               c68573f6-66ce-90a9-d96a-60cf803df7ef
Device index:                               1
Label:
Version:                                    1.4: member_seq
Version upgrade complete:                   0.0: (unknown version)
Oldest version on disk:                     1.4: member_seq
Created:                                    Mon Jan 15 12:57:22 2024
Sequence number:                            0
Time of last write:                         Wed Dec 31 16:00:00 1969
Superblock size:                            1144
Clean:                                      0
Devices:                                    2
Sections:                                   members_v1,members_v2
Features:                                   new_siphash,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
Compat features:

Options:
  block_size:                               512 B
  btree_node_size:                          256 KiB
  errors:                                   continue [ro] panic
  metadata_replicas:                        1
  data_replicas:                            1
  metadata_replicas_required:               1
  data_replicas_required:                   1
  encoded_extent_max:                       64.0 KiB
  metadata_checksum:                        none [crc32c] crc64 xxhash
  data_checksum:                            none [crc32c] crc64 xxhash
  compression:                              none
  background_compression:                   none
  str_hash:                                 crc32c crc64 [siphash]
  metadata_target:                          none
  foreground_target:                        none
  background_target:                        none
  promote_target:                           none
  erasure_code:                             0
  inodes_32bit:                             1
  shard_inode_numbers:                      1
  inodes_use_key_cache:                     1
  gc_reserve_percent:                       8
  gc_reserve_bytes:                         0 B
  root_reserve_percent:                     0
  wide_macs:                                0
  acl:                                      1
  usrquota:                                 0
  grpquota:                                 0
  prjquota:                                 0
  journal_flush_delay:                      1000
  journal_flush_disabled:                   0
  journal_reclaim_delay:                    100
  journal_transaction_names:                1
  version_upgrade:                          [compatible] incompatible none
  nocow:                                    0

members_v2 (size 272):
Device:                                     0
  Label:                                    (none)
  UUID:                                     9cc991d7-2462-4457-b0f8-bcca9d841e64
  Size:                                     17.6 TiB
  read errors:                              0
  write errors:                             0
  checksum errors:                          0
  seqread iops:                             0
  seqwrite iops:                            0
  randread iops:                            0
  randwrite iops:                           0
  Bucket size:                              512 KiB
  First bucket:                             0
  Buckets:                                  36864000
  Last mount:                               (never)
  Last superblock write:                    0
  State:                                    rw
  Data allowed:                             journal,btree,user
  Has data:                                 (none)
  Durability:                               1
  Discard:                                  0
  Freespace initialized:                    0
Device:                                     1
  Label:                                    (none)
  UUID:                                     ccac6898-821d-4597-ab7d-7895fcebd5c0
  Size:                                     17.6 TiB
  read errors:                              0
  write errors:                             0
  checksum errors:                          0
  seqread iops:                             0
  seqwrite iops:                            0
  randread iops:                            0
  randwrite iops:                           0
  Bucket size:                              512 KiB
  First bucket:                             0
  Buckets:                                  36864000
  Last mount:                               (never)
  Last superblock write:                    0
  State:                                    rw
  Data allowed:                             journal,btree,user
  Has data:                                 (none)
  Durability:                               1
  Discard:                                  0
  Freespace initialized:                    0
mounting version 1.4: member_seq
initializing new filesystem
going read-write
initializing freespace

If I format only one device and later add the second one with bcachefs device add, the second drive ends up with a 1024 KiB bucket size:

sudo bcachefs format /dev/nbd0
sudo mkdir /tmp/bcache
sudo mount /dev/nbd0 /tmp/bcache
sudo bcachefs device add /tmp/bcache /dev/nbd1
sudo bcachefs fs usage /tmp/bcache
Filesystem: efdc2be6-9027-4a7c-8847-a71701f0f69c
Size:                 35562329210880
Used:                    12897222656
Online reserved:                   0

Data type       Required/total  Durability    Devices
btree:          1/1             1             [nbd0]               4456448

(no label) (device 0):          nbd0              rw
                                data         buckets    fragmented
  free:               19323047903232        36855789
  sb:                        3149824               7        520192
  journal:                4294967296            8192
  btree:                     4456448              12       1835008
  user:                            0               0
  cached:                          0               0
  parity:                          0               0
  stripe:                          0               0
  need_gc_gens:                    0               0
  need_discard:                    0               0
  capacity:           19327352832000        36864000

(no label) (device 1):          nbd1              rw
                                data         buckets    fragmented
  free:               19318758703104        18423804
  sb:                        3149824               4       1044480
  journal:                8589934592            8192
  btree:                           0               0
  user:                            0               0
  cached:                          0               0
  parity:                          0               0
  stripe:                          0               0
  need_gc_gens:                    0               0
  need_discard:                    0               0
  capacity:           19327352832000        18432000

kode54 commented 1 month ago

This still appears to be an issue.

kode54 commented 1 week ago

Okay, I figured it out:

https://github.com/koverstreet/bcachefs-tools/blob/68704c30dce693b83deb0e7ea40d47bae8e359b4/c_src/libbcachefs.c#L69-L103

The problem here is not the calculation algorithm itself. The problem is that when the volumes are formatted, the btree_node_size option isn't populated yet. So the bucket size starts at 4096 (the block size of the volume) and is then raised to the 128 KiB minimum. For the device size of 18000 GiB (19'327'352'832'000 bytes), the ilog2 comes out to 27, which is divided down to 6 and rounded down to a power of two, giving 4. Scaling 131'072 by 4 results in 512 KiB.

For the added drive, on the other hand, the btree_node_size option is already populated with 256 KiB, so the bucket size starts at 262'144 instead. The ilog2 and division still yield 6, which is rounded down to 4, but now 262'144 is scaled up by 4, which results in 1024 KiB.