kdave / btrfs-progs

Development of userspace BTRFS tools
GNU General Public License v2.0

Failed to mkfs/btrfstune with both `block-group-tree` and `zoned` though they are said to be supported #765

Closed: oxalica closed this issue 4 weeks ago

oxalica commented 3 months ago

In the "Zoned mode" section of the btrfs documentation, block group tree is listed as "supported", which I read as "compatible". But currently neither mkfs.btrfs nor btrfstune can enable or convert to block-group-tree on zoned devices. Is it that btrfs-progs does not implement this yet, or are there hidden issues preventing this operation?

My environment:

$ uname -a
Linux invar 6.8.1 #1-NixOS SMP PREEMPT_DYNAMIC Fri Mar 15 18:19:29 UTC 2024 x86_64 GNU/Linux
$ mkfs.btrfs --version
mkfs.btrfs, part of btrfs-progs v6.7.1

To reproduce, first set up a nullb emulated block device with 256MiB zones, 10GiB size, and 4KiB block size (this script is copied from https://lwn.net/Articles/836726/):

#!/usr/bin/env bash
set -eo pipefail
sysfs=/sys/kernel/config/nullb/nullb0
if [[ -d $sysfs ]]; then
    echo 0 > "${sysfs}"/power
    rmdir $sysfs
fi
lsmod | grep -q null_blk && rmmod null_blk
modprobe null_blk nr_devices=0
mkdir "${sysfs}"
echo 10240 > "${sysfs}"/size # MiB
echo 1 > "${sysfs}"/zoned
echo 0 > "${sysfs}"/zone_nr_conv
echo 256 > "${sysfs}"/zone_size # MiB
echo 1 > "${sysfs}"/memory_backed
echo 4096 > "${sysfs}"/blocksize
echo 1 > "${sysfs}"/power
udevadm settle
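
As a quick sanity check on the setup above, the zone count can be derived from the two sizes written to configfs. This is only a sketch: the variable names are made up for illustration, and the sysfs path assumes the device shows up as nullb0. The result should agree with the "(40 zones)" that mkfs.btrfs prints when it resets the device.

```shell
#!/usr/bin/env bash
# Expected zone count for the nullb0 settings used in the setup script.
size_mib=10240       # value written to "${sysfs}"/size
zone_size_mib=256    # value written to "${sysfs}"/zone_size

expected_zones=$(( size_mib / zone_size_mib ))
echo "expected zones: ${expected_zones}"

# Optional: compare with what the kernel reports, if the device exists.
# queue/nr_zones is a block-layer sysfs attribute; the nullb0 name is an
# assumption based on the setup script.
nr_zones_file=/sys/block/nullb0/queue/nr_zones
if [[ -r $nr_zones_file ]]; then
    echo "kernel reports: $(cat "$nr_zones_file") zones"
fi
```

If the two numbers differ, the device was probably created with different settings than the ones listed here.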

Then `mkfs.btrfs /dev/nullb0 -O block-group-tree,zoned` (run as root) fails with an error:

# mkfs.btrfs /dev/nullb0 -O block-group-tree,zoned
btrfs-progs v6.7.1
See https://btrfs.readthedocs.io for more information.

Resetting device zones /dev/nullb0 (40 zones) ...
NOTE: several default settings have changed in version 5.15, please make sure
      this does not affect your deployments:
      - DUP for metadata (-m dup)
      - enabled no-holes (-O no-holes)
      - enabled free-space-tree (-R free-space-tree)

ERROR: error during mkfs: Invalid argument

Running mkfs.btrfs without block-group-tree and then converting with btrfstune also fails:

# mkfs.btrfs /dev/nullb0
btrfs-progs v6.7.1
See https://btrfs.readthedocs.io for more information.

Zoned: /dev/nullb0: host-managed device detected, setting zoned feature
Resetting device zones /dev/nullb0 (40 zones) ...
NOTE: several default settings have changed in version 5.15, please make sure
      this does not affect your deployments:
      - DUP for metadata (-m dup)
      - enabled no-holes (-O no-holes)
      - enabled free-space-tree (-R free-space-tree)

Label:              (null)
UUID:               c451698c-6ca0-4d96-ab0e-8d0d9000bc79
Node size:          16384
Sector size:        4096        (CPU page size: 4096)
Filesystem size:    10.00GiB
Block group profiles:
  Data:             single          256.00MiB
  Metadata:         DUP             256.00MiB
  System:           DUP             256.00MiB
SSD detected:       yes
Zoned device:       yes
  Zone size:        256.00MiB
Features:           extref, skinny-metadata, no-holes, free-space-tree, zoned
Checksum:           crc32c
Number of devices:  1
Devices:
   ID        SIZE  ZONES  PATH
    1    10.00GiB     40  /dev/nullb0

# btrfstune /dev/nullb0 --convert-to-block-group-tree
Error reading 1342193664, -1
Error reading 1342193664, -1
ERROR: cannot read chunk root
ERROR: open ctree failed

In either case, dmesg shows nothing except null_blk module loading and device creation.

adam900710 commented 3 months ago

Looks like a bug in btrfs-progs' support for zoned devices.

I'll take a look and fix it soon.

adam900710 commented 3 months ago

The mkfs.btrfs failure to create the block group tree comes from a plain pwrite(), which is not zoned-compatible due to memory alignment. (In fact, btrfs metadata would never be aligned to the sector size of the zoned device.)

The btrfstune failure is related to the open() flags: we need O_DIRECT to properly indicate that we're doing zoned operations, so that the chunk tree can be read using zoned-compatible helpers.

Both are small fixes; I'll add test cases for both.
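
For readers following along, the numeric side of the alignment question can be checked from a shell. This is a rough sketch, not something taken from btrfs-progs: `is_aligned` is a hypothetical helper, and treating the number in "Error reading 1342193664, -1" as a byte offset on the device is an assumption. It only covers offsets and lengths; the alignment of the I/O buffer in memory, which is what the explanation above points at, cannot be observed this way.

```shell
#!/usr/bin/env bash
# is_aligned VALUE ALIGNMENT: succeeds when VALUE is a multiple of ALIGNMENT.
# Hypothetical helper for illustration only.
is_aligned() {
    (( $1 % $2 == 0 ))
}

block_size=4096                       # blocksize configured on nullb0
node_size=16384                       # "Node size" reported by mkfs.btrfs
zone_size=$(( 256 * 1024 * 1024 ))    # 256MiB zones, in bytes
offset=1342193664                     # from "Error reading 1342193664, -1"

is_aligned "$node_size" "$block_size" && echo "node size is a multiple of the block size"
is_aligned "$offset" "$block_size" && echo "offset is a multiple of the block size"

# ASSUMPTION: the reported number is a byte offset on the device.
zone_index=$(( offset / zone_size ))
offset_in_zone=$(( offset % zone_size ))
echo "zone ${zone_index}, ${offset_in_zone} bytes into the zone"
```

Under that assumption, both the offset and the node size are multiples of the 4KiB block size, which fits the explanation that the failures come from how the device is opened and how the buffers are aligned rather than from the on-disk offsets.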

adam900710 commented 4 weeks ago

Closing since this is already fixed in v6.8.1.