kdave / btrfs-progs

Development of userspace BTRFS tools
GNU General Public License v2.0

Unable to set seeding status on host-managed smr drive #663

Open xyunsw opened 1 year ago

xyunsw commented 1 year ago

Dear all,

(Linux 6.5.0, btrfs-progs v6.3.3) I was trying to set the seeding flag on a host-managed SMR drive (HGST HSH721414ALE6M4) that is encrypted with dm-crypt, but btrfstune fails:

root@nas:~# btrfstune -S 1 -f /dev/mapper/diskp-old
Error reading 39131861975040, -1
Error reading 39131861975040, -1
ERROR: cannot read chunk root
ERROR: open ctree failed

However, btrfs check did not report any errors.

lsblk shows:

root@nas:~# lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
......
sde           8:64   0  12.7T  0 disk
└─diskp-old 253:0    0  12.7T  0 crypt
......

It looks like something went wrong while reading the drive, so I ran btrfstune under strace to see what was happening:

root@nas:~# strace btrfstune -S 1 -f /dev/mapper/diskp-old
execve("/usr/local/bin/btrfstune", ["btrfstune", "-S", "1", "-f", "/dev/mapper/diskp-old"], 0x7ffe76be2e10 /* 19 vars */) = 0
......
openat(AT_FDCWD, "/sys/block/dm-0/queue/zoned", O_RDONLY) = 4
read(4, "host-managed\n", 32)           = 13
close(4)                                = 0
openat(AT_FDCWD, "/dev/mapper/diskp-old", O_RDWR|O_DIRECT) = 4
fadvise64(4, 0, 0, POSIX_FADV_DONTNEED) = 0
newfstatat(3, "", {st_mode=S_IFBLK|0660, st_rdev=makedev(0xfd, 0), ...}, AT_EMPTY_PATH) = 0
ioctl(3, BLKGETZONESZ, [524288])        = 0
ioctl(3, BLKREPORTZONE, 0x1bcf720)      = 0
newfstatat(3, "", {st_mode=S_IFBLK|0660, st_rdev=makedev(0xfd, 0), ...}, AT_EMPTY_PATH) = 0
ioctl(3, BLKSSZGET, [512])              = 0
pread64(3, "\266\314\1\376z\25\257\255\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 0) = 4096
fcntl(3, F_GETFL)                       = 0x8002 (flags O_RDWR|O_LARGEFILE)
pread64(4, 0x1bd1ad0, 16384, 13955032121344) = -1 EINVAL (Invalid argument)
write(2, "Error reading 39131861975040, -1"..., 33Error reading 39131861975040, -1
) = 33
pread64(4, 0x1bd1ad0, 16384, 13955300556800) = -1 EINVAL (Invalid argument)
write(2, "Error reading 39131861975040, -1"..., 33Error reading 39131861975040, -1
) = 33
write(2, "ERROR: ", 7ERROR: )                  = 7
write(2, "cannot read chunk root", 22cannot read chunk root)  = 22
write(2, "\n", 1
)                       = 1
......
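The trace shows the order of operations. btrfstune first reads /sys/block/dm-0/queue/zoned and gets "host-managed". It then reopens the mapper device with O_RDWR|O_DIRECT and queries the zone size with BLKGETZONESZ, which reports 524288 sectors of 512 bytes, i.e. 256 MiB zones. The two failing reads target the address from the error message at physical offsets exactly 268435456 bytes (one zone) apart, presumably two copies of the chunk root.

A minimal C sketch of that check and arithmetic follows. The helper names (parse_zoned_model, read_zoned_model, zone_bytes, zone_index) are hypothetical and are not btrfs-progs functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Zoned models reported by /sys/block/<dev>/queue/zoned. */
enum zoned_model {
	ZONED_NONE,
	ZONED_HOST_AWARE,
	ZONED_HOST_MANAGED,
	ZONED_UNKNOWN,
};

/* True if the first n bytes of s are exactly the keyword word. */
static int word_eq(const char *s, size_t n, const char *word)
{
	return strlen(word) == n && strncmp(s, word, n) == 0;
}

/* Hypothetical helper: map the sysfs text (e.g. "host-managed\n") to a model. */
enum zoned_model parse_zoned_model(const char *s)
{
	size_t n = strcspn(s, "\n"); /* ignore the trailing newline */

	if (word_eq(s, n, "none"))
		return ZONED_NONE;
	if (word_eq(s, n, "host-aware"))
		return ZONED_HOST_AWARE;
	if (word_eq(s, n, "host-managed"))
		return ZONED_HOST_MANAGED;
	return ZONED_UNKNOWN;
}

/* Hypothetical helper: read the model of a kernel block device such as "dm-0". */
enum zoned_model read_zoned_model(const char *dev)
{
	char path[256];
	char buf[32];
	size_t got;
	FILE *f;

	snprintf(path, sizeof(path), "/sys/block/%s/queue/zoned", dev);
	f = fopen(path, "r");
	if (!f)
		return ZONED_UNKNOWN; /* attribute missing or unreadable */
	got = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[got] = '\0';
	return parse_zoned_model(buf);
}

/* BLKGETZONESZ reports the zone size in 512-byte sectors. */
uint64_t zone_bytes(uint32_t zone_sectors)
{
	return (uint64_t)zone_sectors * 512;
}

/* Index of the zone that contains a byte offset on the device. */
uint64_t zone_index(uint64_t offset, uint64_t zbytes)
{
	assert(zbytes != 0);
	return offset / zbytes;
}
```

With the values from the trace, zone_bytes(524288) is 268435456. The two pread64 offsets fall in zones 51986 and 51987, both at the same position within their zone.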

The device is opened with O_DIRECT, and both pread64 calls fail with EINVAL, which suggests that the buffer address or the offset is not properly aligned. Both offsets (and the 16384-byte length) are multiples of 4096. However, the buffer 0x1bd1ad0 passed to pread64 is not aligned: 0x1bd1ad0 % 4096 = 2768 and 0x1bd1ad0 % 512 = 208. I wrote a simple program that calls pread with the same size and offset, once with an aligned buffer (aligned_alloc with 16384 alignment) and once with an unaligned buffer (malloc). The aligned read succeeds, but the unaligned read fails with EINVAL.

I believe the issue only affects host-managed SMR drives, because btrfstune -S 1 works properly on normal drives. I also looked at the code in tune/main.c and disk-io.c, but I am still not sure how this happens or how to fix it. Any ideas?
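The reasoning above can be checked mechanically. The C sketch below tests a request against a conservative O_DIRECT rule: buffer address, length and offset must all be multiples of the logical block size, which BLKSSZGET reports as 512 here. It also shows the aligned-buffer workaround the reporter describes, using posix_memalign instead of aligned_alloc.

The exact alignment requirement depends on the kernel and device, and newer kernels can report it through statx() with STATX_DIOALIGN, so treat the rule as an assumption. is_aligned, check_direct_io and pread_aligned are hypothetical names.

```c
#define _FILE_OFFSET_BITS 64
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* True if x is a multiple of align; align must be a power of two. */
int is_aligned(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x & (align - 1)) == 0;
}

/*
 * Conservative O_DIRECT check (assumption): buffer address, length and
 * file offset must all be multiples of the logical block size lbs.
 * Returns 0 if the request passes, -EINVAL otherwise.
 */
int check_direct_io(const void *buf, size_t len, off_t off, uint64_t lbs)
{
	if (off < 0 ||
	    !is_aligned((uint64_t)(uintptr_t)buf, lbs) ||
	    !is_aligned(len, lbs) ||
	    !is_aligned((uint64_t)off, lbs))
		return -EINVAL;
	return 0;
}

/*
 * Read len bytes at off into a new buffer aligned to align (the reporter
 * used 16384). On success, stores the buffer in *out (the caller frees it)
 * and returns the byte count. On failure, returns -errno.
 */
ssize_t pread_aligned(int fd, size_t len, off_t off, size_t align, void **out)
{
	void *buf = NULL;
	ssize_t n;
	int rc = posix_memalign(&buf, align, len);

	if (rc != 0)
		return -rc;
	n = pread(fd, buf, len, off);
	if (n < 0) {
		int err = errno;

		free(buf);
		return -err;
	}
	*out = buf;
	return n;
}
```

On the real device, pread_aligned would be called on a descriptor opened with O_RDWR | O_DIRECT. Note that glibc only defines O_DIRECT when _GNU_SOURCE is set.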

kdave commented 1 year ago

I think seeding + zoned has not been tested as a use case. It might work if the superblock update is done in a zoned-friendly way.
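For context, at its core enabling seeding means setting one bit in the superblock's flags field and writing the superblock back. That bit is BTRFS_SUPER_FLAG_SEEDING, defined as (1ULL << 32) in the btrfs on-disk format headers.

The sketch below covers only the bit manipulation on a plain 64-bit value. The function name set_seeding_bit is hypothetical. The write-back, which is the part that matters on a zoned device, is not shown.

```c
#include <stdint.h>

/* Value as defined in the btrfs on-disk format headers. */
#define BTRFS_SUPER_FLAG_SEEDING (1ULL << 32)

/*
 * Hypothetical helper: return flags with the seeding bit set or cleared,
 * leaving every other superblock flag untouched.
 */
uint64_t set_seeding_bit(uint64_t flags, int enable)
{
	if (enable)
		return flags | BTRFS_SUPER_FLAG_SEEDING;
	return flags & ~BTRFS_SUPER_FLAG_SEEDING;
}
```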

kdave commented 1 year ago

The btrfstune commands make some changes directly to the superblock. These changes might not use the log-style write and could violate the sequential write constraint. We also need full test coverage of all btrfstune features on zoned devices.
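To illustrate the constraint kdave describes, here is a toy C model of a single host-managed zone. Writes are accepted only at the write pointer, so rewriting a superblock in place is rejected. A log-style update instead appends a new 4 KiB copy (the size of a btrfs superblock) and treats the last one as current.

All names are hypothetical. This is a simplification, not how btrfs actually lays out its zoned superblock log.

```c
#include <assert.h>
#include <stdint.h>

/* Outcome of a write attempt against a sequential-write-required zone. */
enum zw_result { ZW_OK, ZW_NOT_AT_WP, ZW_ZONE_FULL };

/* Toy model of one zone; all values are byte offsets/sizes on the device. */
struct zone {
	uint64_t start; /* first byte of the zone */
	uint64_t len;   /* zone capacity */
	uint64_t wp;    /* write pointer: the only offset a write may start at */
};

/* Host-managed rule: a write must start exactly at the write pointer. */
enum zw_result zone_write(struct zone *z, uint64_t off, uint64_t nbytes)
{
	if (off != z->wp)
		return ZW_NOT_AT_WP;
	if (nbytes > z->start + z->len - z->wp)
		return ZW_ZONE_FULL;
	z->wp += nbytes;
	return ZW_OK;
}

/* Log-style update: append the new superblock copy at the write pointer. */
enum zw_result sb_log_append(struct zone *z, uint64_t sb_size)
{
	return zone_write(z, z->wp, sb_size);
}

/* The newest copy is the last full slot before the write pointer. */
uint64_t sb_log_latest(const struct zone *z, uint64_t sb_size)
{
	assert(z->wp >= z->start + sb_size);
	return z->wp - sb_size;
}

/* Resetting a zone rewinds the write pointer and discards its contents. */
void zone_reset(struct zone *z)
{
	z->wp = z->start;
}
```

In this model, an in-place overwrite at a fixed offset only succeeds right after a reset, which throws away everything else in the zone. That is why a log-style scheme appends instead.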