kdave / btrfs-progs

Development of userspace BTRFS tools
GNU General Public License v2.0

Changing Checksum Fails "running dev-replace detected" #798

Closed aidan-gibson closed 3 weeks ago

aidan-gibson commented 1 month ago
sudo ./btrfstune --csum xxhash64 /dev/sdd
Proceeding to switch checksums
ERROR: running dev-replace detected, please finish or cancel it.
ERROR: btrfstune failed

There is no dev-replace running, although I have replaced devices on this fs before.

Note: This fs is actually two drives using RAID 0

sudo btrfs fi show /dev/sdd
Label: 'Data11' uuid: blah
Total devices 2 FS bytes used 28.46 TiB
devid 1 size 10.91TiB used 10.91TiB path /dev/sdc1
devid 3 size 18.19TiB used 18.17TiB path /dev/sdd

I also tried this a while back, about a year ago I believe, and got the same error. Maybe I botched the dev-replaces in the past? How can I troubleshoot further?

I have already run sudo btrfs check --check-data-csum --progress /dev/sdd without errors (took several days to run).

I am using the latest btrfs-progs, built from git. I have also successfully changed to xxhash64 on my root drives, also RAID1, on the same machine. It really seems like there is something messed up in the metadata (or something) for Data11.

Very willing to do the legwork on this, just tell me what to do. Thanks!

Zygo commented 1 month ago

Replace normally leaves behind an item so it can report things like this:

Started on  8.Dec 02:59:46, finished on 10.Dec 11:38:44, 0 write errs, 0 uncorr. read errs

The check currently checks to see if the item exists:

        key.objectid = 0;
        key.type = BTRFS_DEV_REPLACE_KEY;
        key.offset = 0;
        ret = btrfs_search_slot(NULL, dev_root, &key, &path, 0, 0);
        btrfs_release_path(&path);
        if (ret < 0) {
                errno = -ret;
                error("failed to check the dev-replace status: %m");
                return ret;
        }
        if (ret == 0) {
                error("running dev-replace detected, please finish or cancel it.");
                return -EINVAL;
        }

but that's not sufficient. It should also check the contents of the item to see if replace is finished, paused, still running, etc.
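
To illustrate the kind of check meant here, a minimal standalone sketch follows. It is not the fix that landed in devel. The enum names and the helper `replace_in_progress` are hypothetical, and the numeric values are assumed to mirror the kernel's `BTRFS_IOCTL_DEV_REPLACE_STATE_*` constants. In btrfs-progs the state would presumably be read from the found item's replace state field rather than passed in as a plain integer.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical local names. The values are assumed to match the kernel's
 * BTRFS_IOCTL_DEV_REPLACE_STATE_* constants.
 */
enum replace_state {
	REPLACE_STATE_NEVER_STARTED = 0,
	REPLACE_STATE_STARTED = 1,
	REPLACE_STATE_FINISHED = 2,
	REPLACE_STATE_CANCELED = 3,
	REPLACE_STATE_SUSPENDED = 4,
};

/*
 * Decide whether a dev-replace should block the checksum change.
 * item_found: whether the dev-replace item exists in the device tree.
 * state: the replace state stored in that item (ignored if not found).
 */
bool replace_in_progress(bool item_found, uint64_t state)
{
	/* No item at all: no replace was ever recorded. */
	if (!item_found)
		return false;

	switch (state) {
	case REPLACE_STATE_NEVER_STARTED:
	case REPLACE_STATE_FINISHED:
	case REPLACE_STATE_CANCELED:
		/* A leftover item from a completed or abandoned replace. */
		return false;
	default:
		/* Started, suspended, or unknown: be conservative and refuse. */
		return true;
	}
}
```

With a check like this, a filesystem whose last replace finished long ago would pass, while a running or suspended replace would still be rejected.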

[edited to fix typo]

aidan-gibson commented 1 month ago

Thank you so much Zygo! I commented out lines 82-85 in btrfs-progs/tune/change-csum.c ( https://github.com/kdave/btrfs-progs/blob/5d97c32d6f94cf6f473a5f82964e3edaeb1b146e/tune/change-csum.c#L83 ), rebuilt, and the checksum change is now running.

kdave commented 3 weeks ago

Thanks for the report. Fixed in devel.