kdave / btrfs-progs

Development of userspace BTRFS tools
GNU General Public License v2.0

Subvolume missing after reboot #685

Open paul opened 9 months ago

paul commented 9 months ago

I have /home as a btrfs filesystem, and for years I've had /home/myuser/Downloads and /home/myuser/.cache as separate subvolumes, mostly so that snapper wouldn't snapshot the files there.

I rebooted yesterday and noticed that everything in ~/Downloads was gone. Running btrfs subvolume list /home, the subvolume for Downloads isn't listed at all, but .cache is, as well as all the snapshots, podman containers, etc.

How do I go about figuring out where this subvolume went, and maybe getting it back? I don't care so much about the data on it, but if there's 10s of GB consumed somewhere, I'd like to reclaim it.

I'm running Fedora 38, and from the logs, it looks like btrfs-progs was updated to 6.5.1-1.fc38 before the reboot.

adam900710 commented 9 months ago

Can you mount the fs using subvolid=5 and then run btrfs subvolume list -a <mnt>? That will really show all the subvolumes.

By default, btrfs subvolume list only lists subvolumes under <mnt>; if the two subvolumes still exist but were moved to some other location, they may not be accessible from your default mount point.
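A minimal sketch of that check, assuming the filesystem is on /dev/sda1 (adjust the device to match yours) and /mnt is a free mount point:

```shell
# Mount the top-level subvolume (id 5) so every subvolume is reachable,
# regardless of which subvolume is set as the default.
sudo mount -o subvolid=5 /dev/sda1 /mnt

# List all subvolumes, with paths shown relative to the filesystem root.
sudo btrfs subvolume list -a /mnt

# Clean up when done.
sudo umount /mnt
```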

paul commented 1 week ago

Looks like it's happened again; this time I lost both Downloads and .cache.

$ sudo btrfs subvolume list -a /home
ID 673 gen 752812 top level 5 path <FS_TREE>/home
ID 990 gen 752087 top level 673 path home/.snapshots
ID 991 gen 752425 top level 990 path <FS_TREE>/home/.snapshots/1/snapshot
ID 992 gen 752425 top level 990 path <FS_TREE>/home/.snapshots/2/snapshot
ID 3450 gen 752425 top level 990 path <FS_TREE>/home/.snapshots/2126/snapshot
ID 7762 gen 752425 top level 990 path <FS_TREE>/home/.snapshots/5471/snapshot
# a bunch of snapshots
ID 17862 gen 739299 top level 673 path home/rando/.local/share/containers/storage/btrfs/subvolumes/61a348ff61fb1b1ea7dbd1645cc8411434a604946682f7785c26f49cd197c92c
ID 17864 gen 739299 top level 673 path home/rando/.local/share/containers/storage/btrfs/subvolumes/50644c29ef5a27c9a40c393a73ece2479de78325cae7d762ef3cdc19bf42dd0a
# a bunch of containers

There also seems to be quite a bit of space unaccounted for. This is what du reports, and it's about what I expect (1.2TB-ish):

$ sudo du -sh /home/rando
1.2T    /home/rando

But btrfs itself thinks there's 3.3TB used:

$ sudo btrfs filesystem df /home
Data, RAID0: total=3.38TiB, used=3.37TiB
System, RAID1: total=32.00MiB, used=272.00KiB
Metadata, RAID1: total=32.00GiB, used=30.76GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

$ sudo btrfs filesystem du -s /home
     Total   Exclusive  Set shared  Filename
  54.68TiB     1.15TiB     2.28TiB  /home

$ sudo btrfs filesystem usage /home
Overall:
    Device size:          3.64TiB
    Device allocated:         3.44TiB
    Device unallocated:     201.97GiB
    Device missing:         0.00B
    Device slack:           0.00B
    Used:             3.43TiB
    Free (estimated):       212.20GiB   (min: 111.21GiB)
    Free (statfs, df):      212.20GiB
    Data ratio:              1.00
    Metadata ratio:          2.00
    Global reserve:     512.00MiB   (used: 0.00B)
    Multiple profiles:             no

Data,RAID0: Size:3.38TiB, Used:3.37TiB (99.70%)
   /dev/sda1      1.69TiB
   /dev/sdb1      1.69TiB

Metadata,RAID1: Size:33.00GiB, Used:30.78GiB (93.26%)
   /dev/sda1     33.00GiB
   /dev/sdb1     33.00GiB

System,RAID1: Size:32.00MiB, Used:272.00KiB (0.83%)
   /dev/sda1     32.00MiB
   /dev/sdb1     32.00MiB

Unallocated:
   /dev/sda1    100.98GiB
   /dev/sdb1    100.98GiB

$ sudo btrfs filesystem show /home
Label: 'home'  uuid: c5ab72e4-5ac5-4f49-a558-3ad57add234b
    Total devices 2 FS bytes used 3.40TiB
    devid    1 size 1.82TiB used 1.72TiB path /dev/sda1
    devid    2 size 1.82TiB used 1.72TiB path /dev/sdb1

I'm running the command to see how big each snapshot is, but I doubt it'll make up the difference. I'll update this comment if so.

How can I figure out where my subvolumes went? Is there a command to really list everything? I don't care super-much about the data, but I'd like to reclaim the free space without reformatting the whole thing.

adam900710 commented 1 week ago

For the real space usage of each subvolume (i.e., how many bytes would be freed after dropping the snapshot/subvolume), the only reliable way (though it can be very slow) is btrfs qgroup show, but you have to enable quotas first.

And for the missing subvolumes, I'm more interested in whether they were newly created and then a power loss happened, or whether some automatic tool is periodically removing unused subvolumes/snapshots.
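For reference, the qgroup route mentioned above can be sketched like this (quotas must be enabled first, and the initial rescan of a large filesystem can take a long time):

```shell
# Enable quota accounting; this kicks off a full rescan of the filesystem.
sudo btrfs quota enable /home

# Block until the rescan finishes so the numbers are meaningful.
sudo btrfs quota rescan -w /home

# The Exclusive column shows bytes freed if that subvolume were dropped.
sudo btrfs qgroup show /home
```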

paul commented 1 week ago

I ran it overnight, to see how much each snapshot was taking:

$ find /home/.snapshots -maxdepth 1 -mindepth 1 -type d | xargs sudo btrfs fi du -s
     Total   Exclusive  Set shared  Filename
 367.29GiB    37.21MiB   367.24GiB  /home/.snapshots/1
 367.36GiB   116.31MiB   367.23GiB  /home/.snapshots/2
 391.39GiB    16.92GiB   370.00GiB  /home/.snapshots/2126
 416.93GiB     7.89GiB   379.11GiB  /home/.snapshots/5471
 920.83GiB     5.44GiB   879.91GiB  /home/.snapshots/6067
 979.11GiB     5.83GiB   938.20GiB  /home/.snapshots/6604
   1.75TiB     4.65GiB     1.71TiB  /home/.snapshots/7348
   1.62TiB     7.47GiB     1.57TiB  /home/.snapshots/8068
   1.31TiB     7.28GiB     1.26TiB  /home/.snapshots/8808
   1.32TiB     8.37GiB     1.27TiB  /home/.snapshots/9528
   1.11TiB     3.79GiB     1.05TiB  /home/.snapshots/10272
   1.12TiB     3.07GiB     1.06TiB  /home/.snapshots/11016
   1.10TiB     1.39GiB     1.05TiB  /home/.snapshots/11647
   1.11TiB     2.07GiB     1.05TiB  /home/.snapshots/12390
   1.10TiB     1.90GiB     1.05TiB  /home/.snapshots/12612
   1.04TiB   618.82MiB  1020.79GiB  /home/.snapshots/13186
   1.05TiB     5.64GiB  1020.90GiB  /home/.snapshots/13210
   1.04TiB   272.26MiB  1020.92GiB  /home/.snapshots/13234
   1.04TiB   203.78MiB  1020.77GiB  /home/.snapshots/13258
   1.04TiB    64.15MiB  1020.77GiB  /home/.snapshots/13282
   1.04TiB    63.47MiB  1020.81GiB  /home/.snapshots/13306
   1.04TiB    51.74MiB  1020.93GiB  /home/.snapshots/13316
   1.04TiB    44.36MiB  1020.99GiB  /home/.snapshots/13317
   1.04TiB    47.31MiB  1021.01GiB  /home/.snapshots/13318
   1.04TiB    50.43MiB  1021.07GiB  /home/.snapshots/13319
   1.04TiB    53.06MiB  1021.07GiB  /home/.snapshots/13320
   1.04TiB    55.39MiB  1021.10GiB  /home/.snapshots/13321
   1.04TiB    48.61MiB  1021.13GiB  /home/.snapshots/13322
   1.04TiB    95.82MiB  1021.11GiB  /home/.snapshots/13323
   1.04TiB    35.69MiB  1021.14GiB  /home/.snapshots/13324
   1.04TiB    33.62MiB  1021.16GiB  /home/.snapshots/13325
   1.04TiB    34.30MiB  1021.16GiB  /home/.snapshots/13326
   1.04TiB    34.75MiB  1021.17GiB  /home/.snapshots/13327
   1.04TiB    33.61MiB  1021.17GiB  /home/.snapshots/13328
   1.04TiB    32.54MiB  1021.19GiB  /home/.snapshots/13329
   1.04TiB    34.88MiB  1021.19GiB  /home/.snapshots/13330
   1.04TiB    34.16MiB  1021.20GiB  /home/.snapshots/13331
   1.04TiB    33.12MiB  1021.20GiB  /home/.snapshots/13332
   1.04TiB    32.55MiB  1021.24GiB  /home/.snapshots/13333
   1.04TiB    33.60MiB  1021.24GiB  /home/.snapshots/13334
   1.04TiB    34.28MiB  1021.25GiB  /home/.snapshots/13335
   1.04TiB    35.05MiB  1021.25GiB  /home/.snapshots/13336
   1.04TiB    34.09MiB  1021.27GiB  /home/.snapshots/13337
   1.04TiB    35.18MiB  1021.27GiB  /home/.snapshots/13338
   1.05TiB    47.32MiB  1021.96GiB  /home/.snapshots/13339
   1.05TiB    44.21MiB  1022.11GiB  /home/.snapshots/13340
   1.04TiB    51.91MiB  1021.51GiB  /home/.snapshots/13341
   1.04TiB    47.88MiB  1021.52GiB  /home/.snapshots/13342
   1.04TiB    53.32MiB  1021.65GiB  /home/.snapshots/13343
   1.04TiB    50.69MiB  1021.72GiB  /home/.snapshots/13344
   1.05TiB    57.59MiB  1021.79GiB  /home/.snapshots/13345
   1.04TiB   126.05MiB  1016.64GiB  /home/.snapshots/13346

The sum of "Exclusive" is about 80GiB.

@adam900710 The missing subvols were created long ago, when I initially partitioned the drive. I set up my /home vol, then subvolumes for Downloads and .cache.

Then, when I first opened this issue, I'd done a dnf update (which appears to have updated btrfs-progs) and rebooted, and noticed my Downloads subvolume missing. I reported the issue here, created a new subvol, and kinda forgot about it.

This time, I was doing some cleanup, accidentally deleted a bunch of stuff I shouldn't have from my home dir, restored the files from a backup, and when I rebooted the subvols were gone. It's possible I deleted the .cache and Downloads folders/mountpoints, but I wouldn't expect that to orphan the subvolumes.

It does look like something is consuming that space.

btrfs usage thinks I'm using 3.44TiB of my 3.64TiB. du and usage both agree /home/rando is using ~1.2TiB. The snapshots account for another 80GiB, so there's apparently 2TiB "used" somewhere that I can't find. I know very little about the inner workings of this, but my gut is telling me the Downloads subvol from last year, plus Downloads and .cache from this time, might account for a huge portion of that (I'm not real good about cleaning out my downloads folder).

adam900710 commented 1 week ago

Firstly, btrfs fi du is not that accurate: fi du only checks at the file-extent level, while the reported used space is accounted at the metadata/data extent level. The difference is that a file extent can refer to all or only part of a data extent (the btrfs bookend behavior), so a partially-overwritten data extent still pins its full size on disk.

So please use qgroup instead to get a proper view of extent-level usage.

The other thing is, since kernel 4.18, the rmdir syscall can delete an empty subvolume, meaning you can accidentally remove a subvolume with just rm -rf.
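To illustrate (a sketch requiring root and a btrfs filesystem; /mnt and the subvolume name are hypothetical):

```shell
# Create an empty subvolume...
sudo btrfs subvolume create /mnt/scratch

# ...which, on kernels >= 4.18, a plain rmdir can delete just like an
# ordinary empty directory. An rm -rf of the parent does the same thing.
sudo rmdir /mnt/scratch
```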

paul commented 1 week ago

Ok, here it is after enabling quotas:

$ sudo btrfs qgroup show /home
Qgroupid    Referenced    Exclusive   Path
--------    ----------    ---------   ----
0/5           16.00KiB     16.00KiB   <toplevel>
0/673          1.02TiB      1.02TiB   home
0/990         19.30MiB     19.30MiB   home/.snapshots
0/991        381.74GiB     92.04MiB   home/.snapshots/1/snapshot
0/992        381.82GiB    173.29MiB   home/.snapshots/2/snapshot
0/3450       394.77GiB     13.38GiB   home/.snapshots/2126/snapshot
0/7762       399.07GiB      6.62GiB   home/.snapshots/5471/snapshot
0/8427       898.96GiB      5.56GiB   home/.snapshots/6067/snapshot
0/8964       952.84GiB      4.14GiB   home/.snapshots/6604/snapshot
0/10254        1.72TiB      3.64GiB   home/.snapshots/7348/snapshot
0/11047        1.59TiB      5.27GiB   home/.snapshots/8068/snapshot
0/12975        1.28TiB     10.75GiB   home/.snapshots/8808/snapshot
0/13695        1.28TiB      7.95GiB   home/.snapshots/9528/snapshot
0/14457        1.06TiB      3.63GiB   home/.snapshots/10272/snapshot
0/15212        1.07TiB      2.85GiB   home/.snapshots/11016/snapshot
0/15854        1.06TiB    974.27MiB   home/.snapshots/11647/snapshot
0/16597        1.06TiB      1.27GiB   home/.snapshots/12390/snapshot
0/16819        1.06TiB      1.84GiB   home/.snapshots/12612/snapshot
0/17393     1021.65GiB    586.93MiB   home/.snapshots/13186/snapshot
0/17417        1.00TiB      5.67GiB   home/.snapshots/13210/snapshot
0/17441     1021.68GiB    367.48MiB   home/.snapshots/13234/snapshot
0/17465     1021.51GiB    308.68MiB   home/.snapshots/13258/snapshot
# All the snapshots after this have ~1GiB referenced and a few hundred MiB exclusive

Referenced and Exclusive add up to about 54TiB and 1,123GiB, respectively. So that 1.123TiB matches up with the ~1.2TiB used I'm seeing elsewhere. Still doesn't explain where the other 2TiB went, though.

The other thing is, since kernel 4.18, the rmdir syscall can delete an empty subvolume, meaning you can accidentally remove a subvolume with just rm -rf

Ah, that could have been what happened, I suppose. But even if the subvolume(s) got deleted that way, it seems like they're still consuming disk somehow. Having ~200GB free out of 4TB of storage makes me nervous, and I can't figure out how to free it up.

adam900710 commented 1 week ago

Then the problem really seems to be that some subvolume is orphaned but not yet fully deleted (the deletion happens in the background).

You can check that either with btrfs subvolume list -d, which shows such subvolumes, or with btrfs ins dump-tree -t root <device>, looking for ORPHAN items.
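With the mount point and devices from earlier in the thread, those two checks would look like:

```shell
# Subvolumes that have been deleted but whose space is not yet reclaimed.
sudo btrfs subvolume list -d /home

# Read-only dump of the root tree; ORPHAN items mark pending deletions.
sudo btrfs inspect-internal dump-tree -t root /dev/sda1 | grep -i orphan
```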

Anyway, you can make btrfs do the cleanup more frequently by remounting with commit=5, so that the cleanup thread is woken every 5 seconds. Then run btrfs subvolume sync to wait for all dropped subvolumes to be cleaned up.
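A sketch of that sequence for this filesystem (the default commit interval restored at the end is the standard 30 seconds):

```shell
# Shorten the commit interval so the cleaner thread runs every 5 seconds.
sudo mount -o remount,commit=5 /home

# Block until all deleted subvolumes have actually been cleaned up.
sudo btrfs subvolume sync /home

# Restore the default 30-second commit interval afterwards.
sudo mount -o remount,commit=30 /home
```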

Finally, if there are orphaned subvolumes being dropped, it's strongly recommended to disable quotas, or they can cause performance problems.

paul commented 1 week ago

It doesn't seem to be orphaned, or in the process of being deleted...

btrfs subvolume list -d doesn't return anything, and btrfs ins dump-tree -t root /dev/sda1 | grep -i orphan doesn't either, for either device. btrfs subvolume sync /home returns immediately, with no waiting.

It's been over a week since I would have accidentally deleted the subvolumes; I'd have expected it to have been cleaned up by now, surely?