kdave / btrfs-progs

Development of userspace BTRFS tools
GNU General Public License v2.0

btrfs receive "No space left on device" on RAID1 with 85% space free #657

Open douzzer opened 1 year ago

douzzer commented 1 year ago

(using kernel 6.4.3, btrfs-progs 6.3.3)

I'm hitting an early ENOSPC on a freshly created RAID1, apparently because metadata isn't growing as needed. It's happened twice in a row with different media, most recently with Lexar media shipped from and sold by Amazon, and with no kernel messages (no I/O errors), so I'm not suspecting counterfeit media. The same procedure worked consistently for years, most recently on kernel 6.0.6 with btrfs-progs 6.3.1. I haven't yet gotten it to work on kernel 6.4.

The new array looked like this right after crashing out of the send/receive pipeline:

Data, RAID1: total=475.93GiB, used=69.49GiB
System, RAID1: total=8.00MiB, used=96.00KiB
Metadata, RAID1: total=1.00GiB, used=1011.77MiB
GlobalReserve, single: total=132.72MiB, used=0.00B

Looks like metadata space is exhausted.

An existing and healthy array made long ago with the same procedure looks like this:

Data, RAID1: total=450.85GiB, used=357.52GiB
System, RAID1: total=32.00MiB, used=96.00KiB
Metadata, RAID1: total=26.01GiB, used=7.71GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

Obviously a much larger metadata block group.

My understanding is that the metadata block group is supposed to automatically enlarge as needed, as long as space is available on the partition. And there appears to be no way to manually preconfigure a larger metadata block group allocation at creation time or through a runtime command.

I used substantially identical mkfs.btrfs and mount options for the old and new arrays. For the new one:

mkfs.btrfs -f -L douz_20230802 -m raid1 -d raid1 /dev/mapper/douz_20230802_0 /dev/mapper/douz_20230802_1
mount -t btrfs -o ssd,ssd_spread,discard,relatime LABEL=douz_20230802 /mnt/douz_20230802

I have the creation and mounting operations automated, so I know with precision that what worked on kernel 6.0.6 with btrfs-progs 6.3.1 isn't working on kernel 6.4.3 with btrfs-progs 6.3.3, modulo different media (it worked on Samsung and Sandisk, and is failing on Microcenter and Lexar, but with no I/O errors logged).

After I tried several btrfs balance operations, then unmounted and remounted the filesystem, the df output went from

Filesystem                      1K-blocks       Used  Available Use% Mounted on
/dev/mapper/douz_20230802_1     500104172   73908132  426179980  15% /mnt/douz_20230802

to

/dev/mapper/douz_20230802_1 500104172 74038404         0 100% /mnt/douz_20230802

And indeed it was impossible to create even a single empty file on the filesystem.

I'm happy to run experiments here to help troubleshooting.

douzzer commented 1 year ago

I've just tried a tar pipeline, and the metadata block group is growing correctly: at 951.50MiB used, it grew from 1GiB to 2GiB.
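
(For reference, the tar pipeline was of this general shape; the source and destination paths here are illustrative rather than the exact ones used.)

tar -C /mnt/source_tree -cf - . | tar -C /mnt/douz_20230802/restore -xpf -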

(update: finished successfully with these allocations:

Data, RAID1: total=269.00GiB, used=267.24GiB
System, RAID1: total=32.00MiB, used=64.00KiB
Metadata, RAID1: total=4.00GiB, used=3.55GiB
GlobalReserve, single: total=470.30MiB, used=0.00B

)

So this problem seems to be specific to btrfs receive, though of course it's unclear whether the breaking change was in btrfs-progs or the kernel.

douzzer commented 1 year ago

Short update: After the tar succeeded, I deleted that tree, and reattempted the btrfs send. That actually succeeded, to a degree. The Metadata block group contracted when I removed the tar tree, and expanded as the btrfs receive progressed. All seemed well.

However, I've now returned to this RAID1 media pair to do an incremental snapshot, and there are similar symptoms, but the syndrome is even more confusing. Specifically, all balance and file creation attempts (even empty files) fail with ENOSPC, but the filesystem info gives no indication why, except for btrfs files show reporting 100% usage, in conflict with all other reporting tools.

# btrfs files df /mnt/douz_20230802/
Data, RAID1: total=472.90GiB, used=264.62GiB
System, RAID1: total=32.00MiB, used=96.00KiB
Metadata, RAID1: total=4.00GiB, used=3.54GiB
GlobalReserve, single: total=481.77MiB, used=0.00B
# btrfs balance start -dusage=0 -musage=0 /mnt/douz_20230802
Done, had to relocate 0 out of 480 chunks
# btrfs balance start -dusage=5 -musage=5 /mnt/douz_20230802
ERROR: error during balancing '/mnt/douz_20230802': No space left on device
# df /mnt/douz_20230802/
Filesystem                  1K-blocks      Used Available Use% Mounted on
/dev/mapper/douz_20230802_0 500104172 281682780 218396068  57% /mnt/douz_20230802
# btrfs files show /mnt/douz_20230802/
Label: 'douz_20230802'  uuid: 3112349d-3cf0-4ccb-b3d3-6fbb1a872a4a
        Total devices 2 FS bytes used 268.16GiB
        devid    1 size 476.94GiB used 476.93GiB path /dev/mapper/douz_20230802_0
        devid    2 size 476.94GiB used 476.93GiB path /dev/mapper/douz_20230802_1
# btrfs files usage /mnt/douz_20230802/
Overall:
    Device size:                 953.87GiB
    Device allocated:            953.87GiB
    Device unallocated:            2.09MiB
    Device missing:                  0.00B
    Device slack:                  6.00KiB
    Used:                        536.33GiB
    Free (estimated):            208.28GiB      (min: 208.28GiB)
    Free (statfs, df):           208.28GiB
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              481.77MiB      (used: 0.00B)
    Multiple profiles:                  no

Data,RAID1: Size:472.90GiB, Used:264.62GiB (55.96%)
   /dev/mapper/douz_20230802_0   472.90GiB
   /dev/mapper/douz_20230802_1   472.90GiB

Metadata,RAID1: Size:4.00GiB, Used:3.54GiB (88.44%)
   /dev/mapper/douz_20230802_0     4.00GiB
   /dev/mapper/douz_20230802_1     4.00GiB

System,RAID1: Size:32.00MiB, Used:96.00KiB (0.29%)
   /dev/mapper/douz_20230802_0    32.00MiB
   /dev/mapper/douz_20230802_1    32.00MiB

Unallocated:
   /dev/mapper/douz_20230802_0     1.04MiB
   /dev/mapper/douz_20230802_1     1.04MiB
adam900710 commented 1 year ago

In all cases, your metadata is full.

For your initial report:

Metadata, RAID1: total=1.00GiB, used=1011.77MiB
GlobalReserve, single: total=132.72MiB, used=0.00B

This means your metadata has only about 13MiB of space left, while the global reserve requires 132MiB, which must come from metadata space. (Thus your metadata is effectively already over-committed.)

For your latest report:

Metadata, RAID1: total=4.00GiB, used=3.54GiB
GlobalReserve, single: total=481.77MiB, used=0.00B

Your metadata is still full; the remaining ~0.5GiB is mostly reserved for the global reserve, so balance and many other operations won't be feasible.

In short, please keep an eye on the unallocated space. You want to keep it at the GiB level. (The last ~1MiB is always reserved, so it won't be utilized anyway.)
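
For example:

# show per-device unallocated space at a glance
btrfs filesystem usage /mnt/douz_20230802 | grep -A 3 'Unallocated:'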

douzzer commented 1 year ago

@adam900710:

please keep an eye on the unallocated space.

What do you mean? The medium is 43% empty. What mitigating action is the user supposed to take?

When the syndrome first presented, balancing operations were actually completing with reported success, but did not actually resolve the issue.

Again, what mitigating action is the user supposed to take?

It sounds like you are suggesting that it is not a supported BTRFS use case to freshly initialize a RAID1 filesystem with current default mkfs.btrfs settings, and run a send | receive to it with data comprising only 57% of the medium capacity.

douzzer commented 1 year ago

It occurs to me, at least as a mitigation, that I should try --mixed for this use case, even though the medium is large (512GB). With flash media in the 100-200MB/s class, being used in an archival/incremental backup role, are there significant downsides to doing that?
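
(For the record, the creation command I have in mind would look something like this, mirroring my earlier invocation; untested as of this writing.)

mkfs.btrfs -f --mixed -L douz_20230802 -m raid1 -d raid1 /dev/mapper/douz_20230802_0 /dev/mapper/douz_20230802_1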

Zygo commented 1 year ago

My question is whether this is due to something btrfs receive does, or some larger bug that affects 6.4 generally, or whether there is some other activity occurring on this filesystem that has not yet been reported. It would be useful to repeat this experiment while capturing the output of btrfs fi usage -T as it changes over time, particularly to see the difference in size between allocated and used data space.
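
Something as simple as this, run alongside the receive, would capture what's needed (the interval and log path are arbitrary):

# sample the allocation state once a minute while the receive is running
while sleep 60; do date; btrfs filesystem usage -T /mnt/douz_20230802; done >> /var/tmp/douz_usage.log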

In the reports, it looks like btrfs receive is somehow allocating about twice as much space for data as it is using, which means the filesystem will allocate all of its space to data too quickly. At the end, there's no unallocated space that could be allocated for more metadata. Since half the allocated data space is unused, it's still possible to add more data to the filesystem until the allocated metadata runs out, and then the filesystem gets into a bad state that is difficult to recover.

Certainly the filesystem has now reached the point of no return on block group allocations; however, if the history described is complete, then none of that should be happening until the filesystem is over 98% full of data and several additional conditions are met. This filesystem's data usage is below 60%; the other 40% is data allocations that are not currently in use.

On the other hand, there's not enough data reported so far to distinguish between btrfs receive failing due to "lack of space at 57% because of a kernel bug", or failing due to "lack of space at 99% because there was no data space left, then deleting a partially completed file or subvol which accounts for 42% of the space during error handling, so the final usage reported on github is 57%". This is why it's important to observe usage over time, to see what the peak values are. If btrfs receive is really hitting 100% data fill, then there's not much you can do except increase metadata_ratio on mount, and run data reclaim immediately after btrfs receive fails (or make the filesystem large enough for the entire received subvol, if mere lack of space is the issue).
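
For the metadata_ratio side, something along these lines (the value here is only an illustration, not a tuned recommendation):

# force 1 metadata chunk allocation for every 4 data chunk allocations
mount -o remount,metadata_ratio=4 /mnt/douz_20230802
# then reclaim mostly-empty data block groups right after the receive fails
btrfs balance start -dusage=50 /mnt/douz_20230802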

I could recommend various best practices on running maintenance balances on data block groups or setting up automatic reclaim, but it wouldn't help if the kernel is allocating data block groups at excessively high rates that overwhelm the maintenance IO capacity and metadata ratio--that's simply a bug (a regression if 6.3 and earlier kernels are OK), and it can't be worked around--it needs to be fixed.

Now that the filesystem is in this state, it is difficult to recover. All the space has been allocated, and all recovery methods require proactive reclaim of mostly-but-not-completely-empty data block groups while there is still some unallocated space. Adding a pair of devices might fail due to lack of space to update the metadata. You might be able to delete a file, run sync, repeat, until you eventually remove enough files to delete a data block group and unallocated space goes above 1 MiB, but that's a lot of trial-and-error work.


Normal data reclaim maintenance (probably would not work if there's a kernel bug):

# btrfs-progs, run from crontab/systemd timer or after a large delete
btrfs balance start -dusage=75 /mnt/douz_20230802

# python-btrfs, run from crontab/systemd timer or after a large delete
btrfs-balance-least-used -u 75 /mnt/douz_20230802

# newer kernels can run reclaim automatically
echo 75 | tee /sys/fs/btrfs/*/allocation/data/bg_reclaim_threshold
echo 0 | tee /sys/fs/btrfs/*/allocation/metadata/bg_reclaim_threshold

Note that having the kernel do the reclaim means you can't schedule when it occurs, so you may find it preferable to use a cron job or systemd timer running one of the other commands at a scheduled time.


A hack to prepare for faster recovery should this occur again:

btrfs fi resize <devid>:-10G /mnt/douz_20230802

which will leave 10 GiB of slack at the end of the device (repeat for all devices). If you run out of space again, simply run

btrfs fi resize <devid>:max /mnt/douz_20230802

which will provide 10 GiB of unallocated space (also repeat for all devices). After that, immediately start a data balance until there is more than 10 GiB unallocated on all devices, then resize all devices back to -10 GiB so you have the space available for the next time (note that if you regularly run data reclaim and never balance metadata, there won't be a next time).
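
With two devices, "repeat for all devices" is just a small loop (device IDs 1 and 2, matching the fi show output above):

# reserve 10 GiB of slack at the end of each device
for devid in 1 2; do btrfs filesystem resize "${devid}:-10G" /mnt/douz_20230802; done
# when space runs out, give the slack back on each device
for devid in 1 2; do btrfs filesystem resize "${devid}:max" /mnt/douz_20230802; done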


Incidentally:

# btrfs balance start -dusage=5 -musage=5 /mnt/douz_20230802

Never balance metadata, because it will cause or worsen exactly this problem by deleting previously established metadata allocations, reducing the total amount of available space for metadata. The only exception is when converting to a different raid profile, or during some complex filesystem array reshapes, and even then, you have to avoid doing it on very full filesystems. You're not doing anything exceptional here.

douzzer commented 1 year ago

Hi @Zygo. Thanks for the lengthy and thoughtful post.

My question is whether this is due to something btrfs receive does, or some larger bug that affects 6.4 generally, or is there some other activity occurring on this filesystem that has not yet been reported.

My question too. In any case, any other activity will have been self-initiated activity by the kernel, because the target RAID1 device is used exclusively to receive the volume.

In short, the sequence here was mkfs.btrfs, ... | btrfs receive, ENOSPC, with no user-initiated actions or system-reported events in between.
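
(Concretely, the pipeline is of the usual form; the snapshot path below is a placeholder for the real one.)

btrfs send /mnt/source/snapshots/snap_20230802 | btrfs receive /mnt/douz_20230802/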

Notice that it is only btrfs receive that jacks up the target filesystem. A tar pipeline didn't break it, showing a properly growing metadata block group, and leaving the filesystem in a usable state.

In the reports, it looks like btrfs receive is somehow allocating about twice as much space for data as it is using

How exactly did you determine that? I.e. which field(s) in the info reports tell you that? That certainly sounds smoking-gun-ish.

Obtw, the data on this RAID1 is not valuable. I've just been rolling over to two other RAID1 media pairs that are, as yet, still healthy. Their metadata allocations are very different from this (btrfs files usage below).

All three of the arrays are showing used==size in the show output. First the broken RAID1, then the two healthy ones:

# btrfs files show /mnt/douz_20230802/
Label: 'douz_20230802'  uuid: 3112349d-3cf0-4ccb-b3d3-6fbb1a872a4a
        Total devices 2 FS bytes used 268.16GiB
        devid    1 size 476.94GiB used 476.93GiB path /dev/mapper/douz_20230802_0
        devid    2 size 476.94GiB used 476.93GiB path /dev/mapper/douz_20230802_1
# btrfs files show /mnt/douz_20210303
Label: 'douz_20210303'  uuid: c2087628-8292-437c-92db-1ad08045dcf0
        Total devices 2 FS bytes used 402.30GiB
        devid    1 size 476.89GiB used 476.89GiB path /dev/mapper/douz_20210303_0
        devid    2 size 476.89GiB used 476.89GiB path /dev/mapper/douz_20210303_1
# btrfs files show /mnt/douz_20210306
Label: 'douz_20210306'  uuid: c33d05bd-d1c9-4943-8cf4-ca56d201e1d0
        Total devices 2 FS bytes used 426.40GiB
        devid    1 size 476.89GiB used 476.89GiB path /dev/mapper/douz_20210306_0
        devid    2 size 476.89GiB used 476.89GiB path /dev/mapper/douz_20210306_1

This is in marked contrast to a pair of non-removable spinning RAID1s:

# btrfs files show /u3
Label: 'douz_20220728'  uuid: 96949e72-9933-4bd8-af0c-63ff80d6f332
        Total devices 2 FS bytes used 1.08TiB
        devid    1 size 4.55TiB used 1.08TiB path /dev/mapper/douzzer_20220728_2
        devid    2 size 4.55TiB used 1.08TiB path /dev/mapper/douzzer_20220728_1
# btrfs files show /backup
Label: 'douz_20230325'  uuid: cd2094db-45b9-4876-92c3-1fc7cf3f05fb
        Total devices 2 FS bytes used 1.65TiB
        devid    1 size 4.55TiB used 1.68TiB path /dev/mapper/douz_20230325_1
        devid    2 size 4.55TiB used 1.68TiB path /dev/mapper/douz_20230325_2

I don't know what to make of that disparity.

Here's usage on the two healthy flash arrays, both created a couple years ago:

# btrfs files usage /mnt/douz_20210303
Overall:
    Device size:                 953.78GiB
    Device allocated:            953.78GiB
    Device unallocated:            2.09MiB
    Device missing:                  0.00B
    Device slack:                608.01MiB
    Used:                        804.59GiB
    Free (estimated):             57.72GiB      (min: 57.72GiB)
    Free (statfs, df):            57.72GiB
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)
    Multiple profiles:                  no

Data,RAID1: Size:450.86GiB, Used:393.14GiB (87.20%)
   /dev/mapper/douz_20210303_0   450.86GiB
   /dev/mapper/douz_20210303_1   450.86GiB

Metadata,RAID1: Size:26.00GiB, Used:9.16GiB (35.22%)
   /dev/mapper/douz_20210303_0    26.00GiB
   /dev/mapper/douz_20210303_1    26.00GiB

System,RAID1: Size:32.00MiB, Used:96.00KiB (0.29%)
   /dev/mapper/douz_20210303_0    32.00MiB
   /dev/mapper/douz_20210303_1    32.00MiB

Unallocated:
   /dev/mapper/douz_20210303_0     1.04MiB
   /dev/mapper/douz_20210303_1     1.04MiB
# btrfs files usage /mnt/douz_20210306
Overall:
    Device size:                 953.78GiB
    Device allocated:            953.78GiB
    Device unallocated:            2.09MiB
    Device missing:                  0.00B
    Device slack:                  7.00KiB
    Used:                        852.80GiB
    Free (estimated):             34.77GiB      (min: 34.77GiB)
    Free (statfs, df):            34.77GiB
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)
    Multiple profiles:                  no

Data,RAID1: Size:450.85GiB, Used:416.08GiB (92.29%)
   /dev/mapper/douz_20210306_0   450.85GiB
   /dev/mapper/douz_20210306_1   450.85GiB

Metadata,RAID1: Size:26.01GiB, Used:10.32GiB (39.69%)
   /dev/mapper/douz_20210306_0    26.01GiB
   /dev/mapper/douz_20210306_1    26.01GiB

System,RAID1: Size:32.00MiB, Used:96.00KiB (0.29%)
   /dev/mapper/douz_20210306_0    32.00MiB
   /dev/mapper/douz_20210306_1    32.00MiB

Unallocated:
   /dev/mapper/douz_20210306_0     1.04MiB
   /dev/mapper/douz_20210306_1     1.04MiB
adam900710 commented 1 year ago

@adam900710:

please keep an eye on the unallocated space.

What do you mean? The medium is 43% empty. What mitigating action is the user supposed to take?

Exactly what I mean: check the "btrfs fi usage" output for "Unallocated" space.

If it's under 2GiB, then you need to consider balance or deleting files.

When the syndrome first presented, balancing operations were actually completing with reported success, but did not actually resolve the issue.

What are your balance parameters?

You need to check if the balance really freed up some chunks.

Again, what mitigating action is the user supposed to take?

It sounds like you are suggesting that it is not a supported BTRFS use case to freshly initialize a RAID1 filesystem with current default mkfs.btrfs settings, and run a send | receive to it with data comprising only 57% of the medium capacity.

Nope, I mean the workload is either causing very fragmented data usage or something else is causing very inefficient data chunk usage. Note that your reports show a very low data usage percentage (55.96%), which is not common for plain send/receive/tar operations.

And that's what gives you the illusion that btrfs has a lot of free space, but in the end either exhausted data or exhausted metadata space can lead to ENOSPC, and in your case it's metadata that was exhausted first.

Zygo commented 1 year ago

In the reports, it looks like btrfs receive is somehow allocating about twice as much space for data as it is using

How exactly did you determine that? I.e. which field(s) in the info reports tell you that? That certainly sounds smoking-gun-ish.

Normally, as btrfs fills up, it will first fill up existing data block groups, then alternate between allocating and filling one data block group at a time after that, unless there were some very large deletions. btrfs receive doesn't have cause to delete very much--it is either creating an entirely new snapshot, or it's modifying a writable snapshot of an existing read-only snapshot, so even if files are removed from the target subvol, their data still exists and occupies space on the parent snapshot. So normally we'd expect btrfs receive to delete approximately nothing, and data allocations to be a few % or a few GiB larger than the data usage (whichever is larger).

Instead, you have this:

Data, RAID1: total=475.93GiB, used=69.49GiB (14.6%)

and this:

Data, RAID1: total=472.90GiB, used=264.62GiB (55.95%)

and this:

Data,RAID1: Size:472.90GiB, Used:264.62GiB (55.96%)

and this:

    Total devices 2 FS bytes used 268.16GiB
    devid    1 size 476.94GiB used 476.93GiB path /dev/mapper/douz_20230802_0
    devid    2 size 476.94GiB used 476.93GiB path /dev/mapper/douz_20230802_1
    (about 53.85%, includes some metadata so a different percentage)

Those are double-digit percentage differences between used and allocated space, and double-digit numbers of allocated but unused GiB, coming out of a workload that doesn't delete anything. If userspace isn't deleting very large things between ENOSPC and you running btrfs fi usage, then it's a bug in the allocator leading to overallocation of data. (Note that we need to confirm the condition in that sentence--hence, we need btrfs fi usage data over time, and to confirm that btrfs receive isn't trying to actually store more than 475 GiB of data.)

Overallocation by itself is not a smoking gun or root cause. The excess data allocations cause the ENOSPC failure, but the allocations are only a symptom of the problem.

It could be a bug in btrfs-progs' implementation of btrfs receive, but you said:

The same procedure worked consistently for years before, most recently on kernel 6.0.6, btrfs-progs 6.3.1. I haven't yet gotten it to work on kernel 6.4.

which suggests it's likely a kernelspace problem not a userspace one (you could confirm this by running the 6.3.1 binary on a 6.4 kernel). Also, allocation is normally the kernel's job anyway, though it's certainly possible to recreate the symptoms by alternately allocating and deallocating data from userspace.
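
A sketch of how that confirmation could look, assuming a stock autotools build of the release tag (paths are placeholders):

git clone https://github.com/kdave/btrfs-progs.git && cd btrfs-progs
git checkout v6.3.1
./autogen.sh && ./configure --disable-documentation && make
# replay the workload using the old userspace receive on the 6.4 kernel
btrfs send /mnt/source/snapshots/snap_20230802 | ./btrfs receive /mnt/douz_20230802/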

Zygo commented 1 year ago

@adam900710

Note your reports have very low data usage percentage (55.96%), which is not common just by send/receive/tar operations.

That is the key point here. The data usage percentage is way off for this workload, and apparently only on 6.4 kernels. All the other problems would arise from that.

adam900710 commented 1 year ago

Any clues about the details of the receive workload?

E.g. do older received subvolumes get deleted after some time period?

Another way to verify whether it's really a regression in 6.4 is to create an empty test btrfs (around 10G should be enough), then untar a large archive into it (even if it fails with ENOSPC at the end), then check the "btrfs fi usage" output.
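
A minimal sketch of such a test, using a file-backed loop mount so no real media is involved (sizes and paths are arbitrary):

# 10G scratch filesystem backed by a sparse file
truncate -s 10G /var/tmp/btrfs-test.img
mkfs.btrfs -f /var/tmp/btrfs-test.img
mkdir -p /mnt/btrfs-test
mount -o loop /var/tmp/btrfs-test.img /mnt/btrfs-test
# untar something larger than the filesystem; ENOSPC at the end is expected
tar -C /mnt/btrfs-test -xf /path/to/large-archive.tar
btrfs filesystem usage /mnt/btrfs-test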

I'm not that sure if it's really a regression in v6.4.

douzzer commented 1 year ago

I do have vague suspicions of a kernel regression, not a user-tools regression, but we also have empirical evidence that the btrfs receive workload exacerbates the syndrome while tar -x doesn't, which is certainly intriguing.

As for the sequence to reproduce: there was no other activity on the target device. It was mkfs.btrfs followed immediately by btrfs receive, with no competing activity (not even reads). It was an express train to metadata space exhaustion, and the metadata block group never grew from its initial 1GiB allocation.

@Zygo:

If userspace isn't deleting very large things between ENOSPC and you running btrfs fi usage, then it's a bug in the allocator leading to overallocation of data.

Yup, seems like it. On the user side, I've just built btrfs-progs 5.19 (end of the 5.x line) as representative of the way things used to work, in case it behaves relevantly differently. I'll retry the operation with that at some point today to try to narrow things down. Obviously it's normally a no-no to mismatch kernel and userspace versions, but we're just experimenting here.

@adam900710:

Exactly what I mean, check "btrfs fi usage" output for "Unallocated" space.

During a btrfs receive of non-pathological data, totalling 55% of reported available space, to an initially empty destination filesystem, with no competing activity? That doesn't sound like a viable user experience, even if it works, which in my experience it didn't.

the workload is either causing very fragmented data usage or there is something else causing very inefficient data chunk usage.

Agreed. But it is not, apparently, in the nature of the data being transferred -- the identical data is causing no problems (so far?) with 4 other btrfs filesystems with copies of it, 3 of which are RAID1.

I'll report back later today on the results with btrfs 5.19.

douzzer commented 1 year ago

Retrying with v5.19 btrfs-progs was ineffective. I prepared the target by deleting the sole subvolume on it, which succeeded.

After deletion had completed, and before starting the transfer, usage looked like

# btrfs files usage /mnt/douz_20230802
Overall:
    Device size:                 953.87GiB
    Device allocated:             38.06GiB
    Device unallocated:          915.81GiB
    Device missing:                  0.00B
    Device slack:                  6.00KiB
    Used:                        320.00KiB
    Free (estimated):            473.90GiB      (min: 473.90GiB)
    Free (statfs, df):           473.90GiB
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:              481.77MiB      (used: 0.00B)
    Multiple profiles:                  no

Data,RAID1: Size:16.00GiB, Used:0.00B (0.00%)
   /dev/mapper/douz_20230802_0    16.00GiB
   /dev/mapper/douz_20230802_1    16.00GiB

Metadata,RAID1: Size:3.00GiB, Used:144.00KiB (0.00%)
   /dev/mapper/douz_20230802_0     3.00GiB
   /dev/mapper/douz_20230802_1     3.00GiB

System,RAID1: Size:32.00MiB, Used:16.00KiB (0.05%)
   /dev/mapper/douz_20230802_0    32.00MiB
   /dev/mapper/douz_20230802_1    32.00MiB

Unallocated:
   /dev/mapper/douz_20230802_0   457.90GiB
   /dev/mapper/douz_20230802_1   457.90GiB

During the transfer, the metadata block group didn't grow and ran out of space: the transfer terminated with ENOSPC at 227GiB (of 265GiB total). Usage now looks like

# btrfs files usage /mnt/douz_20230802
Overall:
    Device size:                 953.87GiB
    Device allocated:            953.87GiB
    Device unallocated:            2.09MiB
    Device missing:                  0.00B
    Device slack:                  6.00KiB
    Used:                        459.88GiB
    Free (estimated):            246.95GiB      (min: 246.95GiB)
    Free (statfs, df):           246.95GiB
    Data ratio:                       2.00
    Metadata ratio:                   2.00
    Global reserve:                5.50MiB      (used: 0.00B)
    Multiple profiles:                  no

Data,RAID1: Size:473.90GiB, Used:226.95GiB (47.89%)
   /dev/mapper/douz_20230802_0   473.90GiB
   /dev/mapper/douz_20230802_1   473.90GiB

Metadata,RAID1: Size:3.00GiB, Used:2.99GiB (99.60%)
   /dev/mapper/douz_20230802_0     3.00GiB
   /dev/mapper/douz_20230802_1     3.00GiB

System,RAID1: Size:32.00MiB, Used:96.00KiB (0.29%)
   /dev/mapper/douz_20230802_0    32.00MiB
   /dev/mapper/douz_20230802_1    32.00MiB

Unallocated:
   /dev/mapper/douz_20230802_0     1.04MiB
   /dev/mapper/douz_20230802_1     1.04MiB

The "Device allocated" reached 100% of "Device size" early in the transfer.

I didn't try it on a fresh filesystem created by v5.19 mkfs.btrfs and will try that next.

douzzer commented 1 year ago

I posted a followup earlier this evening which has now disappeared without explanation.

Summary:

I've retried using current btrfs-progs, starting with a fresh mkfs.btrfs but with --mixed, and all is well. The block allocation grew just ahead of the transfer as it should, and the total allocated is now only slightly larger than the payload data itself.

I would still quite like to understand the root cause of the syndrome with separate data and metadata block groups.