Closed eliran-zada-zesty closed 2 months ago
Looks like you're out of unallocated disk space. "Device unallocated": "4.00MiB",
@Forza-tng Hey, I had the same issue as mentioned in this ticket. Can you elaborate a bit more on what it means that I ran out of unallocated disk space? What can lead to it and how can I prevent it? Thanks in advance :)
Thanks @Forza-tng for your response! Not sure how it's related, since Used (7.56TiB) vs Device size/allocated (8.06TiB) shows that there's a lot of space left... can you please elaborate? Also, if it is related, can I avoid it somehow?
Btrfs uses a multi-stage allocator to manage its storage. Btrfs has three different types of data: System, Metadata and Data. Btrfs divides the available disk space into chunks or block groups dedicated to each data type. When Btrfs needs to write to the filesystem, it has to write into a block group that is dedicated to that specific type of data.
Block group allocation happens as it is needed. So, on a new filesystem, most of the disk space is in an unallocated state, and Btrfs can claim it for new block groups as needed. A chunk is allocated 1GiB at a time. A block group is one or several chunks combined according to the profile used. So for the SINGLE profile, a block group is simply one chunk, or 1GiB, but for RAID1, a block group consists of 2 chunks, one on each device.
The "no disk space" errors (or ENOSPC, as sometimes seen) mean that Btrfs has no more unallocated disk space from which to create additional block groups.
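To make the allocation model above concrete, here is a toy sketch in Python. It is not how the real Btrfs allocator is implemented; it only mirrors the description above (1GiB chunks, one chunk for SINGLE, two chunks on different devices for RAID1). The device names in the first example are hypothetical, and the `NoSpace` exception merely stands in for ENOSPC.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2
CHUNK_SIZE = 1 * GIB  # assumption taken from the description above

# Chunks needed per block group for the two profiles mentioned above.
CHUNKS_PER_BLOCK_GROUP = {"SINGLE": 1, "RAID1": 2}


class NoSpace(Exception):
    """Stands in for ENOSPC in this toy model."""


def allocate_block_group(unallocated, profile):
    """Carve one block group out of unallocated space.

    `unallocated` maps device name -> unallocated bytes and is updated in place.
    Returns the list of devices that each contributed one chunk.
    """
    needed = CHUNKS_PER_BLOCK_GROUP[profile]
    # Only devices with room for a whole chunk qualify; prefer the emptiest.
    candidates = sorted(
        (dev for dev, free in unallocated.items() if free >= CHUNK_SIZE),
        key=lambda dev: unallocated[dev],
        reverse=True,
    )
    if len(candidates) < needed:
        raise NoSpace(f"cannot allocate a {profile} block group")
    chosen = candidates[:needed]
    for dev in chosen:
        unallocated[dev] -= CHUNK_SIZE
    return chosen


# A fresh two-device filesystem (hypothetical device names).
fresh = {"sda": 3 * GIB, "sdb": 2 * GIB}
print(allocate_block_group(fresh, "RAID1"))   # ['sda', 'sdb']
print(allocate_block_group(fresh, "SINGLE"))  # ['sda']

# Only 1MiB unallocated per device, as in the report in this thread.
full = {"nvme13n1": 1 * MIB, "nvme14n1": 1 * MIB}
try:
    allocate_block_group(full, "SINGLE")
except NoSpace as exc:
    print("ENOSPC:", exc)
```

The point of the sketch is that free space inside existing block groups does not help here: a new block group can only come from unallocated space.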
Let's look at the @eliran-zada-zesty problem:
My guess here is that Btrfs wants to allocate an additional METADATA block group, but there is only 4MiB unallocated that can be used. Therefore Btrfs cannot finish its transaction and turns the filesystem read-only to protect against corruption. We can see there is about 500GiB of allocated but unused space in DATA block groups. This could be reclaimed into unallocated using btrfs balance.
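The "about 500GiB" figure can be checked with a little arithmetic on the Size and Used values from the report. A minimal sketch in Python follows; the `to_bytes` helper is my own and not part of any btrfs tool, and the numbers are copied from the usage output posted in this issue.

```python
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def to_bytes(text):
    """Convert a size string such as '7.55TiB' into a number of bytes."""
    # Try the longer suffixes first so 'TiB' is not mistaken for plain 'B'.
    for suffix in sorted(UNITS, key=len, reverse=True):
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * UNITS[suffix]
    raise ValueError(f"unrecognised size: {text!r}")


# Values reported for the failing filesystem.
data_slack = to_bytes("8.05TiB") - to_bytes("7.55TiB")
metadata_slack = to_bytes("12.00GiB") - to_bytes("10.79GiB")
unallocated = to_bytes("4.00MiB")

print(f"Data allocated but unused:     {data_slack / UNITS['GiB']:.2f} GiB")
print(f"Metadata allocated but unused: {metadata_slack / UNITS['GiB']:.2f} GiB")
print(f"Unallocated:                   {unallocated / UNITS['MiB']:.2f} MiB")
```

So there is roughly 512GiB locked inside DATA block groups, a little over 1GiB of headroom in the existing METADATA block groups, and only 4MiB from which a new block group could be created.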
@eliran-zada-zesty If you can unmount and mount the filesystem and it does not immediately turn read-only, you can try btrfs balance start -dusage=0 to reclaim empty block groups, so that additional metadata block groups can be allocated.
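As a rough illustration of what the usage filter selects, here is a simplified Python model. It reflects my reading of the balance filter documentation (usage=0 picks completely empty block groups, usage=N picks block groups filled below N percent, and limit caps how many are processed) and is an assumption, not the kernel's actual logic. The block group list is made up for the example.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def select_block_groups(block_groups, usage, limit=None):
    """Return the (size, used) block groups this simplified filter would pick."""
    selected = []
    for size, used in block_groups:
        if limit is not None and len(selected) >= limit:
            break
        if usage == 0:
            matches = used == 0  # only completely empty block groups
        else:
            matches = used < size * usage / 100
        if matches:
            selected.append((size, used))
    return selected


# Hypothetical data block groups: (size, used).
groups = [(GIB, 0), (GIB, 300 * MIB), (GIB, 900 * MIB), (GIB, 0)]

print(len(select_block_groups(groups, usage=0)))            # 2
print(len(select_block_groups(groups, usage=50)))           # 3
print(len(select_block_groups(groups, usage=50, limit=1)))  # 1
```

Starting with usage=0 is the cheapest step because empty block groups need no data to be relocated, which matters when there is almost no unallocated space to work with.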
Normally, Btrfs will reclaim empty block groups, but sometimes we end up in situations where the block groups are underused and cannot be automatically reclaimed. This is why it is important to monitor and balance data block groups as needed.
I usually use btrfs filesystem usage -T to view disk space allocation. Here is one of my systems:
# btrfs fi us -T /
Overall:
Device size: 203.89GiB
Device allocated: 127.07GiB
Device unallocated: 76.82GiB
Device missing: 0.00B
Device slack: 24.00GiB
Used: 92.52GiB
Free (estimated): 93.72GiB (min: 55.32GiB)
Free (statfs, df): 93.72GiB
Data ratio: 1.00
Metadata ratio: 2.00
Global reserve: 512.00MiB (used: 0.00B)
Multiple profiles: no
                  Data      Metadata System
Id Path           single    DUP      DUP      Unallocated Total     Slack
-- -------------- --------- -------- -------- ----------- --------- --------
 1 /dev/nvme0n1p5 101.01GiB 26.00GiB 64.00MiB    76.82GiB 203.89GiB 24.00GiB
-- -------------- --------- -------- -------- ----------- --------- --------
   Total          101.01GiB 13.00GiB 32.00MiB    76.82GiB 203.89GiB 24.00GiB
   Used            84.10GiB  4.21GiB 16.00KiB
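For monitoring, the "Device unallocated" line of this output is the one to watch. Below is a minimal sketch in Python that parses the text of the Overall section and flags a filesystem that is running low. The function names and the 5GiB threshold are my own assumptions for illustration; pick a threshold that suits your workload.

```python
import re

UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}
GIB = 1024 ** 3

LINE_RE = re.compile(
    r"\s*(Device size|Device allocated|Device unallocated):\s+"
    r"([\d.]+)(B|KiB|MiB|GiB|TiB)\b"
)


def parse_overall(text):
    """Extract the device totals from `btrfs filesystem usage` text, in bytes."""
    values = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            name, number, unit = match.groups()
            values[name] = float(number) * UNITS[unit]
    return values


def low_on_unallocated(text, min_bytes=5 * GIB):
    """Return True if unallocated space is below the (hypothetical) threshold."""
    values = parse_overall(text)
    if "Device unallocated" not in values:
        raise ValueError("no 'Device unallocated' line found")
    return values["Device unallocated"] < min_bytes


# Excerpts from the two filesystems discussed in this thread.
healthy = """Overall:
    Device size: 203.89GiB
    Device allocated: 127.07GiB
    Device unallocated: 76.82GiB
"""
failing = """Overall:
    Device size: 8.06TiB
    Device allocated: 8.06TiB
    Device unallocated: 4.00MiB
"""

print(low_on_unallocated(healthy))  # False
print(low_on_unallocated(failing))  # True
```

In practice the text would come from running btrfs filesystem usage on the mount point, for example from a cron job, and a True result would be the cue to run a filtered balance before the filesystem gets into trouble.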
@Forza-tng you're awesome! Your insights are very helpful and make sense, I'll check it out.
@Forza-tng do you have any idea how I can recreate it? Couldn't do it...
@eliran-zada-zesty
@Forza-tng do you have any idea how I can recreate it? Couldn't do it...
We need some details on the exact commands you used as well as dmesg output.
mount /dev/nvme13n1 /mnt/btrfs
btrfs balance start -dusage=0
btrfs balance start -dusage=50,limit=1
You are also using a very old kernel and btrfs-progs. It may be that it's possible to mount and correct the issue on newer kernels, such as a Fedora live CD.
You may get quicker responses on the IRC channel #btrfs on Libera.chat.
Thanks! will check
Problem
A Btrfs filesystem of about 8TB serving a Kafka workload on Amazon Linux 2 with kernel version 5.4 failed and went into a read-only state.
Instance data
kernel version: 5.4.271-184.369.amzn2.aarch64
OS type: Amazon Linux 2
CPU Type: Arm-based AWS Graviton2
btrfs-progs: 4.15.1
Instance state per-failure
Memory: 62.0%
CPU: 27.6%-7.3%
BTRFS state per-failure
{
  "Overall": {
    "Device size": "8.06TiB",
    "Device allocated": "8.06TiB",
    "Device unallocated": "4.00MiB",
    "Device missing": "0.00B",
    "Used": "7.56TiB",
    "Free (estimated)": "512.43GiB (min: 512.43GiB)",
    "Data ratio": "1.00",
    "Metadata ratio": "1.00",
    "Global reserve": "512.00MiB (used: 0.00B)"
  },
  "Data,single": {
    "Size": "8.05TiB",
    "Used": "7.55TiB",
    "/dev/nvme13n1": "7.67TiB",
    "/dev/nvme14n1": "143.00GiB",
    "/dev/nvme4n1": "105.00GiB",
    "/dev/nvme8n1": "143.00GiB"
  },
  "Metadata,single": {
    "Size": "12.00GiB",
    "Used": "10.79GiB",
    "/dev/nvme13n1": "12.00GiB"
  },
  "System,single": {
    "Size": "32.00MiB",
    "Used": "976.00KiB",
    "/dev/nvme13n1": "32.00MiB"
  },
  "Unallocated": {
    "/dev/nvme13n1": "1.00MiB",
    "/dev/nvme14n1": "1.00MiB",
    "/dev/nvme4n1": "1.00MiB",
    "/dev/nvme8n1": "1.00MiB"
  }
}
Observed issue