karuboniru opened this issue 4 days ago
Are you using simple mode qgroup for both cases?
Overall, you can ignore the error: the kernel has already marked the qgroup as inconsistent, and a new rescan will fix it.
But if you can share the workload that triggers the error, it would help a lot in pinning it down.
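For reference, the rescan mentioned above is started with `btrfs quota rescan`. Below is a minimal Python sketch that wraps it, assuming btrfs-progs is installed and the script runs as root. The helper names `rescan_command` and `run_rescan` and the example mount point are hypothetical. The `-w` flag makes the command wait until the rescan has finished instead of returning immediately.

```python
import subprocess


def rescan_command(mountpoint: str, wait: bool = True) -> list[str]:
    """Build the argv for a qgroup rescan of the filesystem at `mountpoint`."""
    cmd = ["btrfs", "quota", "rescan"]
    if wait:
        # -w: block until the rescan completes
        cmd.append("-w")
    cmd.append(mountpoint)
    return cmd


def run_rescan(mountpoint: str, wait: bool = True) -> subprocess.CompletedProcess:
    """Run the rescan; raises CalledProcessError if btrfs exits non-zero."""
    return subprocess.run(rescan_command(mountpoint, wait), check=True)


if __name__ == "__main__":
    # Hypothetical mount point of the snapshot-heavy volume
    run_rescan("/mnt/backup")
```

On a heavily snapshotted HDD the waiting variant can take a long time; passing `wait=False` starts the rescan in the background instead.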
@adam900710
Are you using simple mode qgroup for both cases?
No, I see this error after switching to normal quota mode. The error "qgroup rescan init failed, running in simple mode" in one of the log files might be confusing; it appears because I was testing squota (and then decided to switch to full quota instead).
Overall, you can ignore the error: the kernel has already marked the qgroup as inconsistent, and a new rescan will fix it.
Yes, but a full rescan takes quite some time on a heavily snapshotted HDD. Would it be possible to do the quota recalculation during the subvolume deletion itself?
But if you can share the workload that triggers the error, it would help a lot in pinning it down.
The case I am facing involves the volume I use to store snapshots from other sources (sda). Every day, an incremental send stream (from nvme2n1) is received onto this volume, and after the send/receive, unwanted old snapshots are cleaned up (both done by btrbk).
Every occurrence of this error I have seen so far came after a backup job in which multiple subvolumes were being deleted. (The warning is triggered from either sda or nvme2n1.)
Yes, but a full rescan takes quite some time on a heavily snapshotted HDD. Would it be possible to do the quota recalculation during the subvolume deletion itself?
Unfortunately, that's not possible in full qgroup mode. The most problematic part is snapshot dropping, where we can drop a huge subtree in a single transaction. That's why we have to mark the qgroup inconsistent and skip the accounting; otherwise btrfs-transaction could hang for a long time just for the qgroup handling.
A qgroup rescan, on the other hand, is much less disruptive, as its work can be spread across several transactions, so there is no single very long hang.
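To illustrate why spreading the work over several transactions avoids one long hang, here is a toy Python model, not the kernel's implementation. A hypothetical `rescan` function recomputes per-subvolume referenced bytes in batches of at most `batch_size` extents and advances a saved progress cursor after each simulated commit. Modeling extents as `(key, size, refs)` tuples is a simplifying assumption.

```python
from collections import defaultdict


def rescan(extents, batch_size):
    """Toy rescan: recompute referenced bytes per subvolume in batches.

    `extents` is a list of (key, size, refs) tuples sorted by key, where
    `refs` is the set of subvolumes referencing the extent. Each loop
    iteration stands for one transaction handling at most `batch_size`
    extents; `cursor` is the progress saved at each simulated commit.
    """
    totals = defaultdict(int)
    cursor = 0          # index of the next extent to process
    transactions = 0
    max_work = 0        # largest amount of work done in one transaction
    while cursor < len(extents):
        batch = extents[cursor:cursor + batch_size]
        for _key, size, refs in batch:
            for subvol in refs:
                totals[subvol] += size
        cursor += len(batch)   # "commit": persist progress, then continue
        transactions += 1
        max_work = max(max_work, len(batch))
    return dict(totals), transactions, max_work


if __name__ == "__main__":
    extents = [(i, 4096, {"A"} if i % 2 else {"A", "B"}) for i in range(10)]
    print(rescan(extents, batch_size=3))
```

Calling it with `batch_size=len(extents)` models doing all the work in one transaction, which is closer to the single huge commit described for a snapshot drop: the totals are the same, but `max_work` grows with the size of the subtree.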
I'd say that if your workload is snapshot-heavy, simple quota is a much better solution, but at the cost of accounting accuracy. With simple quota, you can hit cases such as a fully dropped subvolume still showing quota usage, due to the design.
It's a trade-off you have to make between accuracy and performance.
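The accuracy trade-off can be seen in a simplified model of the two accounting schemes. In this toy Python sketch (all names hypothetical), full mode derives referenced/exclusive usage from which subvolumes currently reference each extent, while simple mode charges each extent to the subvolume that allocated it until the extent is freed. Real btrfs tracks much more (metadata, nested qgroups, backrefs); this only shows why a deleted subvolume can keep showing usage under simple quota.

```python
from dataclasses import dataclass, field


@dataclass
class Extent:
    size: int
    owner: str  # subvolume that allocated the extent (simple-quota owner)
    refs: set[str] = field(default_factory=set)  # subvolumes referencing it


class ToyFs:
    def __init__(self) -> None:
        self.extents: list[Extent] = []

    def write(self, subvol: str, size: int) -> None:
        self.extents.append(Extent(size, owner=subvol, refs={subvol}))

    def snapshot(self, src: str, dst: str) -> None:
        # A snapshot shares every extent of the source; ownership is unchanged.
        for e in self.extents:
            if src in e.refs:
                e.refs.add(dst)

    def delete_subvol(self, subvol: str) -> None:
        for e in self.extents:
            e.refs.discard(subvol)
        # Only extents with no remaining references are actually freed.
        self.extents = [e for e in self.extents if e.refs]

    def full_usage(self, subvol: str) -> tuple[int, int]:
        """(referenced, exclusive) as full qgroup accounting would report."""
        referenced = sum(e.size for e in self.extents if subvol in e.refs)
        exclusive = sum(e.size for e in self.extents if e.refs == {subvol})
        return referenced, exclusive

    def simple_usage(self, subvol: str) -> int:
        """Usage charged to `subvol` under the simple-quota model."""
        return sum(e.size for e in self.extents if e.owner == subvol)


if __name__ == "__main__":
    fs = ToyFs()
    fs.write("A", 100)
    fs.snapshot("A", "B")
    fs.write("B", 10)
    fs.delete_subvol("A")
    print("full:", fs.full_usage("A"), fs.full_usage("B"))
    print("simple:", fs.simple_usage("A"), fs.simple_usage("B"))
```

Here subvolume A writes 100 bytes, B is a snapshot of A, and A is then deleted. Full mode reports A at zero and charges all 110 bytes to B, while simple mode still charges A for 100 bytes, because those extents are still referenced by B and were never freed.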
This happens on Fedora 41 with kernel 6.11.6-300.fc41.x86_64. The error is triggered by btrbk or podman when deleting unwanted subvolumes.
A dmesg of one such warning
Another case with the warning happening on multiple volumes and devices