Btrfs warns "to be deleted qgroup 0/xxx has non-zero numbers" and left inconsistent quota status

karuboniru commented 4 days ago

This happens on Fedora 41 with kernel 6.11.6-300.fc41.x86_64. The warning is triggered when btrbk or podman deletes an unwanted subvolume.

A dmesg of one such warning

Another case with the warning happening on multiple volumes and devices

[12324.919790] ------------[ cut here ]------------
[12324.919798] WARNING: CPU: 67 PID: 2882 at fs/btrfs/qgroup.c:1854 btrfs_remove_qgroup+0x3df/0x450
[12324.919810] Modules linked in: macvlan nft_nat nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_flow_offload rpcrdma rdma_cm iw_cm ib_cm ib_core nf_flow_table_inet nf_flow_table cfg80211 veth bridge stp llc wireguard curve25519_x86_64 libcurve25519_generic ip6_udp_tunnel udp_tunnel nf_conntrack_netbios_ns nf_conntrack_broadcast nft_masq nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject rfkill nft_chain_nat nf_nat ip_set nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables binfmt_misc vfat fat intel_rapl_msr amd_atl intel_rapl_common amd64_edac edac_mce_amd ipmi_ssif kvm_amd xfs kvm rapl acpi_cpufreq pcspkr acpi_ipmi i2c_piix4 ipmi_si ptdma k10temp i2c_smbus ipmi_devintf ipmi_msghandler tcp_bbr tun loop nfsd auth_rpcgss nfs_acl lockd grace nfnetlink zram crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 ast sha1_ssse3 ixgbe nvme igb nvme_core i2c_algo_bit dca sp5100_tco nvme_auth sunrpc be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls cxgb3i
[12324.919958]  cxgb3 mdio libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables fuse dm_multipath
[12324.919986] CPU: 67 UID: 0 PID: 2882 Comm: btrfs-cleaner Kdump: loaded Not tainted 6.11.6-300.fc41.x86_64 #1
[12324.919991] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./EPYCD8, BIOS L2.52 11/25/2020
[12324.919994] RIP: 0010:btrfs_remove_qgroup+0x3df/0x450
[12324.919999] Code: bb 50 ff ff ff 00 75 22 48 83 bb 60 ff ff ff 00 75 18 48 83 bb 58 ff ff ff 00 75 0e 48 83 bb 68 ff ff ff 00 0f 84 9e fe ff ff <0f> 0b 48 c7 c6 a0 fe 4b 9d 48 c7 c7 00 9e 69 9e e8 8c 99 9f 00 85
[12324.920002] RSP: 0018:ffffa30f8ce77d78 EFLAGS: 00010206
[12324.920006] RAX: 0000000000000001 RBX: ffff8fbb28571eb8 RCX: 00000014acd4c043
[12324.920009] RDX: 0000000000000010 RSI: 0000000000055aa0 RDI: ffff8fbbb1257000
[12324.920011] RBP: 000000000000013f R08: 0000000000000000 R09: 0000000000000001
[12324.920014] R10: ffff8fd538a04f00 R11: 0028000032f30000 R12: ffff8fbbb1257920
[12324.920016] R13: ffff8fbbb1257000 R14: ffff8fbb28571e68 R15: 0000000000000000
[12324.920019] FS:  0000000000000000(0000) GS:ffff8fd9ce180000(0000) knlGS:0000000000000000
[12324.920022] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12324.920025] CR2: 00007f92020706b0 CR3: 000000023d032000 CR4: 0000000000350ef0
[12324.920028] Call Trace:
[12324.920031]  <TASK>
[12324.920034]  ? btrfs_remove_qgroup+0x3df/0x450
[12324.920038]  ? __warn.cold+0x8e/0xe8
[12324.920043]  ? btrfs_remove_qgroup+0x3df/0x450
[12324.920055]  ? report_bug+0xff/0x140
[12324.920061]  ? handle_bug+0x3c/0x80
[12324.920065]  ? exc_invalid_op+0x17/0x70
[12324.920069]  ? asm_exc_invalid_op+0x1a/0x20
[12324.920081]  ? btrfs_remove_qgroup+0x3df/0x450
[12324.920086]  ? btrfs_remove_qgroup+0x274/0x450
[12324.920093]  btrfs_qgroup_cleanup_dropped_subvolume+0x97/0xc0
[12324.920098]  btrfs_drop_snapshot+0x44e/0xa80
[12324.920107]  ? __pfx_cleaner_kthread+0x10/0x10
[12324.920110]  btrfs_clean_one_deleted_snapshot+0xc3/0x110
[12324.920116]  cleaner_kthread+0xd8/0x130
[12324.920120]  kthread+0xd2/0x100
[12324.920125]  ? __pfx_kthread+0x10/0x10
[12324.920129]  ret_from_fork+0x34/0x50
[12324.920134]  ? __pfx_kthread+0x10/0x10
[12324.920137]  ret_from_fork_asm+0x1a/0x30
[12324.920147]  </TASK>
[12324.920149] ---[ end trace 0000000000000000 ]---
[12324.920152] BTRFS warning (device sda): to be deleted qgroup 0/319 has non-zero numbers, rfer 258478080 rfer_cmpr 258478080 excl 0 excl_cmpr 0
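For anyone collecting these warnings across several devices, here is a minimal Python sketch that pulls the fields out of the final `BTRFS warning` line above. The function name `parse_qgroup_warning` and the returned dictionary layout are my own choices for illustration, and the pattern only assumes the message format shown in this log.

```python
import re

# Matches the "to be deleted qgroup" warning exactly as it appears in the log above.
WARNING_RE = re.compile(
    r"BTRFS warning \(device (?P<device>[^)]+)\): "
    r"to be deleted qgroup (?P<level>\d+)/(?P<id>\d+) has non-zero numbers, "
    r"rfer (?P<rfer>\d+) rfer_cmpr (?P<rfer_cmpr>\d+) "
    r"excl (?P<excl>\d+) excl_cmpr (?P<excl_cmpr>\d+)"
)


def parse_qgroup_warning(line):
    """Return the device, qgroup id and counters from one dmesg line, or None."""
    match = WARNING_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    result = {"device": fields.pop("device")}
    result["qgroup"] = f"{fields.pop('level')}/{fields.pop('id')}"
    # The remaining fields are the byte counters reported by the kernel.
    result.update({name: int(value) for name, value in fields.items()})
    return result
```

Feeding it the output of `dmesg` line by line and keeping the non-`None` results gives a list of affected qgroups per device.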

adam900710 commented 4 days ago

Are you using simple mode qgroup for both cases?

adam900710 commented 4 days ago

Overall, you can ignore the error: the kernel has already marked the qgroup inconsistent, and a new rescan will solve it.

But if you can share the workload to pin down the error, it would help a lot.
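As a rough sketch of the rescan mentioned above, the following Python helper builds and runs `btrfs quota rescan` with `-w`, which waits for the rescan to finish. The helper names and the mount point in the usage note are placeholders, and running it assumes root privileges on a mounted btrfs filesystem with quotas enabled.

```python
import subprocess


def rescan_command(mountpoint, wait=True):
    """Build the argv for a qgroup rescan on the given mount point."""
    cmd = ["btrfs", "quota", "rescan"]
    if wait:
        # -w blocks until the rescan has completed.
        cmd.append("-w")
    cmd.append(mountpoint)
    return cmd


def run_rescan(mountpoint):
    """Run the rescan; raises CalledProcessError if btrfs exits non-zero."""
    return subprocess.run(rescan_command(mountpoint), check=True)
```

For example, `run_rescan("/mnt/backup")` could be called at the end of a backup job, where `/mnt/backup` stands in for the real mount point.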

karuboniru commented 3 days ago

@adam900710

Are you using simple mode qgroup for both cases?

No, I see this error after switching to normal quota mode. The message "qgroup rescan init failed, running in simple mode" in one of the log files may be confusing. It is there because I was testing squota earlier (and then decided to switch to full quota instead).

Overall, you can ignore the error: the kernel has already marked the qgroup inconsistent, and a new rescan will solve it.

Yes, but a full rescan takes some time on a heavily snapshotted HDD. Would it be possible to do the quota recalculation during the subvolume deletion process itself?

But if you can share the workload to pin down the error, it would help a lot.

The case I am facing involves the volume I use to store snapshots from other sources (sda). Every day an incremental send stream (from nvme2n1) is received on that volume, and after the send/receive btrbk cleans up unwanted old snapshots.

All the incidents I have seen so far happened after a backup job that deleted multiple subvolumes. (The warning is triggered on either sda or nvme2n1.)

adam900710 commented 3 days ago

Yes, but a full rescan takes some time on a heavily snapshotted HDD. Would it be possible to do the quota recalculation during the subvolume deletion process itself?

Unfortunately, that's not possible in full qgroup mode. The most problematic part is the snapshot drop, where we can drop a huge subtree in one transaction. That's why we have to mark the qgroup inconsistent and skip the accounting; otherwise btrfs-transaction would hang for a long time just for the qgroup handling.

A qgroup rescan, on the other hand, is far less costly, because its work can be spread across several transactions, so there is no very long hang.

I'd say that if your workload is snapshot heavy, simple quota is a much better solution, but it comes at the cost of accounting accuracy. With simple quota you can hit cases where, by design, a fully dropped subvolume still counts against the quota numbers.

So you have to choose between accuracy and performance.
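If you want to try simple quota, a minimal Python sketch of the switch might look like the one below. It assumes a btrfs-progs release with simple quota support (the `--simple` option of `btrfs quota enable`) and assumes that quotas have to be disabled before being re-enabled in the other mode. The helper names are made up for illustration.

```python
import subprocess


def squota_switch_commands(mountpoint):
    """Return the commands to move a filesystem from full qgroups to simple quota."""
    return [
        # Assumption: the existing quota mode is turned off first.
        ["btrfs", "quota", "disable", mountpoint],
        # --simple enables simple quota (squota) instead of full qgroup accounting.
        ["btrfs", "quota", "enable", "--simple", mountpoint],
    ]


def switch_to_squota(mountpoint):
    """Run the commands in order; stops at the first failure. Needs root."""
    for cmd in squota_switch_commands(mountpoint):
        subprocess.run(cmd, check=True)
```

Existing qgroup numbers are discarded when quotas are disabled, so this is better done at a quiet moment than in the middle of a backup run.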

kdave / btrfs-progs, issue #922