naota / linux

Linux kernel source tree
Other
3 stars 1 forks source link

Errors when committing superblock #54

Open kdave opened 3 years ago

kdave commented 3 years ago

I've never seen this error in practice so it's likely related to zoned mode or it's been cause by frequent bg relocation. The device is emulated zone with 1MiB zone size (10243 in total), workload is git checkout in a loop. I have the raw block device image and the backing zoned file saved.

[11647.268646] BTRFS info (device sdi): reclaiming chunk 1769996288 with 4% used 95% unusable
[11647.277391] BTRFS info (device sdi): relocating block group 1769996288 flags metadata
[11648.244679] BTRFS info (device sdi): found 3 extents, stage: move data extents
[11649.140962] BTRFS info (device sdi): reclaiming chunk 1772093440 with 25% used 75% unusable
[11649.149842] BTRFS info (device sdi): relocating block group 1772093440 flags metadata
[11650.611832] BTRFS info (device sdi): found 12 extents, stage: move data extents
[11651.472763] BTRFS info (device sdi): reclaiming chunk 1774190592 with 23% used 76% unusable
[11651.481587] BTRFS info (device sdi): relocating block group 1774190592 flags metadata
[11652.402776] BTRFS info (device sdi): found 13 extents, stage: move data extents
[11653.280440] BTRFS info (device sdi): reclaiming chunk 1775239168 with 23% used 76% unusable
[11653.289266] BTRFS info (device sdi): relocating block group 1775239168 flags metadata
[11654.251984] BTRFS info (device sdi): found 12 extents, stage: move data extents
[11655.170664] BTRFS info (device sdi): reclaiming chunk 1776287744 with 23% used 76% unusable
[11655.179520] BTRFS info (device sdi): relocating block group 1776287744 flags metadata
[11656.167540] BTRFS info (device sdi): found 12 extents, stage: move data extents
[11657.061457] BTRFS info (device sdi): reclaiming chunk 1777336320 with 23% used 76% unusable
[11657.070311] BTRFS info (device sdi): relocating block group 1777336320 flags metadata
[11657.983090] BTRFS info (device sdi): found 13 extents, stage: move data extents
[11658.868494] BTRFS info (device sdi): reclaiming chunk 1778384896 with 23% used 76% unusable
[11658.877311] BTRFS info (device sdi): relocating block group 1778384896 flags metadata
[11659.772862] BTRFS info (device sdi): found 12 extents, stage: move data extents
[11692.739276] BTRFS info (device sdi): reclaiming chunk 1758461952 with 15% used 84% unusable
[11692.748449] BTRFS info (device sdi): relocating block group 1758461952 flags metadata
[11692.888585] BTRFS error (device sdi): system chunk array too small 0 < 97
[11692.895613] BTRFS error (device sdi): super block corruption detected before writing it to disk
[11692.904481] BTRFS: error (device sdi) in write_all_supers:4163: errno=-117 Filesystem corrupted (unexpected superblock corruption detected)
[11692.917160] BTRFS info (device sdi): forced readonly
[11692.917168] BTRFS warning (device sdi): Skipping commit of aborted transaction.
[11692.917173] ------------[ cut here ]------------
[11692.917175] BTRFS: Transaction aborted (error -117)
[11692.917255] WARNING: CPU: 5 PID: 10532 at fs/btrfs/transaction.c:1946 cleanup_transaction+0x6b/0xc0 [btrfs]
[11692.949374] Modules linked in: iscsi_target_mod tcm_loop target_core_pscsi target_core_file target_core_iblock nfs lockd grace sunrpc fscache target_core_user uio target_core_mod netfs af_packet i2c_algo_bit drm_ttm_helper ttm br_netfilter bridge drm_kms_helper btrfs stp llc syscopyarea sysfillrect sysimgblt fb_sys_fops drm iscsi_ibft blake2b_generic libcrc32c iscsi_boot_sysfs xor zstd_decompress zstd_compress tg3 tpm_infineon tpm_tis tpm_tis_core tpm libphy xxhash raid6_pq acpi_cpufreq mptctl button i2c_piix4 k10temp ext4 mbcache jbd2 ohci_pci ehci_pci ohci_hcd mptsas ata_generic ehci_hcd scsi_transport_sas mptscsih serio_raw sata_svw mptbase usbcore pata_serverworks sg scsi_dh_rdac scsi_dh_emc scsi_dh_alua
[11693.012551] CPU: 5 PID: 10532 Comm: kworker/u16:4 Not tainted 5.13.0-git+ #784
[11693.012559] Hardware name: empty empty/S3993, BIOS PAQEX0-3 02/24/2008
[11693.012564] Workqueue: events_unbound btrfs_reclaim_bgs_work [btrfs]
[11693.033318] RIP: 0010:cleanup_transaction+0x6b/0xc0 [btrfs]
[11693.033430] Code: 00 1c 00 00 03 72 25 41 83 fc fb 0f 84 d8 f9 0b 00 41 83 fc e2 0f 84 ce f9 0b 00 44 89 e6 48 c7 c7 38 b2 fa c0 e8 89 db a3 e9 <0f> 0b 44 89 e1 ba 9a 07 00 00 49 8d 5e 28 48 89 ef 48 c7 c6 b0 9e
[11693.033435] RSP: 0018:ffff9d524b2d3c20 EFLAGS: 00010282
[11693.033440] RAX: 0000000000000000 RBX: 00000000ffffff8b RCX: ffff8b8b679d9978
[11693.033443] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff8b8b679d9970
[11693.033445] RBP: ffff8b8a8022ba78 R08: 0000000000000000 R09: 0000000000000000
[11693.033448] R10: ffffffffaa1106c8 R11: 00000000312d2072 R12: 00000000ffffff8b
[11693.092366] R13: ffff8b8a88054000 R14: ffff8b8a6bd0d000 R15: ffff8b8a88054000
[11693.092370] FS:  0000000000000000(0000) GS:ffff8b8b67800000(0000) knlGS:0000000000000000
[11693.092373] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[11693.092375] CR2: 00007ff2ad38e4c0 CR3: 0000000191cec000 CR4: 00000000000006e0
[11693.092379] Call Trace:
[11693.092388]  btrfs_commit_transaction.cold+0x3e/0xe2 [btrfs]
[11693.129246]  relocate_block_group+0x55/0x580 [btrfs]
[11693.129356]  ? btrfs_wait_nocow_writers+0x33/0xc0 [btrfs]
[11693.129470]  btrfs_relocate_block_group+0x177/0x340 [btrfs]
[11693.145810]  btrfs_relocate_chunk+0x38/0x120 [btrfs]
[11693.145915]  btrfs_reclaim_bgs_work.cold+0x73/0x167 [btrfs]
[11693.156746]  process_one_work+0x262/0x600
[11693.160871]  worker_thread+0x4c/0x320
[11693.160875]  ? process_one_work+0x600/0x600
[11693.160880]  kthread+0x135/0x160
[11693.172307]  ? set_kthread_struct+0x40/0x40
[11693.172312]  ret_from_fork+0x1f/0x30
[11693.172323] irq event stamp: 707214
[11693.183906] hardirqs last  enabled at (707213): [<ffffffffaa996974>] _raw_spin_unlock_irq+0x24/0x40
[11693.183915] hardirqs last disabled at (707214): [<ffffffffaa98e7f4>] __schedule+0x3d4/0x860
[11693.201644] softirqs last  enabled at (707146): [<ffffffffaac00307>] __do_softirq+0x307/0x502
[11693.201651] softirqs last disabled at (707139): [<ffffffffaa08ab27>] __irq_exit_rcu+0xa7/0xd0
[11693.201657] ---[ end trace 7ec09b227c3735c2 ]---
[11693.201662] BTRFS: error (device sdi) in cleanup_transaction:1946: errno=-117 Filesystem corrupted
[11693.202230] BTRFS error (device sdi): error relocating chunk 1758461952
[11693.239552] BTRFS info (device sdi): reclaiming chunk 1779433472 with 17% used 82% unusable
[11693.239580] BTRFS error (device sdi): error relocating chunk 1779433472

btrfs fi df after it got stuck:

Data, single: total=238.00MiB, used=192.06MiB, zone_unusable=31.75MiB
System, single: total=1.00MiB, used=48.00KiB, zone_unusable=432.00KiB
Metadata, single: total=13.00MiB, used=2.06MiB, zone_unusable=8.94MiB
GlobalReserve, single: total=3.25MiB, used=16.00KiB

system chunk array too small 0 < 97 the zero probably means the page is all zeros, I don't have other data as it's all in memory.

# btrfs check --force /dev/sdi
Opening filesystem to check...
WARNING: filesystem mounted, continuing because of --force
No mapping for 1773486080-1773502464
Couldn't map the block 1773486080
Couldn't map the block 1773486080
bad tree block 1773486080, bytenr mismatch, want=1773486080, have=0
ERROR: cannot read chunk root
ERROR: cannot open file system

Note where the mapping error is: the chunk got reclaimed close to the point it crashed (look for in 1772093440 in the dump).

# btrfs inspect dump-super /dev/sdi
superblock: bytenr=65536, device=/dev/sdi
---------------------------------------------------------
csum_type               0 (crc32c)
csum_size               4
csum                    0x7afc6e5c [match]
bytenr                  65536
flags                   0x1
                        ( WRITTEN )
magic                   _BHRfS_M [match]
fsid                    59447fd2-f548-443d-abe8-79cdf178321f
metadata_uuid           59447fd2-f548-443d-abe8-79cdf178321f
label                   TEST-ZONED
generation              3507
root                    1781022720
sys_array_size          97
chunk_root_generation   3507
root_level              0
chunk_root              1773486080
chunk_root_level        1
log_root                0
log_root_transid        0
log_root_level          0
total_bytes             10740056064
bytes_used              203599872
sectorsize              4096
nodesize                16384
leafsize (deprecated)   16384
stripesize              4096
root_dir                6
num_devices             1
compat_flags            0x0
compat_ro_flags         0x3
                        ( FREE_SPACE_TREE |
                          FREE_SPACE_TREE_VALID )
incompat_flags          0x1361
                        ( MIXED_BACKREF |
                          BIG_METADATA |
                          EXTENDED_IREF |
                          SKINNY_METADATA |
                          NO_HOLES |
                          ZONED )
cache_generation        0
uuid_tree_generation    3507
dev_item.uuid           434d4002-e9cf-4063-9045-082fea25ee85
dev_item.fsid           59447fd2-f548-443d-abe8-79cdf178321f [match]
dev_item.type           0
dev_item.total_bytes    10740056064
dev_item.bytes_used     577765376
dev_item.io_align       4096
dev_item.io_width       4096
dev_item.sector_size    4096
dev_item.devid          1
dev_item.dev_group      0
dev_item.seek_speed     0
dev_item.bandwidth      0
dev_item.generation     0