Open Wingar opened 6 months ago
Can you faddr2line this?
LZ4HC_compress_generic+0x37f/0x1ac0
On Sun, Mar 10, 2024 at 3:54 AM Emily Scarlett @.***> wrote:
I'm running a Gentoo machine on Kernel 6.7.6 (from Gentoo-sources) with a 2-tier bcachefs filesystem (erasure coding, 2 replicas, encryption, lz4 compression). Every now and then the bch-rebalance kernel thread on my fs crashes, which makes writing to the background target inoperable.
However, reads from all devices are perfectly functional and writes to the foreground target work fine (all functions, even clearing cache to free space for incoming data). There is no loss of btree, journal, or other functionality; only the automated flush from the foreground to the background target stops.
I can somewhat reliably reproduce this when I have many processes writing to the FS at one time (multiple Samba connections, lftp, etc.). It tends to happen when I'm pushing the write speed of the foreground SSDs to their limit (700 MB/s or so per SSD). It does not happen when one or a few processes are writing at once, only when there are many.
After this happens, any attempt to unmount the filesystem (it is not root) results in a hung umount and a zombified process. The best way to recover is to reboot the VM and wait for systemd to forcibly kill the processes and threads itself. This is the cleanest way to shut down, but even then the occasional fsck turns up some small but fixable errors.
The machine itself is a VM running under KVM/Libvirt on a Rocky Linux 9.3 host, with each device passed through as a raw block device on the virtio bus (no caching, type raw, io native). I've included a segment of the VM's libvirt configuration XML to show how it is configured.
My configuration is 2 (200GB) foreground SSDs, 14 (1.2TB) background HDDs. Filesystem/Mount options:
metadata_replicas=2,data_replicas=2,background_compression=lz4:7,metadata_target=ssd,foreground_target=ssd,background_target=hdd,promote_target=ssd,erasure_code,verbose
Dmesg announcing thread crash:
[ 2435.684750] #PF: supervisor write access in kernel mode
[ 2435.684788] #PF: error_code(0x0002) - not-present page
[ 2435.684820] PGD 100000067 P4D 100000067 PUD 1001ee067 PMD 16c7b1067 PTE 0
[ 2435.684886] Oops: 0002 [#1] PREEMPT SMP PTI
[ 2435.684920] CPU: 0 PID: 788 Comm: bch-rebalance/2 Not tainted 6.7.6-gentoo-x86_64 #2
[ 2435.684957] Hardware name: Red Hat KVM/RHEL, BIOS edk2-20230524-4.el9_3 05/24/2023
[ 2435.684995] RIP: 0010:LZ4HC_compress_generic+0x37f/0x1ac0 [lz4hc_compress]
[ 2435.685038] Code: 00 83 f9 0e 0f 8f cb 15 00 00 48 8b 7c 24 10 89 ca c1 e2 04 88 17 48 8b 4c 24 38 48 8d 14 30 48 8b 31 48 83 c0 08 48 83 c1 08 <48> 89 70 f8 48 39 d0 72 ec 48 8b 44 24 20 48 8b 7c 24 50 48 83 c2
[ 2435.685103] RSP: 0018:ffffb9cb8114f6e0 EFLAGS: 00010296
[ 2435.685137] RAX: ffffb9cb8dc2d001 RBX: ffffb9cb8dc1d000 RCX: ffffb9cb8dc3df00
[ 2435.685169] RDX: ffffb9cb8dc2cffa RSI: fbc66285b18817ff RDI: ffffb9cb8dc1d000
[ 2435.685201] RBP: 0000000000010000 R08: ffff97f6b4620000 R09: 0000000000010000
[ 2435.685233] R10: ffffb9cb8dc1d000 R11: 00000000c66285b1 R12: ffff97f6b4620000
[ 2435.685265] R13: 0000000000000000 R14: ffffb9cb8dc3def9 R15: ffffb9cb8dc3defd
[ 2435.685297] FS: 0000000000000000(0000) GS:ffff97ff3fa00000(0000) knlGS:0000000000000000
[ 2435.685332] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2435.685367] CR2: ffffb9cb8dc2d000 CR3: 00000001be806002 CR4: 0000000000170ef0
[ 2435.685402] Call Trace:
[ 2435.685429] <TASK>
[ 2435.685458] ? __die+0x23/0x70
[ 2435.685496] ? page_fault_oops+0x15d/0x440
[ 2435.685529] ? fixup_exception+0x26/0x310
[ 2435.685580] ? exc_page_fault+0x16a/0x170
[ 2435.685626] ? asm_exc_page_fault+0x26/0x30
[ 2435.685661] ? LZ4HC_compress_generic+0x37f/0x1ac0 [lz4hc_compress]
[ 2435.685698] LZ4_compress_HC+0x7b/0x90 [lz4hc_compress]
[ 2435.685734] attempt_compress+0x1e6/0x200 [bcachefs]
[ 2435.685944] ? __get_free_pages+0x11/0x40
[ 2435.685978] ? mempool_alloc_vp+0x2f/0x50 [bcachefs]
[ 2435.686062] ? mempool_alloc+0x66/0x1a0
[ 2435.686097] bch2_bio_compress+0x22c/0x4c0 [bcachefs]
[ 2435.686183] bch2_write+0x122f/0x1340 [bcachefs]
[ 2435.686281] ? bch2_increment_clock+0x2d/0x140 [bcachefs]
[ 2435.686365] ? _raw_spin_unlock+0xe/0x30
[ 2435.686397] ? bch2_write+0x2c4/0x450 [bcachefs]
[ 2435.686486] ? bch2_moving_ctxt_do_pending_writes+0xea/0x120 [bcachefs]
[ 2435.686600] bch2_moving_ctxt_do_pending_writes+0xea/0x120 [bcachefs]
[ 2435.686698] bch2_move_ratelimit+0x1b4/0x410 [bcachefs]
[ 2435.686791] ? __pfx_autoremove_wake_function+0x10/0x10
[ 2435.686839] do_rebalance+0x13a/0x830 [bcachefs]
[ 2435.686959] ? kvm_sched_clock_read+0x11/0x20
[ 2435.687448] ? local_clock_noinstr+0xd/0xb0
[ 2435.687809] ? bch2_trans_get+0x303/0x360 [bcachefs]
[ 2435.688171] ? __pfx_bch2_rebalance_thread+0x10/0x10 [bcachefs]
[ 2435.688523] bch2_rebalance_thread+0x57/0xa0 [bcachefs]
[ 2435.688913] ? bch2_rebalance_thread+0x4d/0xa0 [bcachefs]
[ 2435.689275] ? __pfx_closure_sync_fn+0x10/0x10
[ 2435.689583] kthread+0xe8/0x120
[ 2435.689868] ? __pfx_kthread+0x10/0x10
[ 2435.690125] ret_from_fork+0x34/0x50
[ 2435.690416] ? __pfx_kthread+0x10/0x10
[ 2435.690735] ret_from_fork_asm+0x1b/0x30
[ 2435.691018] </TASK>
[ 2435.691255] Modules linked in: poly1305_generic libpoly1305 poly1305_x86_64 chacha_generic chacha_x86_64 libchacha bcachefs crc64 lz4hc_compress lz4_compress xor raid6_pq intel_rapl_msr intel_rapl_common vfat rapl fat iTCO_wdt iTCO_vendor_support i2c_i801 pcspkr lpc_ich i2c_smbus mfd_core virtio_balloon joydev drm backlight fuse loop efi_pstore i2c_core dm_mod configfs nfnetlink xfs sr_mod cdrom crct10dif_pclmul crc32_pclmul libcrc32c crc32c_intel ghash_clmulni_intel sha512_ssse3 xhci_pci xhci_pci_renesas virtio_net ahci sha256_ssse3 net_failover xhci_hcd sha1_ssse3 libahci failover serio_raw efivarfs qemu_fw_cfg virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev virtio_rng aesni_intel crypto_simd cryptd
[ 2435.692732] CR2: ffffb9cb8dc2d000
[ 2435.693016] ---[ end trace 0000000000000000 ]---
[ 2437.015754] RIP: 0010:LZ4HC_compress_generic+0x37f/0x1ac0 [lz4hc_compress]
[ 2437.017693] Code: 00 83 f9 0e 0f 8f cb 15 00 00 48 8b 7c 24 10 89 ca c1 e2 04 88 17 48 8b 4c 24 38 48 8d 14 30 48 8b 31 48 83 c0 08 48 83 c1 08 <48> 89 70 f8 48 39 d0 72 ec 48 8b 44 24 20 48 8b 7c 24 50 48 83 c2
[ 2437.018456] RSP: 0018:ffffb9cb8114f6e0 EFLAGS: 00010296
[ 2437.018868] RAX: ffffb9cb8dc2d001 RBX: ffffb9cb8dc1d000 RCX: ffffb9cb8dc3df00
[ 2437.019185] RDX: ffffb9cb8dc2cffa RSI: fbc66285b18817ff RDI: ffffb9cb8dc1d000
[ 2437.019485] RBP: 0000000000010000 R08: ffff97f6b4620000 R09: 0000000000010000
[ 2437.019797] R10: ffffb9cb8dc1d000 R11: 00000000c66285b1 R12: ffff97f6b4620000
[ 2437.020095] R13: 0000000000000000 R14: ffffb9cb8dc3def9 R15: ffffb9cb8dc3defd
[ 2437.020421] FS: 0000000000000000(0000) GS:ffff97ff3fa00000(0000) knlGS:0000000000000000
[ 2437.020843] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2437.021268] CR2: ffffb9cb8dc2d000 CR3: 00000001be806002 CR4: 0000000000170ef0
[ 2437.021735] note: bch-rebalance/2[788] exited with irqs disabled
bcachefs fs usage:
Filesystem: 2353ad4f-f54a-4a6d-b838-596270f9eebc
Size: 14.4 TiB
Used: 5.40 TiB
Online reserved: 23.0 KiB

Data type  Required/total  Durability  Devices
btree:     1/2             2           [vde vdq]  2.25 GiB
btree:     1/2             2           [vdq vdp]  44.5 GiB
user:      1/2             2           [vdi vdk]  581 MiB
user:      1/2             2           [vdd vdb]  5.31 GiB
user:      1/2             2           [vdo vdc]  518 MiB
user:      1/2             2           [vde vdm]  1.83 GiB
user:      1/2             2           [vdg vdl]  2.80 GiB
user:      1/2             2           [vdk vdn]  319 MiB
user:      1/2             2           [vde vdd]  3.27 GiB
user:      1/2             2           [vdd vdi]  2.41 GiB
user:      1/2             2           [vdf vdn]  2.09 GiB
user:      1/2             2           [vdh vdk]  3.72 GiB
user:      1/2             2           [vdj vdl]  3.79 GiB
user:      1/2             2           [vdl vdb]  336 MiB
user:      1/1             1           [vde]  512 B
user:      1/2             2           [vde vdi]  1.96 GiB
user:      1/2             2           [vde vdb]  5.41 GiB
user:      1/2             2           [vdd vdm]  1.49 GiB
user:      1/2             2           [vdf vdj]  3.52 GiB
user:      1/2             2           [vdg vdh]  3.43 GiB
user:      1/2             2           [vdg vdc]  4.93 GiB
user:      1/2             2           [vdh vdo]  9.17 GiB
user:      1/2             2           [vdi vdo]  5.20 GiB
user:      1/2             2           [vdj vdc]  459 MiB
user:      1/2             2           [vdl vdm]  9.40 GiB
user:      1/2             2           [vdm vdb]  2.69 GiB
user:      1/1             1           [vdj]  30.5 KiB
user:      1/2             2           [vde vdg]  3.86 GiB
user:      1/2             2           [vde vdk]  2.43 GiB
user:      1/2             2           [vde vdo]  1.51 GiB
user:      1/2             2           [vdd vdg]  5.60 GiB
user:      1/2             2           [vdd vdk]  3.53 GiB
user:      1/2             2           [vdd vdo]  1.20 GiB
user:      1/2             2           [vdf vdh]  501 MiB
user:      1/2             2           [vdf vdl]  1.19 GiB
user:      1/2             2           [vdf vdc]  4.80 GiB
user:      1/2             2           [vdg vdj]  3.66 GiB
user:      1/2             2           [vdg vdn]  309 MiB
user:      1/2             2           [vdh vdi]  628 MiB
user:      1/2             2           [vdh vdm]  1.22 GiB
user:      1/2             2           [vdh vdb]  5.19 GiB
user:      1/2             2           [vdi vdm]  556 MiB
user:      1/2             2           [vdi vdb]  322 MiB
user:      1/2             2           [vdj vdn]  4.51 GiB
user:      1/2             2           [vdk vdl]  591 MiB
user:      1/2             2           [vdk vdc]  7.16 GiB
user:      1/2             2           [vdl vdo]  1.59 GiB
user:      1/2             2           [vdm vdo]  8.98 GiB
user:      1/2             2           [vdn vdc]  616 MiB
user:      1/2             2           [vdc vdb]  2.39 GiB
user:      1/1             1           [vdd]  64.0 KiB
user:      1/1             1           [vdm]  1.50 KiB
user:      1/2             2           [vde vdf]  3.56 GiB
user:      1/2             2           [vde vdh]  2.97 GiB
user:      1/2             2           [vde vdj]  2.80 GiB
user:      1/2             2           [vde vdl]  2.29 GiB
user:      1/2             2           [vde vdn]  987 MiB
user:      1/2             2           [vde vdc]  4.71 GiB
user:      1/2             2           [vdd vdf]  3.88 GiB
user:      1/2             2           [vdd vdh]  3.38 GiB
user:      1/2             2           [vdd vdj]  2.88 GiB
user:      1/2             2           [vdd vdl]  1.80 GiB
user:      1/2             2           [vdd vdn]  1.43 GiB
user:      1/2             2           [vdd vdc]  1.67 GiB
user:      1/2             2           [vdf vdg]  2.69 GiB
user:      1/2             2           [vdf vdi]  3.98 GiB
user:      1/2             2           [vdf vdk]  4.74 GiB
user:      1/2             2           [vdf vdm]  3.45 GiB
user:      1/2             2           [vdf vdo]  583 MiB
user:      1/2             2           [vdf vdb]  2.83 GiB
user:      1/2             2           [vdg vdi]  629 MiB
user:      1/2             2           [vdg vdk]  3.03 GiB
user:      1/2             2           [vdg vdm]  1.28 GiB
user:      1/2             2           [vdg vdo]  1.56 GiB
user:      1/2             2           [vdg vdb]  4.16 GiB
user:      1/2             2           [vdh vdj]  1.82 GiB
user:      1/2             2           [vdh vdl]  3.93 GiB
user:      1/2             2           [vdh vdn]  1.45 GiB
user:      1/2             2           [vdh vdc]  334 MiB
user:      1/2             2           [vdi vdj]  562 MiB
user:      1/2             2           [vdi vdl]  511 MiB
user:      1/2             2           [vdi vdn]  14.4 GiB
user:      1/2             2           [vdi vdc]  6.07 GiB
user:      1/2             2           [vdj vdk]  5.03 GiB
user:      1/2             2           [vdj vdm]  827 MiB
user:      1/2             2           [vdj vdo]  5.56 GiB
user:      1/2             2           [vdj vdb]  2.32 GiB
user:      1/2             2           [vdk vdm]  632 MiB
user:      1/2             2           [vdk vdo]  464 MiB
user:      1/2             2           [vdk vdb]  5.56 GiB
user:      1/2             2           [vdl vdn]  9.27 GiB
user:      1/2             2           [vdl vdc]  260 MiB
user:      1/2             2           [vdm vdn]  1.52 GiB
user:      1/2             2           [vdm vdc]  3.90 GiB
user:      1/2             2           [vdn vdo]  509 MiB
user:      1/2             2           [vdn vdb]  337 MiB
user:      1/2             2           [vdo vdb]  960 MiB
user:      1/2             2           [vdq vdp]  135 GiB
user:      13/14           14          [vde vdd vdf vdg vdh vdi vdj vdk vdl vdm vdn vdo vdc vdb]  4.58 TiB
user:      14/15           15          [vde vdd vdf vdg vdh vdi vdj vdk vdl vdm vdn vdo vdc vdb vdp]  2.75 MiB
user:      14/15           15          [vde vdd vdf vdg vdh vdi vdj vdk vdl vdm vdn vdo vdc vdb vdq]  6.50 MiB
cached:    1/1             1           [vdi]  76.3 MiB
cached:    1/1             1           [vdb]  77.0 MiB
cached:    1/1             1           [vdd]  73.3 MiB
cached:    1/1             1           [vdm]  73.2 MiB
parity:    14/15           15          [vde vdd vdf vdg vdh vdi vdj vdk vdl vdm vdn vdo vdc vdb vdq]  512 KiB
cached:    1/1             1           [vdg]  83.3 MiB
cached:    1/1             1           [vdk]  71.8 MiB
cached:    1/1             1           [vdo]  76.1 MiB
cached:    1/1             1           [vdp]  74.2 GiB
parity:    14/15           15          [vde vdd vdf vdg vdh vdi vdj vdk vdl vdm vdn vdo vdc vdb vdp]  256 KiB
cached:    1/1             1           [vde]  953 MiB
cached:    1/1             1           [vdf]  87.5 MiB
cached:    1/1             1           [vdh]  76.3 MiB
cached:    1/1             1           [vdj]  76.1 MiB
cached:    1/1             1           [vdl]  76.6 MiB
cached:    1/1             1           [vdn]  75.4 MiB
cached:    1/1             1           [vdc]  73.9 MiB
cached:    1/1             1           [vdq]  86.5 GiB
parity:    13/14           14          [vde vdd vdf vdg vdh vdi vdj vdk vdl vdm vdn vdo vdc vdb]  361 GiB
hdd.hdd01 (device 0): vde rw
                data       buckets   fragmented
  free:         733 GiB    3002065
  sb:           3.00 MiB   13        252 KiB
  journal:      2.00 GiB   8192
  btree:        1.13 GiB   4608
  user:         18.7 GiB   77129     103 MiB
  cached:       953 MiB    6109
  parity:       25.7 GiB   105112
  stripe:       335 GiB    1375338   328 MiB
  need_gc_gens: 0 B        0
  need_discard: 0 B        0
  capacity:     1.09 TiB   4578566

hdd.hdd02 (device 1): vdd rw
                data       buckets   fragmented
  free:         735 GiB    3011666
  sb:           3.00 MiB   13        252 KiB
  journal:      2.00 GiB   8192
  btree:        0 B        0
  user:         18.9 GiB   77657     106 MiB
  cached:       73.3 MiB   588
  parity:       25.9 GiB   106092
  stripe:       335 GiB    1374358   331 MiB
  need_gc_gens: 0 B        0
  need_discard: 0 B        0
  capacity:     1.09 TiB   4578566

hdd.hdd03 (device 2): vdf rw
                data       buckets   fragmented
  free:         735 GiB    3011677
  sb:           3.00 MiB   13        252 KiB
  journal:      2.00 GiB   8192
  btree:        0 B        0
  user:         18.8 GiB   77544     104 MiB
  cached:       87.5 MiB   690
  parity:       25.9 GiB   106131
  stripe:       335 GiB    1374319   334 MiB
  need_gc_gens: 0 B        0
  need_discard: 0 B        0
  capacity:     1.09 TiB   4578566

hdd.hdd03 (device 3): vdg rw
                data       buckets   fragmented
  free:         735 GiB    3011500
  sb:           3.00 MiB   13        252 KiB
  journal:      2.00 GiB   8192
  btree:        0 B        0
  user:         18.9 GiB   77766     103 MiB
  cached:       83.3 MiB   645
  parity:       25.9 GiB   106147
  stripe:       335 GiB    1374303   337 MiB
  need_gc_gens: 0 B        0
  need_discard: 0 B        0
  capacity:     1.09 TiB   4578566

hdd.hdd04 (device 4): vdh rw
                data       buckets   fragmented
  free:         735 GiB    3011914
  sb:           3.00 MiB   13        252 KiB
  journal:      2.00 GiB   8192
  btree:        0 B        0
  user:         18.8 GiB   77357     102 MiB
  cached:       76.3 MiB   640
  parity:       25.8 GiB   105698
  stripe:       335 GiB    1374752   337 MiB
  need_gc_gens: 0 B        0
  need_discard: 0 B        0
  capacity:     1.09 TiB   4578566

hdd.hdd05 (device 5): vdi rw
                data       buckets   fragmented
  free:         735 GiB    3011954
  sb:           3.00 MiB   13        252 KiB
  journal:      2.00 GiB   8192
  btree:        0 B        0
  user:         18.8 GiB   77367     103 MiB
  cached:       76.3 MiB   590
  parity:       25.8 GiB   105733
  stripe:       335 GiB    1374717   326 MiB
  need_gc_gens: 0 B        0
  need_discard: 0 B        0
  capacity:     1.09 TiB   4578566

hdd.hdd06 (device 6): vdj rw
                data       buckets   fragmented
  free:         735 GiB    3011931
  sb:           3.00 MiB   13        252 KiB
  journal:      2.00 GiB   8192
  btree:        0 B        0
  user:         18.8 GiB   77375     111 MiB
  cached:       76.1 MiB   605
  parity:       25.8 GiB   105714
  stripe:       335 GiB    1374736   324 MiB
  need_gc_gens: 0 B        0
  need_discard: 0 B        0
  capacity:     1.09 TiB   4578566

hdd.hdd07 (device 7): vdk rw
                data       buckets   fragmented
  free:         735 GiB    3011987
  sb:           3.00 MiB   13        252 KiB
  journal:      2.00 GiB   8192
  btree:        0 B        0
  user:         18.8 GiB   77350     106 MiB
  cached:       71.8 MiB   574
  parity:       25.8 GiB   105700
  stripe:       335 GiB    1374750   335 MiB
  need_gc_gens: 0 B        0
  need_discard: 0 B        0
  capacity:     1.09 TiB   4578566

hdd.hdd08 (device 8): vdl rw
                data       buckets   fragmented
  free:         735 GiB    3011957
  sb:           3.00 MiB   13        252 KiB
  journal:      2.00 GiB   8192
  btree:        0 B        0
  user:         18.8 GiB   77368     107 MiB
  cached:       76.6 MiB   586
  parity:       25.8 GiB   105657
  stripe:       335 GiB    1374793   332 MiB
  need_gc_gens: 0 B        0
  need_discard: 0 B        0
  capacity:     1.09 TiB   4578566

hdd.hdd09 (device 9): vdm rw
                data       buckets   fragmented
  free:         735 GiB    3012001
  sb:           3.00 MiB   13        252 KiB
  journal:      2.00 GiB   8192
  btree:        0 B        0
  user:         18.8 GiB   77328     102 MiB
  cached:       73.2 MiB   582
  parity:       25.8 GiB   105672
  stripe:       335 GiB    1374778   333 MiB
  need_gc_gens: 0 B        0
  need_discard: 0 B        0
  capacity:     1.09 TiB   4578566

hdd.hdd10 (device 10): vdn rw
                data       buckets   fragmented
  free:         735 GiB    3011925
  sb:           3.00 MiB   13        252 KiB
  journal:      2.00 GiB   8192
  btree:        0 B        0
  user:         18.8 GiB   77372     108 MiB
  cached:       75.4 MiB   614
  parity:       25.8 GiB   105678
  stripe:       335 GiB    1374772   337 MiB
  need_gc_gens: 0 B        0
  need_discard: 0 B        0
  capacity:     1.09 TiB   4578566

hdd.hdd11 (device 11): vdo rw
                data       buckets   fragmented
  free:         735 GiB    3011920
  sb:           3.00 MiB   13        252 KiB
  journal:      2.00 GiB   8192
  btree:        0 B        0
  user:         18.8 GiB   77357     108 MiB
  cached:       76.1 MiB   634
  parity:       25.8 GiB   105670
  stripe:       335 GiB    1374780   339 MiB
  need_gc_gens: 0 B        0
  need_discard: 0 B        0
  capacity:     1.09 TiB   4578566

hdd.hdd12 (device 12): vdc rw
                data       buckets   fragmented
  free:         735 GiB    3011981
  sb:           3.00 MiB   13        252 KiB
  journal:      2.00 GiB   8192
  btree:        0 B        0
  user:         18.8 GiB   77335     102 MiB
  cached:       73.9 MiB   595
  parity:       25.8 GiB   105735
  stripe:       335 GiB    1374715   330 MiB
  need_gc_gens: 0 B        0
  need_discard: 0 B        0
  capacity:     1.09 TiB   4578566

hdd.hdd13 (device 13): vdb rw
                data       buckets   fragmented
  free:         735 GiB    3011863
  sb:           3.00 MiB   13        252 KiB
  journal:      2.00 GiB   8192
  btree:        0 B        0
  user:         18.8 GiB   77422     100.0 MiB
  cached:       77.0 MiB   626
  parity:       25.8 GiB   105711
  stripe:       335 GiB    1374739   340 MiB
  need_gc_gens: 0 B        0
  need_discard: 0 B        0
  capacity:     1.09 TiB   4578566

ssd.ssd01 (device 14): vdq rw
                data       buckets   fragmented
  free:         7.29 GiB   29863
  sb:           3.00 MiB   13        252 KiB
  journal:      1.46 GiB   5961
  btree:        23.4 GiB   95731
  user:         67.6 GiB   276783    10.1 MiB
  cached:       86.5 GiB   354775
  parity:       0 B        0
  stripe:       0 B        2
  need_gc_gens: 0 B        0
  need_discard: 0 B        0
  capacity:     186 GiB    763128

ssd.ssd02 (device 15): vdp rw
                data       buckets   fragmented
  free:         7.29 GiB   14930
  sb:           3.00 MiB   7         508 KiB
  journal:      1.46 GiB   2980
  btree:        22.2 GiB   72978     13.4 GiB
  user:         67.6 GiB   138410    19.4 MiB
  cached:       74.2 GiB   152258
  parity:       0 B        0
  stripe:       0 B        1
  need_gc_gens: 0 B        0
  need_discard: 0 B        0
  capacity:     186 GiB    381564
bcachefs show-super
External UUID: 2353ad4f-f54a-4a6d-b838-596270f9eebc
Internal UUID: d649b677-ad45-46e0-8203-4259fb360d13
Magic number: c68573f6-66ce-90a9-d96a-60cf803df7ef
Device index: 0
Label:
Version: 1.3: rebalance_work
Version upgrade complete: 1.3: rebalance_work
Oldest version on disk: 1.3: rebalance_work
Created: Tue Feb 27 20:45:42 2024
Sequence number: 293
Time of last write: Wed Mar 6 12:40:27 2024
Superblock size: 16.2 KiB/1.00 MiB
Clean: 0
Devices: 16
Sections: members_v1,crypt,disk_groups,clean,replicas,journal_seq_blacklist,journal_v2,counters,members_v2,errors,ext,downgrade
Features: lz4,gzip,zstd,ec,journal_seq_blacklist_v3,new_siphash,inline_data,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
Compat features: alloc_info,alloc_metadata,extents_above_btree_updates_done,bformat_overflow_done

Options:
  block_size: 512 B
  btree_node_size: 256 KiB
  errors: continue [ro] panic
  metadata_replicas: 2
  data_replicas: 2
  metadata_replicas_required: 1
  data_replicas_required: 1
  encoded_extent_max: 64.0 KiB
  metadata_checksum: none [crc32c] crc64 xxhash
  data_checksum: none [crc32c] crc64 xxhash
  compression: none
  background_compression: lz4:7
  str_hash: crc32c crc64 [siphash]
  metadata_target: ssd
  foreground_target: ssd
  background_target: hdd
  promote_target: ssd
  erasure_code: 1
  inodes_32bit: 1
  shard_inode_numbers: 1
  inodes_use_key_cache: 1
  gc_reserve_percent: 8
  gc_reserve_bytes: 0 B
  root_reserve_percent: 0
  wide_macs: 0
  acl: 1
  usrquota: 0
  grpquota: 0
  prjquota: 0
  journal_flush_delay: 1000
  journal_flush_disabled: 0
  journal_reclaim_delay: 100
  journal_transaction_names: 1
  version_upgrade: [compatible] incompatible none
  nocow: 0
members_v2 (size 2064):
Device: 0
  Label: hdd01 (1)
  UUID: 338f5d85-7d64-40c5-bf64-d0132b315a94
  Size: 1.09 TiB
  read errors: 0
  write errors: 0
  checksum errors: 0
  seqread iops: 0
  seqwrite iops: 0
  randread iops: 0
  randwrite iops: 0
  Bucket size: 256 KiB
  First bucket: 0
  Buckets: 4578566
  Last mount: Sun Mar 10 16:53:57 2024
  Last superblock write: 270
  State: rw
  Data allowed: journal,btree,user
  Has data: btree,user,cached,parity
  Durability: 1
  Discard: 0
  Freespace initialized: 1
Device: 1
  Label: hdd02 (2)
  UUID: 48390588-ea0f-408c-8f45-a4cb1f5548da
  Size: 1.09 TiB
  read errors: 0
  write errors: 0
  checksum errors: 0
  seqread iops: 0
  seqwrite iops: 0
  randread iops: 0
  randwrite iops: 0
  Bucket size: 256 KiB
  First bucket: 0
  Buckets: 4578566
  Last mount: Sun Mar 10 16:53:57 2024
  Last superblock write: 270
  State: rw
  Data allowed: journal,btree,user
  Has data: user,cached,parity
  Durability: 1
  Discard: 0
  Freespace initialized: 1
Device: 2
  Label: hdd03 (3)
  UUID: 9d41fd7b-e025-4d02-a27f-eb975483b1a6
  Size: 1.09 TiB
  read errors: 0
  write errors: 0
  checksum errors: 0
  seqread iops: 0
  seqwrite iops: 0
  randread iops: 0
  randwrite iops: 0
  Bucket size: 256 KiB
  First bucket: 0
  Buckets: 4578566
  Last mount: Sun Mar 10 16:53:57 2024
  Last superblock write: 270
  State: rw
  Data allowed: journal,btree,user
  Has data: user,cached,parity
  Durability: 1
  Discard: 0
  Freespace initialized: 1
Device: 3
  Label: hdd03 (3)
  UUID: 3e98488a-8dcc-4cd0-a4b7-1a6bcfd1a586
  Size: 1.09 TiB
  read errors: 0
  write errors: 0
  checksum errors: 0
  seqread iops: 0
  seqwrite iops: 0
  randread iops: 0
  randwrite iops: 0
  Bucket size: 256 KiB
  First bucket: 0
  Buckets: 4578566
  Last mount: Sun Mar 10 16:53:57 2024
  Last superblock write: 270
  State: rw
  Data allowed: journal,btree,user
  Has data: user,cached,parity
  Durability: 1
  Discard: 0
  Freespace initialized: 1
Device: 4
  Label: hdd04 (4)
  UUID: 06d8c0a7-e7e8-4308-9cd3-abcdd1e2f81e
  Size: 1.09 TiB
  read errors: 0
  write errors: 0
  checksum errors: 0
  seqread iops: 0
  seqwrite iops: 0
  randread iops: 0
  randwrite iops: 0
  Bucket size: 256 KiB
  First bucket: 0
  Buckets: 4578566
  Last mount: Sun Mar 10 16:53:57 2024
  Last superblock write: 270
  State: rw
  Data allowed: journal,btree,user
  Has data: user,cached,parity
  Durability: 1
  Discard: 0
  Freespace initialized: 1
Device: 5
  Label: hdd05 (5)
  UUID: 524d274b-5fde-4e34-93da-92503887fdab
  Size: 1.09 TiB
  read errors: 0
  write errors: 0
  checksum errors: 0
  seqread iops: 0
  seqwrite iops: 0
  randread iops: 0
  randwrite iops: 0
  Bucket size: 256 KiB
  First bucket: 0
  Buckets: 4578566
  Last mount: Sun Mar 10 16:53:57 2024
  Last superblock write: 270
  State: rw
  Data allowed: journal,btree,user
  Has data: user,cached,parity
  Durability: 1
  Discard: 0
  Freespace initialized: 1
Device: 6
  Label: hdd06 (6)
  UUID: e1e75a95-66f7-47c9-a060-035f1d123166
  Size: 1.09 TiB
  read errors: 0
  write errors: 0
  checksum errors: 0
  seqread iops: 0
  seqwrite iops: 0
  randread iops: 0
  randwrite iops: 0
  Bucket size: 256 KiB
  First bucket: 0
  Buckets: 4578566
  Last mount: Sun Mar 10 16:53:57 2024
  Last superblock write: 270
  State: rw
  Data allowed: journal,btree,user
  Has data: user,cached,parity
  Durability: 1
  Discard: 0
  Freespace initialized: 1
Device: 7
  Label: hdd07 (7)
  UUID: e6f48e08-33df-4cfa-b97d-cc28c37d5485
  Size: 1.09 TiB
  read errors: 0
  write errors: 0
  checksum errors: 0
  seqread iops: 0
  seqwrite iops: 0
  randread iops: 0
  randwrite iops: 0
  Bucket size: 256 KiB
  First bucket: 0
  Buckets: 4578566
  Last mount: Sun Mar 10 16:53:57 2024
  Last superblock write: 270
  State: rw
  Data allowed: journal,btree,user
  Has data: user,cached,parity
  Durability: 1
  Discard: 0
  Freespace initialized: 1
Device: 8
  Label: hdd08 (8)
  UUID: 6307e9ae-d9fd-4c55-bbfa-6620c1af19ba
  Size: 1.09 TiB
  read errors: 0
  write errors: 0
  checksum errors: 0
  seqread iops: 0
  seqwrite iops: 0
  randread iops: 0
  randwrite iops: 0
  Bucket size: 256 KiB
  First bucket: 0
  Buckets: 4578566
  Last mount: Sun Mar 10 16:53:57 2024
  Last superblock write: 270
  State: rw
  Data allowed: journal,btree,user
  Has data: user,cached,parity
  Durability: 1
  Discard: 0
  Freespace initialized: 1
Device: 9
  Label: hdd09 (9)
  UUID: e31223ef-3cbf-45b8-a6b7-8ad233149f17
  Size: 1.09 TiB
  read errors: 0
  write errors: 0
  checksum errors: 0
  seqread iops: 0
  seqwrite iops: 0
  randread iops: 0
  randwrite iops: 0
  Bucket size: 256 KiB
  First bucket: 0
  Buckets: 4578566
  Last mount: Sun Mar 10 16:53:57 2024
  Last superblock write: 270
  State: rw
  Data allowed: journal,btree,user
  Has data: user,cached,parity
  Durability: 1
  Discard: 0
  Freespace initialized: 1
Device: 10
  Label: hdd10 (10)
  UUID: c5da352b-5702-4f75-b278-4ac1c6e9a489
  Size: 1.09 TiB
  read errors: 0
  write errors: 0
  checksum errors: 0
  seqread iops: 0
  seqwrite iops: 0
  randread iops: 0
  randwrite iops: 0
  Bucket size: 256 KiB
  First bucket: 0
  Buckets: 4578566
  Last mount: Sun Mar 10 16:53:57 2024
  Last superblock write: 270
  State: rw
  Data allowed: journal,btree,user
  Has data: user,cached,parity
  Durability: 1
  Discard: 0
  Freespace initialized: 1
Device: 11
  Label: hdd11 (11)
  UUID: c2ac602d-5981-44be-bd7e-459a053d3bda
  Size: 1.09 TiB
  read errors: 0
  write errors: 0
  checksum errors: 0
  seqread iops: 0
  seqwrite iops: 0
  randread iops: 0
  randwrite iops: 0
  Bucket size: 256 KiB
  First bucket: 0
  Buckets: 4578566
  Last mount: Sun Mar 10 16:53:57 2024
  Last superblock write: 270
  State: rw
  Data allowed: journal,btree,user
  Has data: user,cached,parity
  Durability: 1
  Discard: 0
  Freespace initialized: 1
Device: 12
  Label: hdd12 (12)
  UUID: 669b7c16-8b12-45c6-a13d-ebe396ce4b04
  Size: 1.09 TiB
  read errors: 0
  write errors: 0
  checksum errors: 0
  seqread iops: 0
  seqwrite iops: 0
  randread iops: 0
  randwrite iops: 0
  Bucket size: 256 KiB
  First bucket: 0
  Buckets: 4578566
  Last mount: Sun Mar 10 16:53:57 2024
  Last superblock write: 270
  State: rw
  Data allowed: journal,btree,user
  Has data: user,cached,parity
  Durability: 1
  Discard: 0
  Freespace initialized: 1
Device: 13
  Label: hdd13 (13)
  UUID: 4fbad3dc-1fcf-4c7e-b9d1-357b9d692dfa
  Size: 1.09 TiB
  read errors: 0
  write errors: 0
  checksum errors: 0
  seqread iops: 0
  seqwrite iops: 0
  randread iops: 0
  randwrite iops: 0
  Bucket size: 256 KiB
  First bucket: 0
  Buckets: 4578566
  Last mount: Sun Mar 10 16:53:57 2024
  Last superblock write: 270
  State: rw
  Data allowed: journal,btree,user
  Has data: user,cached,parity
  Durability: 1
  Discard: 0
  Freespace initialized: 1
Device: 14
  Label: ssd01 (15)
  UUID: f3420167-2aec-4bd4-945e-a86ba21ef3a9
  Size: 186 GiB
  read errors: 0
  write errors: 0
  checksum errors: 0
  seqread iops: 0
  seqwrite iops: 0
  randread iops: 0
  randwrite iops: 0
  Bucket size: 256 KiB
  First bucket: 0
  Buckets: 763128
  Last mount: Sun Mar 10 16:53:57 2024
  Last superblock write: 270
  State: rw
  Data allowed: journal,btree,user
  Has data: journal,btree,user,cached,parity
  Durability: 1
  Discard: 0
  Freespace initialized: 1
Device: 15
  Label: ssd02 (16)
  UUID: 23476d10-959e-497c-905a-f41ea700f3da
  Size: 186 GiB
  read errors: 0
  write errors: 0
  checksum errors: 0
  seqread iops: 0
  seqwrite iops: 0
  randread iops: 0
  randwrite iops: 0
  Bucket size: 512 KiB
  First bucket: 0
  Buckets: 381564
  Last mount: Sun Mar 10 16:53:57 2024
  Last superblock write: 270
  State: rw
  Data allowed: journal,btree,user
  Has data: journal,btree,user,cached,parity
  Durability: 1
  Discard: 0
  Freespace initialized: 1
errors (size 248):
fs_usage_data_wrong             3      Wed Mar 6 12:20:52 2024
fs_usage_cached_wrong           3      Wed Mar 6 12:20:54 2024
fs_usage_replicas_wrong         17     Wed Mar 6 12:20:57 2024
dev_usage_buckets_wrong         8      Sun Mar 3 12:52:48 2024
dev_usage_sectors_wrong         25     Wed Mar 6 12:20:52 2024
dev_usage_fragmented_wrong      21     Wed Mar 6 12:20:52 2024
dev_usage_buckets_ec_wrong      16     Thu Mar 7 17:27:26 2024
alloc_key_data_type_wrong       4      Sun Mar 3 12:52:39 2024
alloc_key_dirty_sectors_wrong   9809   Wed Mar 6 12:20:45 2024
alloc_key_cached_sectors_wrong  8887   Wed Mar 6 12:20:50 2024
lru_entry_bad                   4      Sun Mar 3 12:53:47 2024
ptr_to_missing_backpointer      11737  Thu Mar 7 17:30:58 2024
ptr_to_missing_replicas_entry   5      Sun Mar 3 12:49:50 2024
stale_dirty_ptr                 13     Sun Mar 3 12:49:50 2024
stripe_sector_count_wrong       2476   Thu Mar 7 17:27:14 2024
dev-0/alloc_debug (HDD0)
              buckets   sectors     fragmented
free          3002065   0           0
sb            13        6152        504
journal       8192      4194304     0
btree         4608      2359296     0
user          77129     39279660    210451
cached        6109      1951769     0
parity        105112    53817344    0
stripe        1375338   703463461   672380
need_gc_gens  0         0           0
need_discard  0         0           0
ec            1480450

reserves:
stripe        143136
normal        71596
copygc        56
btree         28
btree_copygc  0
reclaim       0

freelist_wait           empty
open buckets allocated  36
open buckets this dev   0
open buckets total      1024
open_buckets_wait       empty
open_buckets_btree      2
open_buckets_user       32
buckets_to_invalidate   0
btree reserve cache     1
dev-14/alloc_debug (SSD0)
              buckets   sectors     fragmented
free          29863     0           0
sb            13        6152        504
journal       5961      3052032     0
btree         95731     49014272    0
user          276783    141692181   20715
cached        354775    181470149   0
parity        0         0           0
stripe        2         0           0
need_gc_gens  0         0           0
need_discard  0         0           0
ec            2

reserves:
stripe        23902
normal        11979
copygc        56
btree         28
btree_copygc  0
reclaim       0

freelist_wait           empty
open buckets allocated  36
open buckets this dev   8
open buckets total      1024
open_buckets_wait       empty
open_buckets_btree      2
open_buckets_user       32
buckets_to_invalidate   0
btree reserve cache     1
Kernel config bcachefs options
CONFIG_BCACHEFS_FS=m
CONFIG_BCACHEFS_QUOTA=y
CONFIG_BCACHEFS_ERASURE_CODING=y
CONFIG_BCACHEFS_POSIX_ACL=y
CONFIG_BCACHEFS_DEBUG_TRANSACTIONS=y
# CONFIG_BCACHEFS_DEBUG is not set
# CONFIG_BCACHEFS_TESTS is not set
# CONFIG_BCACHEFS_LOCK_TIME_STATS is not set
# CONFIG_BCACHEFS_NO_LATENCY_ACCT is not set
Libvirt device configuration sample
<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='none' io='native'/>
  <source dev='/dev/disk/by-id/scsi-SHGST_HUC101212CSS600_L0JWX1KJ' index='17'/>
  <backingStore/>
  <target dev='vdb' bus='virtio'/>
  <alias name='virtio-disk1'/>
  <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
</disk>
It appears I don't have CONFIG_DEBUG_INFO, so I can't run faddr2line right now. I'm rebuilding the kernel with it enabled (no other changes). I had actually restarted in between posting and your message, so I had to induce the failure again (thankfully I can reproduce it with some effort). I can at least report that the call trace is identical to the one I posted, which suggests the fault is the same each time.
I'll get the faddr2line output once the new kernel's up.
Okay!
scripts/faddr2line ./lib/lz4/lz4hc_compress.ko LZ4HC_compress_generic+0x37f/0x1ac0
LZ4HC_compress_generic+0x37f/0x1ac0:
LZ4_copy8 at /usr/src/linux/lib/lz4/lz4defs.h:158
(inlined by) LZ4_wildCopy at /usr/src/linux/lib/lz4/lz4defs.h:180
(inlined by) LZ4HC_encodeSequence at /usr/src/linux/lib/lz4/lz4hc_compress.c:296
(inlined by) LZ4HC_compress_generic at /usr/src/linux/lib/lz4/lz4hc_compress.c:402
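The register dump from the oops lines up with this location. As a rough check, the Python sketch below redoes the address arithmetic. The faulting bytes `<48> 89 70 f8` decode to an 8-byte store to `[rax-8]`. Treating RDI as the start of the output buffer, RBP as its length, and RDX as the end pointer of the copy are assumptions read off the dump, not something confirmed in this thread.

```python
# Register values copied from the oops earlier in this thread.
RAX = 0xFFFFB9CB8DC2D001
RDX = 0xFFFFB9CB8DC2CFFA
RDI = 0xFFFFB9CB8DC1D000
RBP = 0x10000
CR2 = 0xFFFFB9CB8DC2D000  # faulting address reported by the CPU

# "48 89 70 f8" is: mov QWORD PTR [rax-0x8], rsi (an 8-byte store).
STORE_SIZE = 8
store_first = RAX - 8
store_last = store_first + STORE_SIZE - 1

# ASSUMPTION: RDI is the output buffer start and RBP its size (64 KiB,
# which would match encoded_extent_max in the show-super output).
buf_end = RDI + RBP


def summary() -> dict:
    """Collect the facts that follow from the arithmetic above."""
    return {
        "store_range": (hex(store_first), hex(store_last)),
        "fault_inside_store": store_first <= CR2 <= store_last,
        "buffer_end_equals_cr2": buf_end == CR2,
        # ASSUMPTION: RDX is where the copy was meant to stop.
        "bytes_still_wanted": RDX - store_first,
        "bytes_past_buffer_end": store_last + 1 - buf_end,
    }


if __name__ == "__main__":
    for key, value in summary().items():
        print(f"{key}: {value}")
```

Under those assumptions, only one byte was left to copy, but the 8-byte store reached one byte past a 64 KiB buffer and into a page that is not mapped.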
This looks like an LZ4 HC bug, something's off with their output buffer length checking - will have to forward it to them.
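For readers unfamiliar with why a length check matters here: LZ4_wildCopy copies in whole 8-byte steps until it reaches the end pointer, so it can write up to 7 bytes past the requested end, and the caller has to leave that much slack. The Python sketch below is a simplified model of that behaviour, not the kernel code. The helper names are made up for illustration, and whether a missing slack check is the actual root cause has not been established in this thread.

```python
def wild_copy(dst: bytearray, d: int, src: bytes, s: int, end: int) -> int:
    """Model of: do { copy 8 bytes; d += 8; s += 8; } while (d < end).

    Returns the offset one past the last byte written. Raises IndexError
    where the real code would write outside the destination buffer.
    """
    while True:
        if d + 8 > len(dst):
            raise IndexError(
                f"8-byte store at offset {d} runs past a {len(dst)}-byte buffer"
            )
        # Pad short reads with zeros; the real code would read whatever
        # bytes follow the source instead.
        dst[d:d + 8] = src[s:s + 8].ljust(8, b"\0")
        d += 8
        s += 8
        if d >= end:
            return d


def safe_to_wild_copy(d: int, length: int, capacity: int) -> bool:
    """Hypothetical pre-check: do the rounded-up stores fit in the buffer?"""
    stores = max(1, -(-length // 8))  # ceil(length / 8), at least one store
    return d + 8 * stores <= capacity


if __name__ == "__main__":
    buf = bytearray(16)
    print(wild_copy(buf, 0, b"abcdefghi", 0, 9), bytes(buf))
    print(safe_to_wild_copy(6, 9, 16))
```

In this model a 9-byte copy starting at offset 6 of a 16-byte buffer ends at offset 15, which is in bounds, yet the second 8-byte store starts at offset 14 and overruns. That is the same shape as the fault reported above.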
So, for the time being, sounds best to just disable lz4 or use a different compression?
Just use it with the default compression level.
I'm running a Gentoo machine on Kernel 6.7.6 (from Gentoo-sources) with 1.6.4 tools and a 2-tier bcachefs filesystem (erasure coding, 2 replicas, encryption, lz4 compression). Every now and then the bch-rebalance kernel thread on my fs crashes, which makes writing to the background target inoperable.
However, reads from all devices are perfectly functional and writes to the foreground target work fine (all functions, even clearing cache to free space for incoming data). There is no loss of btree, journal, or other functionality; only the automated flush from the foreground to the background target stops.
I can somewhat reliably reproduce this when I have many processes writing to the FS at one time (multiple Samba connections, lftp, etc.). It tends to happen when I'm pushing the write speed of the foreground SSDs to their limit (700 MB/s or so per SSD). It does not happen when one or a few processes are writing at once, only when there are many.
After this happens, any attempt to unmount the filesystem (it is not root) results in a hung umount and a zombified process. The best way to recover is to reboot the VM and wait for systemd to forcibly kill the processes and threads itself. This is the cleanest way to shut down, but even then the occasional fsck turns up some small but fixable errors.
Aside from this, I have not suffered any data loss or corruption (verified with file-level checksums).