koverstreet / bcachefs


System hang (CPU soft lockup) when running MongoDB without snapshots (commit 0d63ed13ea3d867055ae5752e2e0514a227d1dcb) #601

Closed bhzhu203 closed 11 months ago

bhzhu203 commented 11 months ago

The system hangs (CPU soft lockup) when running MongoDB without snapshots on commit 0d63ed13ea3d867055ae5752e2e0514a227d1dcb. When I fall back to commit c8498e7253006090d0fdd680755ebfde24034fb1, everything works fine.

dmesg-1026.txt

bhzhu203 commented 11 months ago

After running c8498e7253006090d0fdd680755ebfde24034fb1 overnight, I can no longer mount the FS; the kernel panics during mount. So I had to fall back to d0c3d7ceb202403ee641abb9cb21ec29ef6867f0, which works well enough.

[  178.489005] bcachefs: loading out-of-tree module taints kernel.
[  178.575597] bcachefs (vdb): mounting version 1.2: deleted_inodes opts=compression=lz4
[  178.576103] bcachefs (vdb): recovering from clean shutdown, journal seq 980904
[  178.576509] bcachefs (vdb): Version upgrade required:
[  178.576509] Doing compatible version upgrade from 1.2: deleted_inodes to 1.3: rebalance_work
[  178.576509] running recovery passes: set_fs_needs_rebalance
[  178.624247] bcachefs (vdb): alloc_read... done
[  178.627316] bcachefs (vdb): stripes_read... done
[  178.627755] bcachefs (vdb): snapshots_read... done
[  178.656155] bcachefs (vdb): journal_replay... done
[  178.656474] bcachefs (vdb): resume_logged_ops... done
[  178.656922] bcachefs (vdb): set_fs_needs_rebalance...
[  178.656968] ------------[ cut here ]------------
[  178.657550] kernel BUG at fs/bcachefs/btree_trans_commit.c:271!
[  178.657889] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[  178.658173] CPU: 2 PID: 2832 Comm: mount Kdump: loaded Tainted: G           O       6.5.0-uksm+ #4 ebcf4d464669cdb2bf6530977909b37cf9e92620
[  178.658855] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 449e491 04/01/2014
[  178.659280] RIP: 0010:__bch2_trans_commit+0x584/0x1ef0 [bcachefs]
[  178.659646] Code: 48 8b 90 a0 06 00 00 8b 44 24 40 f7 d0 48 6b c0 38 8b 44 02 14 89 44 24 40 e8 88 8a 94 e0 8b 44 24 40 85 c0 0f 84 8e fd ff ff <0f> 0b 0f b6 93 96 00 00 00 41 89 c4 48 8d 0c 92 48 8d 14 4a 48 c1
[  178.660640] RSP: 0018:ffffc90000c43a20 EFLAGS: 00010282
[  178.660936] RAX: 00000000fffffffe RBX: ffff888105352b28 RCX: 0000000000000003
[  178.661326] RDX: ffff888107b84000 RSI: 00000000ffffffff RDI: ffff888107d89700
[  178.661716] RBP: ffffc90000c43ad0 R08: 0000000000000000 R09: 0000000000000000
[  178.662103] R10: ffff88810379c768 R11: 0000000000000008 R12: ffff888105350000
[  178.662492] R13: ffff888105352b28 R14: 0000000000000003 R15: 0000000000000000
[  178.662894] FS:  00007ffbaacfa800(0000) GS:ffff888237d00000(0000) knlGS:0000000000000000
[  178.663329] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  178.663650] CR2: 00007f82dc461ea0 CR3: 0000000102430003 CR4: 00000000003706e0
[  178.664037] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  178.664423] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  178.664823] Call Trace:
[  178.664970]  <TASK>
[  178.665099]  ? die+0x32/0x80
[  178.665275]  ? do_trap+0xd2/0x100
[  178.665468]  ? __bch2_trans_commit+0x584/0x1ef0 [bcachefs 88dbbb92f291ea7bff49cd4de4aa987199e04b82]
[  178.665979]  ? do_error_trap+0x65/0x80
[  178.666191]  ? __bch2_trans_commit+0x584/0x1ef0 [bcachefs 88dbbb92f291ea7bff49cd4de4aa987199e04b82]
[  178.666697]  ? exc_invalid_op+0x49/0x60
[  178.666922]  ? __bch2_trans_commit+0x584/0x1ef0 [bcachefs 88dbbb92f291ea7bff49cd4de4aa987199e04b82]
[  178.667426]  ? asm_exc_invalid_op+0x16/0x20
[  178.667666]  ? __bch2_trans_commit+0x584/0x1ef0 [bcachefs 88dbbb92f291ea7bff49cd4de4aa987199e04b82]
[  178.668168]  ? __bch2_set_rebalance_needs_scan+0x1eb/0x220 [bcachefs 88dbbb92f291ea7bff49cd4de4aa987199e04b82]
[  178.668743]  bch2_set_rebalance_needs_scan+0xad/0xf0 [bcachefs 88dbbb92f291ea7bff49cd4de4aa987199e04b82]
[  178.669550]  bch2_run_recovery_passes+0x93/0x110 [bcachefs 88dbbb92f291ea7bff49cd4de4aa987199e04b82]
[  178.670299]  bch2_fs_recovery+0xe46/0x1400 [bcachefs 88dbbb92f291ea7bff49cd4de4aa987199e04b82]
[  178.671029]  ? bch2_printbuf_exit+0x18/0x30 [bcachefs 88dbbb92f291ea7bff49cd4de4aa987199e04b82]
[  178.671731]  ? print_mount_opts+0x27d/0x370 [bcachefs 88dbbb92f291ea7bff49cd4de4aa987199e04b82]
[  178.672440]  bch2_fs_start+0x319/0x3a0 [bcachefs 88dbbb92f291ea7bff49cd4de4aa987199e04b82]
[  178.673124]  bch2_fs_open+0x355/0x3d0 [bcachefs 88dbbb92f291ea7bff49cd4de4aa987199e04b82]
[  178.673795]  bch2_mount+0x26b/0x700 [bcachefs 88dbbb92f291ea7bff49cd4de4aa987199e04b82]
[  178.674460]  legacy_get_tree+0x24/0x40
[  178.674899]  vfs_get_tree+0x1f/0xc0
[  178.675307]  path_mount+0x2b0/0xa80
[  178.675720]  __x64_sys_mount+0xe1/0x120
[  178.676145]  do_syscall_64+0x35/0x80
[  178.676570]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  178.677067] RIP: 0033:0x7ffbaab26eee
[  178.677484] Code: 48 8b 0d 45 1f 0f 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 12 1f 0f 00 f7 d8 64 89 01 48
[  178.678904] RSP: 002b:00007ffc3525ab78 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[  178.679536] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffbaab26eee
[  178.680146] RDX: 00005577d6b68d80 RSI: 00005577d6b68e00 RDI: 00005577d6b68d60
[  178.680756] RBP: 00005577d6b68b30 R08: 00005577d6b68dc0 R09: 00005577d6b69af0
[  178.681357] R10: 0000000000000400 R11: 0000000000000246 R12: 0000000000000000
[  178.681959] R13: 00005577d6b68d80 R14: 00005577d6b68d60 R15: 00005577d6b68b30
[  178.682567]  </TASK>
[  178.682911] Modules linked in: bcachefs(O) mean_and_variance netconsole tcp_diag inet_diag sunrpc binfmt_misc nls_utf8 nls_cp437 intel_rapl_msr intel_rapl_common virtio_balloon joydev virtio_console serio_raw evdev squashfs loop dm_multipath dm_mod msr fuse efi_pstore ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear md_mod nvme_tcp nvme_rdma rdma_cm iw_cm ib_cm ib_core configfs nvme_fc nvme_fabrics virtio_net crct10dif_pclmul net_failover crc32_pclmul failover virtio_blk ghash_clmulni_intel sha512_ssse3 cirrus drm_shmem_helper drm_kms_helper aesni_intel crypto_simd cryptd virtio_pci virtio psmouse drm i2c_piix4 virtio_pci_legacy_dev virtio_pci_modern_dev virtio_ring i2c_core pata_acpi floppy button
bhzhu203 commented 11 months ago

After updating to commit 85007b0c9e18f781354ff04fa960c18463bdaa74, this issue is resolved.
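
For anyone triaging a trace like the one posted above, here is a minimal sketch of pulling the `kernel BUG at` location and the reliable call-trace frames out of a dmesg dump. It is not part of bcachefs or any kernel tooling; the function name `summarize_oops` and the regular expressions are my own assumptions about the log layout, based only on the lines in this issue. Frames prefixed with `?` are skipped because the kernel unwinder marks those as unreliable.

```python
import re

# Sample lines copied from the trace in this issue (shortened).
SAMPLE = """\
[  178.657550] kernel BUG at fs/bcachefs/btree_trans_commit.c:271!
[  178.665099]  ? die+0x32/0x80
[  178.668743]  bch2_set_rebalance_needs_scan+0xad/0xf0 [bcachefs 88dbbb92f291ea7bff49cd4de4aa987199e04b82]
[  178.669550]  bch2_run_recovery_passes+0x93/0x110 [bcachefs 88dbbb92f291ea7bff49cd4de4aa987199e04b82]
[  178.674460]  legacy_get_tree+0x24/0x40
"""

# "kernel BUG at <file>:<line>!"
BUG_RE = re.compile(r"kernel BUG at (\S+):(\d+)!")

# "[timestamp]  [? ]symbol+0xoff/0xsize [module ...]"
# Group 1 is the optional "? " marker, group 2 the symbol, group 3 the module.
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+))?"
)


def summarize_oops(text):
    """Return ((file, line) or None, [(symbol, module or None), ...])."""
    bug = None
    frames = []
    for line in text.splitlines():
        m = BUG_RE.search(line)
        if m:
            bug = (m.group(1), int(m.group(2)))
            continue
        m = FRAME_RE.match(line)
        # Keep only frames without the "?" marker.
        if m and m.group(1) is None:
            frames.append((m.group(2), m.group(3)))
    return bug, frames


if __name__ == "__main__":
    bug, frames = summarize_oops(SAMPLE)
    print("BUG at:", bug)
    for symbol, module in frames:
        print(f"  {symbol} ({module or 'core kernel'})")
```

Run against the full log, this should make it easier to see that the BUG fires in the mount path while the `set_fs_needs_rebalance` recovery pass is running.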