RAOF closed this issue 5 years ago
OK, it's possible to hit this in multiple ways. I've just hit it on a fresh filesystem, during use:
[ 110.713039] ------------[ cut here ]------------
[ 110.713040] kernel BUG at fs/bcachefs/buckets.c:1258!
[ 110.713045] invalid opcode: 0000 [#1] SMP PTI
[ 110.713047] CPU: 3 PID: 2570 Comm: PK-Backend Not tainted 5.0.0+bcachefs.git20190423.990250b2-1-generic #1-Ubuntu
[ 110.713048] Hardware name: System manufacturer System Product Name/Z170 PRO GAMING, BIOS 1904 07/05/2016
[ 110.713053] RIP: 0010:bch2_trans_mark_extent+0x3b4/0x440
[ 110.713054] Code: 8b 85 10 ff ff ff 48 8b 50 58 48 8b bd 10 ff ff ff 4a 8d 44 02 0b 4c 8d bf 60 01 00 00 48 89 47 58 4c 39 f8 0f 86 89 fe ff ff <0f> 0b 0f 0b 0f 0b 48 8b bd 4a ff ff ff 48 89 7a 08 89 c7 4c 8b 4c
[ 110.713055] RSP: 0018:ffffc164c98c71b8 EFLAGS: 00010212
[ 110.713057] RAX: ffffc164c98c7890 RBX: 0000000000000000 RCX: ffffc164c98c71fa
[ 110.713057] RDX: ffffc164c98c7884 RSI: ffffc164c98c788c RDI: ffffc164c98c7728
[ 110.713058] RBP: ffffc164c98c72b0 R08: 0000000000000001 R09: 0000000000000001
[ 110.713059] R10: 0000000000000018 R11: 0000000000000020 R12: ffff9fa4fbec0e50
[ 110.713060] R13: fffffffffffffff0 R14: 0000000000000000 R15: ffffc164c98c7888
[ 110.713061] FS: 00007fbb98be5700(0000) GS:ffff9fa526ac0000(0000) knlGS:0000000000000000
[ 110.713062] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 110.713063] CR2: 00007fbb7c2cb050 CR3: 00000008083b8003 CR4: 00000000003606e0
[ 110.713064] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 110.713065] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 110.713065] Call Trace:
[ 110.713069] bch2_trans_mark_key+0x97/0xb0
[ 110.713071] bch2_trans_mark_update+0x1d2/0x370
[ 110.713073] do_btree_insert_at+0x3ed/0xd30
[ 110.713075] ? bch2_extent_trim_atomic+0x1e3/0x2d0
[ 110.713077] bch2_trans_commit+0x28d/0x8f0
[ 110.713079] bch2_btree_delete_range+0x1e9/0x296
[ 110.713082] ? __bch2_btree_iter_traverse+0x16b/0x6a0
[ 110.713084] ? bch2_btree_node_iter_sort+0x177/0x200
[ 110.713086] ? bch2_btree_node_iter_init+0x22f/0x5c0
[ 110.713089] ? bch2_btree_node_iter_sort+0x177/0x200
[ 110.713090] ? __bch2_bkey_cmp_left_packed+0x40/0xb0
[ 110.713094] bch2_inode_rm+0xb8/0x3b0
[ 110.713095] ? bch2_trans_exit+0x38/0x80
[ 110.713101] ? inode_permission+0x63/0x1a0
[ 110.713102] ? path_parentat.isra.43+0x3f/0x80
[ 110.713104] ? filename_parentat.isra.58.part.59+0xf7/0x180
[ 110.713106] ? list_lru_add+0x6c/0x190
[ 110.713108] bch2_evict_inode+0xcf/0xf0
[ 110.713110] evict+0xca/0x1a0
[ 110.713111] iput+0x148/0x210
[ 110.713113] do_unlinkat+0x248/0x2e0
[ 110.713115] __x64_sys_unlink+0x23/0x30
[ 110.713117] do_syscall_64+0x5a/0x110
[ 110.713119] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 110.713121] RIP: 0033:0x7fbb9ce1aac7
[ 110.713122] Code: f0 ff ff 73 01 c3 48 8b 0d c6 53 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 57 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 99 53 0d 00 f7 d8 64 89 01 48
[ 110.713123] RSP: 002b:00007fbb98be4aa8 EFLAGS: 00000203 ORIG_RAX: 0000000000000057
[ 110.713124] RAX: ffffffffffffffda RBX: 00007fbb98be4b30 RCX: 00007fbb9ce1aac7
[ 110.713125] RDX: 0000000000000064 RSI: 00007fbb9a01c527 RDI: 00007fbb7c007bc0
[ 110.713125] RBP: 00007fbb98be55f8 R08: 00007fbb7c256c30 R09: 0000000000000004
[ 110.713126] R10: 0000000000000024 R11: 0000000000000203 R12: 00007fbb9a0203fd
[ 110.713127] R13: 00007ffd0c4bbdbf R14: 00007ffd0c4bbe50 R15: 00007fbb98be4e00
[ 110.713128] Modules linked in: nls_iso8859_1 hid_multitouch amdgpu chash amd_iommu_v2 gpu_sched intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi radeon snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep nouveau crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi ttm drm_kms_helper aesni_intel drm snd_seq aes_x86_64 snd_seq_device snd_timer eeepc_wmi snd crypto_simd i2c_algo_bit fb_sys_fops cryptd syscopyarea glue_helper intel_cstate joydev asus_wmi mei_me sysfillrect intel_rapl_perf input_leds sparse_keymap serio_raw wmi_bmof mxm_wmi sysimgblt soundcore mei acpi_pad mac_hid sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_logitech_hidpp hid_logitech_dj hid_generic usbhid hid uas usb_storage nvme psmouse e1000e ahci nvme_core i2c_i801 libahci wmi video
[ 110.713152] ---[ end trace 427a4d8ece0d5586 ]---
[ 110.713154] RIP: 0010:bch2_trans_mark_extent+0x3b4/0x440
[ 110.713155] Code: 8b 85 10 ff ff ff 48 8b 50 58 48 8b bd 10 ff ff ff 4a 8d 44 02 0b 4c 8d bf 60 01 00 00 48 89 47 58 4c 39 f8 0f 86 89 fe ff ff <0f> 0b 0f 0b 0f 0b 48 8b bd 4a ff ff ff 48 89 7a 08 89 c7 4c 8b 4c
[ 110.713156] RSP: 0018:ffffc164c98c71b8 EFLAGS: 00010212
[ 110.713157] RAX: ffffc164c98c7890 RBX: 0000000000000000 RCX: ffffc164c98c71fa
[ 110.713158] RDX: ffffc164c98c7884 RSI: ffffc164c98c788c RDI: ffffc164c98c7728
[ 110.713158] RBP: ffffc164c98c72b0 R08: 0000000000000001 R09: 0000000000000001
[ 110.713159] R10: 0000000000000018 R11: 0000000000000020 R12: ffff9fa4fbec0e50
[ 110.713160] R13: fffffffffffffff0 R14: 0000000000000000 R15: ffffc164c98c7888
[ 110.713161] FS: 00007fbb98be5700(0000) GS:ffff9fa526ac0000(0000) knlGS:0000000000000000
[ 110.713162] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 110.713162] CR2: 00007fbb7c2cb050 CR3: 00000008083b8003 CR4: 00000000003606e0
[ 110.713163] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 110.713164] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
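A note on reading the "invalid opcode" line: on x86 the kernel's BUG() macro is implemented with the ud2 instruction (bytes 0f 0b), and the byte wrapped in <..> on the Code: line is where RIP was pointing. The sketch below checks that against an abridged copy of the Code: line above. The helper names (faulting_bytes, is_bug_trap) are made up for illustration and are not part of any kernel tooling.

```python
# Minimal sketch: confirm that the faulting instruction in an oops "Code:" line
# is ud2 (0f 0b), which is what BUG() emits on x86.
# Helper names are hypothetical; CODE_LINE is an abridged copy of the log line.

UD2 = b"\x0f\x0b"

CODE_LINE = (
    "[ 110.713054] Code: 4c 39 f8 0f 86 89 fe ff ff "
    "<0f> 0b 0f 0b 0f 0b 48 8b bd"
)


def faulting_bytes(code_line, count=2):
    """Return `count` bytes starting at the <..>-marked byte of a Code: line."""
    tokens = code_line.split("Code:", 1)[1].split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            # The marked byte plus the bytes that follow it.
            window = [tok.strip("<>")] + tokens[i + 1 : i + count]
            return bytes(int(t, 16) for t in window)
    raise ValueError("no <..>-marked byte found in Code: line")


def is_bug_trap(code_line):
    """True if the instruction at RIP is ud2, i.e. most likely a BUG()."""
    return faulting_bytes(code_line) == UD2


if __name__ == "__main__":
    print(is_bug_trap(CODE_LINE))
```

This only confirms that the crash is a deliberate assertion (the BUG at fs/bcachefs/buckets.c:1258) rather than a stray fault; it says nothing about why the assertion fired.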
An older (2019/04/10) kernel appears to handle the filesystem fine, so it's not a data loss bug.
I can't dump the fs with current bcachefs-tools, as that hits an assertion when trying to; if necessary I could presumably build an older bcachefs-tools and dump with that.
Fixed by e3ddc90cc4 - I'll pull that into tools too
Hm. I'm still seeing this with a kernel at commit 0b1aaaf3, which contains e3ddc90. Is this expected to have caused on-disc structures to corrupt, or am I hitting a different problem?
Annoyingly, I seem to have crossed a filesystem feature compatibility threshold; my older kernel refuses to attempt to mount the fs with “incompatible features detected”.
[ 83.649982] bcachefs (d50f806d-78a0-40dd-9ae1-33c21762c56e): journal read done, 106779 keys in 23 entries, seq 4491807
[ 83.760522] bcachefs (d50f806d-78a0-40dd-9ae1-33c21762c56e): journal entries 4491786-4491786 missing! (replaying 4491780-4491807), fixing
[ 83.761823] bcachefs (d50f806d-78a0-40dd-9ae1-33c21762c56e): journal entries 4491791-4491791 missing! (replaying 4491780-4491807), fixing
[ 83.763090] bcachefs (d50f806d-78a0-40dd-9ae1-33c21762c56e): journal entries 4491796-4491796 missing! (replaying 4491780-4491807), fixing
[ 83.764362] bcachefs (d50f806d-78a0-40dd-9ae1-33c21762c56e): journal entries 4491801-4491801 missing! (replaying 4491780-4491807), fixing
[ 83.765652] bcachefs (d50f806d-78a0-40dd-9ae1-33c21762c56e): journal entries 4491806-4491806 missing! (replaying 4491780-4491807), fixing
[ 83.772633] bcachefs (d50f806d-78a0-40dd-9ae1-33c21762c56e): starting alloc read
[ 86.802997] bcachefs (d50f806d-78a0-40dd-9ae1-33c21762c56e): alloc read done
[ 86.802998] bcachefs (d50f806d-78a0-40dd-9ae1-33c21762c56e): starting stripes_read
[ 86.803195] bcachefs (d50f806d-78a0-40dd-9ae1-33c21762c56e): stripes_read done
[ 86.803195] bcachefs (d50f806d-78a0-40dd-9ae1-33c21762c56e): starting mark and sweep
[ 95.524898] bcachefs (d50f806d-78a0-40dd-9ae1-33c21762c56e): mark and sweep done
[ 95.524899] bcachefs (d50f806d-78a0-40dd-9ae1-33c21762c56e): starting journal replay
[ 96.800329] bcachefs (d50f806d-78a0-40dd-9ae1-33c21762c56e): journal replay done
[ 96.800330] bcachefs (d50f806d-78a0-40dd-9ae1-33c21762c56e): writing allocation info
[ 97.495182] bcachefs (d50f806d-78a0-40dd-9ae1-33c21762c56e): alloc write done
[ 97.495183] bcachefs (d50f806d-78a0-40dd-9ae1-33c21762c56e): checking for deleted inodes
[ 97.516035] bcachefs (d50f806d-78a0-40dd-9ae1-33c21762c56e): deleting inode 1085447
[ 97.516055] ------------[ cut here ]------------
[ 97.516055] kernel BUG at fs/bcachefs/buckets.c:1258!
[ 97.517332] invalid opcode: 0000 [#1] SMP PTI
[ 97.518567] CPU: 6 PID: 350 Comm: exe Not tainted 5.0.0+bcachefs.git20190508.0b1aaaf3-1-generic #1-Ubuntu
[ 97.519815] Hardware name: System76 Oryx Pro/ Oryx Pro, BIOS 1.05.02dRSA2 02/20/2017
[ 97.521112] RIP: 0010:bch2_trans_mark_extent+0x3b4/0x440
[ 97.522370] Code: 8b 85 10 ff ff ff 48 8b 50 58 48 8b bd 10 ff ff ff 4a 8d 44 02 0b 4c 8d bf 60 01 00 00 48 89 47 58 4c 39 f8 0f 86 89 fe ff ff <0f> 0b 0f 0b 0f 0b 48 8b bd 4a ff ff ff 48 89 7a 08 89 c7 4c 8b 4c
[ 97.523676] RSP: 0018:ffffa86503896798 EFLAGS: 00010202
[ 97.525031] RAX: ffffa86503896e78 RBX: 0000000000000000 RCX: ffffa865038967da
[ 97.526562] RDX: ffffa86503896e6c RSI: ffffa86503896e74 RDI: ffffa86503896d10
[ 97.527963] RBP: ffffa86503896890 R08: 0000000000000001 R09: 0000000000000005
[ 97.529346] R10: 0000000000000008 R11: 0000000000000020 R12: ffff9181d8d80b20
[ 97.530731] R13: fffffffffffffff8 R14: 0000000000000000 R15: ffffa86503896e70
[ 97.532122] FS: 00007efff85d85c0(0000) GS:ffff918262180000(0000) knlGS:0000000000000000
[ 97.533518] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 97.534915] CR2: 0000562ab212c000 CR3: 0000000893c56003 CR4: 00000000003606e0
[ 97.536334] Call Trace:
[ 97.537779] ? bch2_btree_iter_relock.part.26+0x273/0x3c0
[ 97.539213] bch2_trans_mark_key+0x97/0xb0
[ 97.540654] bch2_trans_mark_update+0x1d2/0x370
[ 97.542074] do_btree_insert_at+0x3ed/0xd40
[ 97.543515] ? bch2_extent_trim_atomic+0x1e3/0x2d0
[ 97.544954] bch2_trans_commit+0x27c/0x960
[ 97.546389] bch2_btree_delete_range+0x1e9/0x296
[ 97.547830] ? bch2_btree_delete_range+0x4d/0x296
[ 97.549267] ? chacha_stream_xor+0x176/0x1f0
[ 97.550689] ? __update_load_avg_cfs_rq+0x1b5/0x230
[ 97.552119] ? update_load_avg+0x8b/0x590
[ 97.553551] ? update_curr+0xf2/0x1e0
[ 97.554968] ? sched_clock+0x9/0x10
[ 97.556399] ? check_preempt_wakeup+0x1a0/0x250
[ 97.557830] bch2_inode_rm+0xb8/0x3b0
[ 97.559246] ? bch2_replicas_entry_idx+0x8a/0xb0
[ 97.560663] ? bch2_mark_extent+0x924/0x9e0
[ 97.562078] ? syscall_return_via_sysret+0x10/0x7f
[ 97.563501] ? __switch_to_asm+0x34/0x70
[ 97.564906] ? bch2_btree_node_iter_sort+0x177/0x200
[ 97.566314] ? __switch_to_asm+0x34/0x70
[ 97.567698] ? __switch_to_asm+0x40/0x70
[ 97.569084] ? __bch2_bkey_cmp_left_packed+0x40/0xb0
[ 97.570473] ? bch2_btree_node_iter_push+0x70/0x80
[ 97.571863] ? __bch2_btree_node_iter_fix.isra.23+0x16f/0x670
[ 97.573249] ? bch2_bset_fix_lookup_table+0x104/0x3c0
[ 97.574636] ? bch2_btree_node_iter_init+0x22f/0x5c0
[ 97.576068] ? bch2_btree_node_iter_sort+0x177/0x200
[ 97.577464] ? put_dec+0x18/0xa0
[ 97.578843] ? number+0x31f/0x360
[ 97.580229] ? sched_clock+0x9/0x10
[ 97.581651] ? sched_clock_cpu+0x11/0xc0
[ 97.583045] ? log_store+0x1ff/0x280
[ 97.584446] ? up+0x32/0x50
[ 97.585827] ? down_trylock+0x2e/0x40
[ 97.587229] ? vprintk_emit+0x211/0x270
[ 97.588641] ? vprintk_default+0x29/0x50
[ 97.590014] ? vprintk_func+0x47/0xbc
[ 97.591373] ? printk+0x58/0x6f
[ 97.592760] check_inode+0x2cc/0x660
[ 97.594120] ? __bch2_bkey_cmp_left_packed+0x40/0xb0
[ 97.595476] ? __bch2_btree_iter_traverse+0x16b/0x6a0
[ 97.596869] ? apic_timer_interrupt+0xa/0x20
[ 97.598264] ? apic_timer_interrupt+0xa/0x20
[ 97.599614] ? bch2_btree_iter_traverse+0x12/0x30
[ 97.601014] ? bch2_btree_iter_peek+0x111/0x270
[ 97.602350] bch2_fsck_walk_inodes_only+0x17e/0x198
[ 97.603693] ? bch2_fsck_walk_inodes_only+0x33/0x198
[ 97.605068] ? check_preempt_curr+0x68/0x90
[ 97.606399] ? attach_task+0x47/0x50
[ 97.607718] ? __switch_to_asm+0x40/0x70
[ 97.609074] ? __switch_to_asm+0x34/0x70
[ 97.610393] ? __switch_to_asm+0x40/0x70
[ 97.611701] ? __switch_to_asm+0x40/0x70
[ 97.613040] ? __switch_to_asm+0x34/0x70
[ 97.614334] ? __switch_to_asm+0x40/0x70
[ 97.615626] ? __switch_to_asm+0x34/0x70
[ 97.616927] ? __switch_to_asm+0x40/0x70
[ 97.618190] ? __switch_to_asm+0x34/0x70
[ 97.619452] ? sched_clock+0x9/0x10
[ 97.620740] ? sched_clock_cpu+0x11/0xc0
[ 97.621989] ? log_store+0x1ff/0x280
[ 97.623231] ? up+0x32/0x50
[ 97.624507] ? down_trylock+0x2e/0x40
[ 97.625754] ? vprintk_emit+0x211/0x270
[ 97.626984] ? vprintk_default+0x29/0x50
[ 97.628213] ? vprintk_func+0x47/0xbc
[ 97.629472] ? printk+0x58/0x6f
[ 97.630695] bch2_fs_recovery+0xdcc/0xffb
[ 97.631914] bch2_fs_start+0x185/0x490
[ 97.633175] ? bch2_fs_start+0x185/0x490
[ 97.634377] bch2_fs_open+0x22c/0x290
[ 97.635593] bch2_mount+0x28c/0x680
[ 97.636835] mount_fs+0x51/0x165
[ 97.638045] vfs_kern_mount.part.38+0x5d/0x110
[ 97.639254] do_mount+0x22f/0xd50
[ 97.640482] ksys_mount+0xb6/0xd0
[ 97.641685] __x64_sys_mount+0x25/0x30
[ 97.642872] do_syscall_64+0x5a/0x110
[ 97.644062] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 97.645285] RIP: 0033:0x7efff850a63a
[ 97.646468] Code: 48 8b 0d 59 58 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 26 58 0c 00 f7 d8 64 89 01 48
[ 97.647700] RSP: 002b:00007ffcb5fdc498 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
[ 97.648962] RAX: ffffffffffffffda RBX: 0000000000008000 RCX: 00007efff850a63a
[ 97.650201] RDX: 00007ffcb5fddda3 RSI: 00007ffcb5fddde3 RDI: 00007ffcb5fdddc2
[ 97.651430] RBP: 00007efff85d8540 R08: 0000562ab212b2a0 R09: 0073726f7272655f
[ 97.652712] R10: 0000000000008000 R11: 0000000000000206 R12: 0000562ab212b2a0
[ 97.653942] R13: 0000000000000000 R14: 00007ffcb5fdc708 R15: 0000000000000000
[ 97.655192] Modules linked in: uas usb_storage rtsx_pci_sdmmc psmouse nvme r8169 i2c_i801 nvme_core ahci rtsx_pci realtek libahci wmi video
[ 97.656498] ---[ end trace 5b2b52db0f6b29b3 ]---
[ 97.657768] RIP: 0010:bch2_trans_mark_extent+0x3b4/0x440
[ 97.659003] Code: 8b 85 10 ff ff ff 48 8b 50 58 48 8b bd 10 ff ff ff 4a 8d 44 02 0b 4c 8d bf 60 01 00 00 48 89 47 58 4c 39 f8 0f 86 89 fe ff ff <0f> 0b 0f 0b 0f 0b 48 8b bd 4a ff ff ff 48 89 7a 08 89 c7 4c 8b 4c
[ 97.660271] RSP: 0018:ffffa86503896798 EFLAGS: 00010202
[ 97.661545] RAX: ffffa86503896e78 RBX: 0000000000000000 RCX: ffffa865038967da
[ 97.662785] RDX: ffffa86503896e6c RSI: ffffa86503896e74 RDI: ffffa86503896d10
[ 97.664016] RBP: ffffa86503896890 R08: 0000000000000001 R09: 0000000000000005
[ 97.665293] R10: 0000000000000008 R11: 0000000000000020 R12: ffff9181d8d80b20
[ 97.666522] R13: fffffffffffffff8 R14: 0000000000000000 R15: ffffa86503896e70
[ 97.667744] FS: 00007efff85d85c0(0000) GS:ffff918262180000(0000) knlGS:0000000000000000
[ 97.669016] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 97.670249] CR2: 0000562ab212c000 CR3: 0000000893c56003 CR4: 00000000003606e0
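To compare the two backtraces in this thread, it helps to drop the frames prefixed with "?" (the unwinder marks those as unreliable) and look at what remains. The sketch below does that for abridged excerpts of both traces and reports the innermost frames they share. The function names and the regular expression are illustrative assumptions, not an existing tool.

```python
# Minimal sketch: extract the reliable (non-"?") frames from two oops call
# traces and find the innermost frames they have in common.
# Both excerpts are abridged copies of the logs in this thread.
import re

# Matches e.g. "[ 97.559246] ? bch2_inode_rm+0xb8/0x3b0"; group 1 is the "?" marker.
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+\s*$"
)

UNLINK_TRACE = """\
[ 110.713065] Call Trace:
[ 110.713069] bch2_trans_mark_key+0x97/0xb0
[ 110.713071] bch2_trans_mark_update+0x1d2/0x370
[ 110.713073] do_btree_insert_at+0x3ed/0xd30
[ 110.713075] ? bch2_extent_trim_atomic+0x1e3/0x2d0
[ 110.713077] bch2_trans_commit+0x28d/0x8f0
[ 110.713079] bch2_btree_delete_range+0x1e9/0x296
[ 110.713094] bch2_inode_rm+0xb8/0x3b0
[ 110.713108] bch2_evict_inode+0xcf/0xf0
[ 110.713121] RIP: 0033:0x7fbb9ce1aac7
"""

MOUNT_TRACE = """\
[ 97.536334] Call Trace:
[ 97.537779] ? bch2_btree_iter_relock.part.26+0x273/0x3c0
[ 97.539213] bch2_trans_mark_key+0x97/0xb0
[ 97.540654] bch2_trans_mark_update+0x1d2/0x370
[ 97.542074] do_btree_insert_at+0x3ed/0xd40
[ 97.544954] bch2_trans_commit+0x27c/0x960
[ 97.546389] bch2_btree_delete_range+0x1e9/0x296
[ 97.557830] bch2_inode_rm+0xb8/0x3b0
[ 97.592760] check_inode+0x2cc/0x660
[ 97.645285] RIP: 0033:0x7efff850a63a
"""


def reliable_frames(log_text):
    """Return function names of non-'?' frames, innermost first."""
    frames = []
    in_trace = False
    for line in log_text.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        m = FRAME_RE.match(line.strip())
        if m is None:
            break  # first non-frame line ends the call trace
        if m.group(1) is None:
            frames.append(m.group(2))
    return frames


def common_prefix(a, b):
    """Leading elements shared by both lists."""
    shared = []
    for x, y in zip(a, b):
        if x != y:
            break
        shared.append(x)
    return shared


if __name__ == "__main__":
    shared = common_prefix(reliable_frames(UNLINK_TRACE), reliable_frames(MOUNT_TRACE))
    print(" <- ".join(shared))
```

On these excerpts the shared path runs from bch2_trans_mark_key up to bch2_inode_rm; the traces only diverge in how inode removal was reached (bch2_evict_inode on unlink versus check_inode during recovery at mount).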
I've got a broken filesystem that was happily ticking along at commit 1f342977 but that seems to have been broken by 990250b2.
Trying to mount the filesystem reproducibly dies with this backtrace (the first time I got this backtrace it was not a mount with -o fix_errors, but subsequent mount attempts have required -o fix_errors to get past the initial fsck errors).
The filesystem in question is encrypted (as can be seen from the backtrace) and tiered across an SSD (foreground_target and promote_target) and 2 HDDs (background_target), and uses lz4 compression.