koverstreet / bcachefs

Segmentation fault on fsck #260

(Open) optlink opened this issue 3 years ago

optlink commented 3 years ago

Since I started running the 5.12 rebase (up to and including 4cc1334efa08f1d4f6098c4704ee2ead044b8817), there have been multiple full system hangs requiring reboots. I assume these damaged the filesystem, since it can no longer be mounted without an fsck. Running

mount -t bcachefs -o fix_errors,fsck /dev/nvme0n1p3:/dev/sda2:/dev/sdb1:/dev/sdc1:/dev/sdd1:/dev/sde1 /newroot

results in this:

[  130.675121] bcachefs (1301beb7-6380-4a5a-b16b-1784c4da140d): journal read done, 5454334 keys in 2252 entries, seq 1623737
[  136.514830] bcachefs (1301beb7-6380-4a5a-b16b-1784c4da140d): error validating btree node on sda2 at btree alloc level 0/1
[  136.514830]   u64s 12 type btree_ptr_v2 2:912531:0 len 0 ver 0: seq c1ae1547e0bd94f9 written 0 min_key 2:906082:1 ptr: 5:1585034240 gen 1 ptr: 0:367327744 gen 1
[  136.514830]   node offset 392: found bset signature after last bset
[  136.514833] bcachefs (1301beb7-6380-4a5a-b16b-1784c4da140d): retrying read
[  137.115457] bcachefs (1301beb7-6380-4a5a-b16b-1784c4da140d): starting mark and sweep
[  137.191070] bcachefs (1301beb7-6380-4a5a-b16b-1784c4da140d): bucket 1:8381 data type user stale dirty ptr: 16 < 17
[  137.191070] while marking u64s 16 type stripe 0:2:0 len 0 ver 0: algo 0 sectors 512 blocks 2:1 csum 5 gran 128 1:4291072:0 4:12030976:8 2:121867264:0, fixing
[  137.191079] ------------[ cut here ]------------
[  137.191080] kernel BUG at fs/bcachefs/extents.h:553!
[  137.191085] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[  137.191086] CPU: 4 PID: 312 Comm: mount Not tainted 5.12.11-gentoo #1
[  137.191088] Hardware name: System manufacturer System Product Name/ROG CROSSHAIR VII HERO, BIOS 3103 06/17/2020
[  137.191089] RIP: 0010:bch2_check_fix_ptrs+0xb63/0x1040
[  137.191093] Code: 41 0f b6 45 2b 4d 8d 45 30 49 8d 74 c7 08 e9 13 fc ff ff 45 31 c0 31 f6 e9 09 fc ff ff 4c 89 ef e8 02 76 d1 ff e9 c1 fa ff ff <0f> 0b 41 0f b6 45 2b 49 8d 4d 30 49 8d 74 c7 08 e9 ee f7 ff ff 41
[  137.191094] RSP: 0018:ffff9ced83f3b520 EFLAGS: 00010287
[  137.191096] RAX: 000000000000000e RBX: 00000fffffffffff RCX: 000000000000000e
[  137.191097] RDX: ffff9cedab0c37c0 RSI: ffff8e13840727c8 RDI: ffff8e138a6000c0
[  137.191098] RBP: ffff9ced83f3b6f0 R08: ffff8e13840727b0 R09: ffff8e138a6000b8
[  137.191098] R10: ffff9cedaf2258e0 R11: ffff9ced83f3b788 R12: ffff9ced83f3b788
[  137.191099] R13: ffff8e1384072780 R14: 0000000000000006 R15: ffff8e13840727a8
[  137.191100] FS:  00007f07b3ef3818(0000) GS:ffff8e1a7ef00000(0000) knlGS:0000000000000000
[  137.191101] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  137.191102] CR2: 0000555556d2a058 CR3: 00000001096ae000 CR4: 00000000003506e0
[  137.191103] Call Trace:
[  137.191105]  ? newidle_balance.constprop.0+0xba/0x380
[  137.191109]  ? dequeue_entity+0xc3/0x340
[  137.191110]  ? bch2_btree_node_iter_peek_unpack+0x212/0x2e0
[  137.191112]  ? bch2_gc_mark_key+0x27/0x2b0
[  137.191113]  bch2_gc_mark_key+0x27/0x2b0
[  137.191114]  bch2_gc_btree_init_recurse+0x175/0x5d0
[  137.191116]  ? sysvec_apic_timer_interrupt+0xb/0x90
[  137.191119]  ? bch2_mark_dev_superblock+0x21b/0x380
[  137.191120]  ? rcuwait_wake_up+0x29/0x30
[  137.191122]  ? bch2_mark_dev_superblock+0x35b/0x380
[  137.191124]  bch2_gc+0xd32/0x1920
[  137.191126]  ? vprintk_default+0x69/0x1b0
[  137.191127]  ? printk+0x53/0x6a
[  137.191129]  bch2_fs_recovery.cold+0x483/0x718
[  137.191131]  ? __copy_super+0x1db/0x220
[  137.191133]  ? bch2_recalc_capacity+0x1b5/0x2c0
[  137.191135]  bch2_fs_start+0x206/0x550
[  137.191136]  bch2_fs_open+0x30a/0x430
[  137.191138]  ? free_percpu+0x15e/0x2f0
[  137.191141]  bch2_mount+0x46b/0x5b0
[  137.191144]  legacy_get_tree+0x22/0x40
[  137.191146]  vfs_get_tree+0x20/0xb0
[  137.191148]  path_mount+0x558/0xdb0
[  137.191150]  __x64_sys_mount+0xfe/0x140
[  137.191152]  do_syscall_64+0x33/0x40
[  137.191153]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  137.191155] RIP: 0033:0x7f07b3ea3862
[  137.191156] Code: e8 41 fc ff ff 5a c3 48 63 ff 50 48 63 d2 b8 67 00 00 00 0f 05 48 89 c7 e8 29 fc ff ff 5a c3 49 89 ca 50 b8 a5 00 00 00 0f 05 <48> 89 c7 e8 14 fc ff ff 5a c3 48 63 f6 50 b8 a6 00 00 00 0f 05 48
[  137.191157] RSP: 002b:00007ffcf2b602b0 EFLAGS: 00000212 ORIG_RAX: 00000000000000a5
[  137.191158] RAX: ffffffffffffffda RBX: 0000000000008000 RCX: 00007f07b3ea3862
[  137.191159] RDX: 00007ffcf2b60dde RSI: 00007ffcf2b60e3b RDI: 00007ffcf2b60dfa
[  137.191160] RBP: 00007ffcf2b60dfa R08: 00005555559aa040 R09: 8080808080808080
[  137.191161] R10: 0000000000008000 R11: 0000000000000212 R12: 00000000ffffffff
[  137.191161] R13: 00007ffcf2b60e3b R14: 00007ffcf2b60dde R15: 00005555559aa040
[  137.191162] Modules linked in:
[  137.191164] ---[ end trace 8d21b5dd32f9ebe2 ]---
[  137.191165] RIP: 0010:bch2_check_fix_ptrs+0xb63/0x1040
[  137.191167] Code: 41 0f b6 45 2b 4d 8d 45 30 49 8d 74 c7 08 e9 13 fc ff ff 45 31 c0 31 f6 e9 09 fc ff ff 4c 89 ef e8 02 76 d1 ff e9 c1 fa ff ff <0f> 0b 41 0f b6 45 2b 49 8d 4d 30 49 8d 74 c7 08 e9 ee f7 ff ff 41
[  137.191167] RSP: 0018:ffff9ced83f3b520 EFLAGS: 00010287
[  137.191168] RAX: 000000000000000e RBX: 00000fffffffffff RCX: 000000000000000e
[  137.191169] RDX: ffff9cedab0c37c0 RSI: ffff8e13840727c8 RDI: ffff8e138a6000c0
[  137.191170] RBP: ffff9ced83f3b6f0 R08: ffff8e13840727b0 R09: ffff8e138a6000b8
[  137.191170] R10: ffff9cedaf2258e0 R11: ffff9ced83f3b788 R12: ffff9ced83f3b788
[  137.191171] R13: ffff8e1384072780 R14: 0000000000000006 R15: ffff8e13840727a8
[  137.191172] FS:  00007f07b3ef3818(0000) GS:ffff8e1a7ef00000(0000) knlGS:0000000000000000
[  137.191172] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  137.191173] CR2: 0000555556d2a058 CR3: 00000001096ae000 CR4: 00000000003506e0

optlink commented 3 years ago

Result of running the latest bcachefs-tools fsck:

$ sudo ./bcachefs fsck /dev/sda2 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/nvme0n1p3
journal read done, 5454334 keys in 2252 entries, seq 1623737
starting mark and sweep
bucket 1:8381 data type user stale dirty ptr: 16 < 17
while marking u64s 16 type stripe 0:2:0 len 0 ver 0: algo 0 sectors 512 blocks 2:1 csum 5 gran 128 1:4291072:0 4:12030976:8 2:121867264:0: fix? (y,n) y
bcachefs: libbcachefs/extents.h:542: bch2_bkey_ptr_data_type: Assertion `!(ptr < s.v->ptrs || ptr >= s.v->ptrs + s.v->nr_blocks)' failed.
Aborted