koverstreet / bcachefs

kernel panic when deleting files and snapshots at the same time #367

Open b-r-o-w-n opened 2 years ago

b-r-o-w-n commented 2 years ago

I was testing and made some snapshots. Then I discovered that I did not do it correctly.... my fault.

I wanted to start over... so I deleted all the files in the subvolume. It was kind of slow.. so I opened another xterm and deleted all the stuff under the .snap directory.... you know.. to make it go faster!

It seemed to work for a while.....

[ 6894.930210] Kernel panic - not syncing: trans path oveflow
[ 6894.935770] CPU: 6 PID: 3298 Comm: rm Not tainted 5.15.0-bcachefs+ #4
[ 6894.942268] Hardware name: System manufacturer System Product Name/P9X79 LE, BIOS 4801 07/24/2014
[ 6894.951216] Call Trace:
[ 6894.953706] dump_stack_lvl+0x34/0x44
[ 6894.957447] panic+0xe8/0x2ae
[ 6894.960463] ? bch2_dump_trans_paths_updates+0x1f9/0x21c
[ 6894.965856] btree_path_alloc.cold+0x11/0x11
[ 6894.970203] btree_path_clone+0x1c/0x110
[ 6894.974197] bch2_btree_path_set_pos+0x2f5/0x6a0
[ 6894.978888] bch2_btree_iter_peek+0x49e/0xa70
[ 6894.983317] bch2_inode_delete_keys.isra.0+0x126/0x280
[ 6894.988518] bch2_inode_rm+0xee/0x2a0
[ 6894.992268] ? call_rcu+0x90/0x310
[ 6894.995733] ? free_unref_page_list+0x161/0x1e0
[ 6895.000327] ? xas_load+0x5/0x50
[ 6895.003618] ? xas_find+0x14b/0x180
[ 6895.007174] ? xas_load+0x5/0x50
[ 6895.010466] ? xas_find+0x14b/0x180
[ 6895.014028] ? find_get_entries+0xfe/0x150
[ 6895.018193] evict+0xba/0x160
[ 6895.021222] do_unlinkat+0x1b7/0x2b0
[ 6895.024880] __x64_sys_unlinkat+0x2e/0x50
[ 6895.028954] do_syscall_64+0x3b/0x90
[ 6895.032611] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 6895.037744] RIP: 0033:0x7f6b52192837
[ 6895.041391] Code: 73 01 c3 48 8b 0d f9 f5 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 07 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c9 f5 0e 00 f7 d8 64 89 01 48
[ 6895.060288] RSP: 002b:00007fffd8858d08 EFLAGS: 00000246 ORIG_RAX: 0000000000000107
[ 6895.067929] RAX: ffffffffffffffda RBX: 0000562c79134f70 RCX: 00007f6b52192837
[ 6895.075144] RDX: 0000000000000000 RSI: 0000562c79135070 RDI: 0000000000000003
[ 6895.082362] RBP: 0000562c791334e0 R08: 0000000000000003 R09: 0000000000000000
[ 6895.089562] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 6895.096766] R13: 00007fffd8858ef0 R14: 0000000000000000 R15: 0000562c79134f70
[ 6895.104002] Kernel Offset: 0x4400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 6895.114839] ---[ end Kernel panic - not syncing: trans path oveflow ]---

I was using the rm -rf command.

bcachefs tool version: v0.1-470-gfd1b849
git rev-parse --short HEAD: ffad51ba45f8

b-r-o-w-n commented 2 years ago

I believe this is a duplicate of bug #365.

After resetting, running fsck, and mounting, I discovered that there were undeleted files. So I deleted them....

[ 985.027697] Kernel panic - not syncing: trans path oveflow
[ 985.033255] CPU: 10 PID: 2961 Comm: rm Not tainted 5.15.0-bcachefs+ #4
[ 985.039829] Hardware name: System manufacturer System Product Name/P9X79 LE, BIOS 4801 07/24/2014
[ 985.048743] Call Trace:
[ 985.051248] dump_stack_lvl+0x34/0x44
[ 985.054990] panic+0xe8/0x2ae
[ 985.058015] ? bch2_dump_trans_paths_updates+0x1f9/0x21c
[ 985.063405] btree_path_alloc.cold+0x11/0x11
[ 985.067746] btree_path_clone+0x1c/0x110
[ 985.071731] bch2_btree_path_set_pos+0x2f5/0x6a0
[ 985.076415] bch2_btree_iter_peek+0x49e/0xa70
[ 985.080834] bch2_btree_iter_peek_slot+0x361/0x5f0
[ 985.085715] bch2_unlink_trans+0x3c2/0x610
[ 985.089893] ? __bch2_unlink+0x146/0x270
[ 985.093884] __bch2_unlink+0x146/0x270
[ 985.097703] vfs_rmdir+0x77/0x190
[ 985.101084] do_rmdir+0x137/0x1a0
[ 985.104473] __x64_sys_unlinkat+0x41/0x50
[ 985.108548] do_syscall_64+0x3b/0x90
[ 985.112188] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 985.117330] RIP: 0033:0x7fae45877837
[ 985.120968] Code: 73 01 c3 48 8b 0d f9 f5 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 07 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c9 f5 0e 00 f7 d8 64 89 01 48
[ 985.139891] RSP: 002b:00007ffcd040acf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000107
[ 985.147540] RAX: ffffffffffffffda RBX: 000056164db59790 RCX: 00007fae45877837
[ 985.154755] RDX: 0000000000000200 RSI: 000056164db58570 RDI: 00000000ffffff9c
[ 985.161965] RBP: 000056164db584e0 R08: 0000000000000003 R09: 0000000000000000
[ 985.169183] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
[ 985.176395] R13: 00007ffcd040aee0 R14: 0000000000000001 R15: 000056164db59790
[ 985.183676] Kernel Offset: 0xa00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 985.194380] ---[ end Kernel panic - not syncing: trans path oveflow ]---

So fsck is not fixing the corruption, and trying to delete the remaining files panics again.....

koverstreet commented 2 years ago

This should be fixed - can you confirm?

b-r-o-w-n commented 2 years ago

Hi, no, it's not... or it has changed. I tried to respond to issue #365 but I guess it never got to you..

[ 3292.032563] bcachefs: bch2_fs_open() bch_fs_open err opening /dev/sdb: device not a member of filesystem
[ 3303.947270] bcachefs (5ff824f8-d7df-40a6-b13c-ea339dd946c0): recovering from clean shutdown, journal seq 2106
[ 3304.213830] bcachefs (5ff824f8-d7df-40a6-b13c-ea339dd946c0): going read-write
[ 3304.331684] bcachefs (5ff824f8-d7df-40a6-b13c-ea339dd946c0): mounted with opts: metadata_replicas=2,data_replicas=2,compression=lz4,noinodes_use_key_cache
[ 3315.913986] bcachefs (sdd): recovering from clean shutdown, journal seq 48752
[ 3317.195145] bcachefs (sdd): going read-write
[ 3317.240359] bcachefs (sdd): mounted with opts: metadata_checksum=xxhash,data_checksum=xxhash,compression=zstd,noinodes_use_key_cache
[ 3839.730233] ------------[ cut here ]------------
[ 3839.730240] kernel BUG at fs/bcachefs/btree_iter.c:2397!
[ 3839.730252] invalid opcode: 0000 [#1] SMP PTI
[ 3839.734752] CPU: 5 PID: 3063 Comm: rm Not tainted 5.15.0-g1623e9cede9e+ #17
[ 3839.741894] Hardware name: System manufacturer System Product Name/P9X79 LE, BIOS 4801 07/24/2014
[ 3839.751200] RIP: 0010:bch2_btree_iter_peek+0x9ed/0xa70
[ 3839.756644] Code: 03 41 80 69 02 01 75 18 4c 89 ce 89 45 a0 e8 0a 9f ff ff 41 0f b7 77 22 8b 45 a0 e9 df fc ff ff 41 0f b7 77 22 e9 d5 fc ff ff <0f> 0b 48 8b 38 89 f0 66 d1 e8 83 e0 01 41 28 41 03 41 80 69 02 01
[ 3839.775702] RSP: 0018:ffffa1a3817b77b0 EFLAGS: 00010246
[ 3839.781027] RAX: ffff8b8bc0cd4300 RBX: ffffa1a3817b78b0 RCX: 0000000000000000
[ 3839.788345] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000002
[ 3839.795625] RBP: ffffa1a3817b7830 R08: 0000000000000001 R09: 000000002800001d
[ 3839.802923] R10: ffff8b8987327400 R11: 0000000000000001 R12: 000000000000203a
[ 3839.810236] R13: 728db65393926211 R14: ffff8b8bec2400b0 R15: ffffa1a3817b7860
[ 3839.817524] FS: 00007f8d00ff7740(0000) GS:ffff8b947fb40000(0000) knlGS:0000000000000000
[ 3839.825815] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3839.831743] CR2: 000055727d170098 CR3: 00000002b422c002 CR4: 00000000001706e0
[ 3839.839008] Call Trace:
[ 3839.841587] bch2_btree_iter_peek_slot+0x361/0x5f0
[ 3839.846530] bch2_unlink_trans+0x3c2/0x610
[ 3839.850728] ? __bch2_unlink+0x146/0x270
[ 3839.854786] __bch2_unlink+0x146/0x270
[ 3839.858623] vfs_unlink+0x10e/0x220
[ 3839.862274] do_unlinkat+0x18f/0x2b0
[ 3839.865976] __x64_sys_unlinkat+0x2e/0x50
[ 3839.870059] do_syscall_64+0x3b/0x90
[ 3839.873749] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 3839.878967] RIP: 0033:0x7f8d010f4837
[ 3839.882648] Code: 73 01 c3 48 8b 0d f9 f5 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 07 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c9 f5 0e 00 f7 d8 64 89 01 48
[ 3839.901722] RSP: 002b:00007ffe2f400308 EFLAGS: 00000246 ORIG_RAX: 0000000000000107
[ 3839.909473] RAX: ffffffffffffffda RBX: 000055630bfd5450 RCX: 00007f8d010f4837
[ 3839.916836] RDX: 0000000000000000 RSI: 000055630bfd5550 RDI: 0000000000000006
[ 3839.924134] RBP: 000055630bf774e0 R08: 0000000000000003 R09: 0000000000000000
[ 3839.931429] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 3839.938723] R13: 00007ffe2f4004f0 R14: 0000000000000000 R15: 000055630bfd5450
[ 3839.945945] Modules linked in: x86_pkg_temp_thermal
[ 3839.950951] ---[ end trace 6d342d37fa137b43 ]---
[ 3839.955663] RIP: 0010:bch2_btree_iter_peek+0x9ed/0xa70
[ 3839.960890] Code: 03 41 80 69 02 01 75 18 4c 89 ce 89 45 a0 e8 0a 9f ff ff 41 0f b7 77 22 8b 45 a0 e9 df fc ff ff 41 0f b7 77 22 e9 d5 fc ff ff <0f> 0b 48 8b 38 89 f0 66 d1 e8 83 e0 01 41 28 41 03 41 80 69 02 01
[ 3839.979839] RSP: 0018:ffffa1a3817b77b0 EFLAGS: 00010246
[ 3839.985162] RAX: ffff8b8bc0cd4300 RBX: ffffa1a3817b78b0 RCX: 0000000000000000
[ 3839.992375] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000002
[ 3839.999580] RBP: ffffa1a3817b7830 R08: 0000000000000001 R09: 000000002800001d
[ 3840.006807] R10: ffff8b8987327400 R11: 0000000000000001 R12: 000000000000203a
[ 3840.014048] R13: 728db65393926211 R14: ffff8b8bec2400b0 R15: ffffa1a3817b7860
[ 3840.021273] FS: 00007f8d00ff7740(0000) GS:ffff8b947fb40000(0000) knlGS:0000000000000000
[ 3840.029485] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3840.035292] CR2: 000055727d170098 CR3: 00000002b422c002 CR4: 00000000001706e0

It did not panic (i.e. it did not need the reset button), but the fs is kind of dead/comatose. Load average is 11.50.

kernel: 5.15.0-g1623e9cede9e+
git rev-parse HEAD: 1623e9cede9e1cfbba82ae323a335c75dffd0a9b

bcachefs tool version v0.1-478-gbf7924f+

Test: I created 6 snapshots of the Linux kernel source, from 5.15.1 to 5.15.6. I then ran rm -rf * in 5 xterms in the .snap directory and in the source dir (where the snapshots were taken from), all at the same time.
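For anyone trying to reproduce this, the concurrent-delete part of the test can be sketched roughly as below. This is not the exact procedure used above: it runs on ordinary temporary directories, it does not create bcachefs snapshots (that step is left out rather than guessed at), and it uses threads in one process instead of separate rm -rf processes in xterms. The names populate, remove_concurrently and demo are made up for this sketch. On a real test filesystem you would presumably pass the source dir and the snapshot directories to remove_concurrently instead.

```python
import shutil
import tempfile
import threading
from pathlib import Path
from typing import List


def populate(root: Path, n_dirs: int = 4, n_files: int = 25) -> None:
    """Create a small file tree under root (a stand-in for a kernel source tree)."""
    for d in range(n_dirs):
        sub = root / f"dir{d}"
        sub.mkdir(parents=True, exist_ok=True)
        for f in range(n_files):
            (sub / f"file{f}.txt").write_text("x" * 128)


def remove_concurrently(paths: List[Path]) -> None:
    """Delete every tree in paths at the same time, one thread per tree."""
    if not paths:
        return
    # The barrier holds all workers until every thread is ready,
    # so the deletions start together rather than one after another.
    barrier = threading.Barrier(len(paths))

    def worker(path: Path) -> None:
        barrier.wait()
        shutil.rmtree(path, ignore_errors=True)

    threads = [threading.Thread(target=worker, args=(p,)) for p in paths]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def demo() -> List[Path]:
    """Build one 'source' tree and five 'snapshot' trees, delete all at once.

    Returns the list of trees that still exist afterwards.
    """
    base = Path(tempfile.mkdtemp(prefix="rm-race-"))
    trees = [base / "source"] + [base / "snap" / f"snap{i}" for i in range(5)]
    for tree in trees:
        populate(tree)
    remove_concurrently(trees)
    leftover = [t for t in trees if t.exists()]
    shutil.rmtree(base, ignore_errors=True)  # clean up the scratch directory
    return leftover


if __name__ == "__main__":
    print("trees left undeleted:", demo())
```

On a healthy filesystem demo() should report nothing left over; the point of the sketch is only to issue many unlink calls against related trees at once, which is the load pattern that triggered the traces in this issue.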

b-r-o-w-n commented 2 years ago

I picked up some additional changes

kernel: 5.15.0-g01b9c0b2147b+
bcachefs tool version: v0.1-478-gbf7924f+

I reproduced it again. It seems the dmesg log has a bit of additional logging.

[ 792.895020] ------------[ cut here ]------------
[ 792.895027] kernel BUG at fs/bcachefs/btree_iter.c:2392!
[ 792.895037] invalid opcode: 0000 [#1] SMP PTI
[ 792.899463] CPU: 7 PID: 6628 Comm: rm Not tainted 5.15.0-g01b9c0b2147b+ #18
[ 792.906516] Hardware name: System manufacturer System Product Name/P9X79 LE, BIOS 4801 07/24/2014
[ 792.915497] RIP: 0010:bch2_btree_iter_peek+0x9ed/0xa70
[ 792.920701] Code: 03 41 80 69 02 01 75 18 4c 89 ce 89 45 a0 e8 4a 9f ff ff 41 0f b7 77 22 8b 45 a0 e9 df fc ff ff 41 0f b7 77 22 e9 d5 fc ff ff <0f> 0b 48 8b 38 89 f0 66 d1 e8 83 e0 01 41 28 41 03 41 80 69 02 01
[ 792.939647] RSP: 0018:ffffb73dc3caba48 EFLAGS: 00010246
[ 792.944963] RAX: ffff9ebcfb034000 RBX: ffffb73dc3cabb78 RCX: 0000000000000000
[ 792.952196] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000002
[ 792.959412] RBP: ffffb73dc3cabac8 R08: 0000000000000001 R09: 00000000fffffffa
[ 792.966641] R10: ffff9ebc4933b400 R11: 0000000000000001 R12: 000000005000095b
[ 792.973862] R13: 0000000000000002 R14: ffff9ebec5d400b0 R15: ffffb73dc3cabb28
[ 792.981081] FS: 00007f5b0ee99740(0000) GS:ffff9ec73fbc0000(0000) knlGS:0000000000000000
[ 792.989241] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 792.995073] CR2: 00007f809ad053f0 CR3: 0000000291b32002 CR4: 00000000001706e0
[ 793.002272] Call Trace:
[ 793.004779] bch2_inode_delete_keys.isra.0+0x126/0x280
[ 793.009991] bch2_inode_rm+0x6e/0x2a0
[ 793.013728] evict+0xba/0x160
[ 793.016773] do_unlinkat+0x1b7/0x2b0
[ 793.020424] __x64_sys_unlinkat+0x2e/0x50
[ 793.024530] do_syscall_64+0x3b/0x90
[ 793.028205] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 793.033345] RIP: 0033:0x7f5b0ef96837
[ 793.036975] Code: 73 01 c3 48 8b 0d f9 f5 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 07 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c9 f5 0e 00 f7 d8 64 89 01 48
[ 793.055857] RSP: 002b:00007ffc115b9e78 EFLAGS: 00000246 ORIG_RAX: 0000000000000107
[ 793.063507] RAX: ffffffffffffffda RBX: 0000556d0b43fc30 RCX: 00007f5b0ef96837
[ 793.070729] RDX: 0000000000000000 RSI: 0000556d0b43fd30 RDI: 0000000000000004
[ 793.077939] RBP: 0000556d0b4204e0 R08: 0000000000000003 R09: 0000000000000000
[ 793.085139] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 793.092377] R13: 00007ffc115ba060 R14: 0000000000000000 R15: 0000556d0b43fc30
[ 793.099617] Modules linked in: x86_pkg_temp_thermal
[ 793.104611] ---[ end trace b50a0e2c1df97dcd ]---
[ 793.109313] RIP: 0010:bch2_btree_iter_peek+0x9ed/0xa70
[ 793.114550] Code: 03 41 80 69 02 01 75 18 4c 89 ce 89 45 a0 e8 4a 9f ff ff 41 0f b7 77 22 8b 45 a0 e9 df fc ff ff 41 0f b7 77 22 e9 d5 fc ff ff <0f> 0b 48 8b 38 89 f0 66 d1 e8 83 e0 01 41 28 41 03 41 80 69 02 01
[ 793.133465] RSP: 0018:ffffb73dc3caba48 EFLAGS: 00010246
[ 793.138774] RAX: ffff9ebcfb034000 RBX: ffffb73dc3cabb78 RCX: 0000000000000000
[ 793.146003] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000002
[ 793.153222] RBP: ffffb73dc3cabac8 R08: 0000000000000001 R09: 00000000fffffffa
[ 793.160446] R10: ffff9ebc4933b400 R11: 0000000000000001 R12: 000000005000095b
[ 793.167678] R13: 0000000000000002 R14: ffff9ebec5d400b0 R15: ffffb73dc3cabb28
[ 793.174923] FS: 00007f5b0ee99740(0000) GS:ffff9ec73fbc0000(0000) knlGS:0000000000000000
[ 793.183110] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 793.188934] CR2: 00007f809ad053f0 CR3: 0000000291b32002 CR4: 00000000001706e0