koverstreet / bcachefs

"Failed to create a rescuer kthread" during mount leads to page fault #620

Closed by srjek 8 months ago

srjek commented 10 months ago

I'm occasionally getting mount failures on boot. My current understanding is that there are two issues at play:

  1. bch2_fs_alloc gets an error from alloc_workqueue, and the kernel then page faults during cleanup in either bch2_fs_open or bch2_fs_alloc.
  2. Something seems to be interrupting the mount. However, I suspect the issue above may be terminating the process before it can reveal more information, so I haven't dug into this one much.
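
The first item describes a common failure pattern: an initializer fails partway through, and a shared release path then tears down every component, including ones that were never set up. The sketch below is a userspace toy model of that pattern, not bcachefs code. The class, the method names, and the idea that the exit routine touched never-initialized state are all assumptions made for illustration; the thread does not say what the actual fix changed.

```python
import errno


class Subsystem:
    """Toy stand-in for one filesystem component (hypothetical names)."""

    def __init__(self, name):
        self.name = name
        self.resource = None  # None models zeroed, never-initialized state

    def init(self, fail=False):
        if fail:
            # Models an allocation interrupted by a signal (-EINTR).
            raise OSError(errno.EINTR, "Interrupted system call")
        self.resource = []

    def exit_unguarded(self):
        # Assumes init() ran; uses the resource unconditionally.
        self.resource.clear()

    def exit_guarded(self):
        # Safe to call on a component that was never initialized.
        if self.resource is not None:
            self.resource.clear()
            self.resource = None


def open_fs(guarded):
    """Initialize components in order; on failure, tear down all of them."""
    subsystems = [Subsystem("workqueue"), Subsystem("btree_iter")]
    try:
        subsystems[0].init(fail=True)  # models alloc_workqueue failing
        subsystems[1].init()           # never reached
    except OSError as err:
        for s in reversed(subsystems):
            if guarded:
                s.exit_guarded()
            else:
                s.exit_unguarded()
        return "mount failed cleanly: " + errno.errorcode[err.errno]
    return "mounted"
```

With guarded=True the failed open returns an error string. With guarded=False the teardown itself raises AttributeError on the never-initialized component, which is the userspace analogue of faulting inside the cleanup path instead of reporting the original error.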

The kernel was built from commit 626ad6715cec64089af2dee57b55f9e848910cae.

Dec 11 06:00:51 backup systemd[1]: Starting Mount /hdd...
Dec 11 06:00:51 backup kernel:  sdc: sdc1
Dec 11 06:00:51 backup kernel: workqueue: Failed to create a rescuer kthread for wq "bcachefs": -EINTR
Dec 11 06:00:51 backup kernel: BUG: unable to handle page fault for address: ffffffffffffffc8
Dec 11 06:00:51 backup kernel: #PF: supervisor read access in kernel mode
Dec 11 06:00:51 backup kernel: #PF: error_code(0x0000) - not-present page
Dec 11 06:00:51 backup kernel: PGD 36d025067 P4D 36d025067 PUD 36d027067 PMD 0
Dec 11 06:00:51 backup kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Dec 11 06:00:51 backup kernel: CPU: 3 PID: 366 Comm: bcachefs Not tainted 6.7.0-rc4-1-bcachefs-git-00109-g626ad6715c>
Dec 11 06:00:51 backup kernel: Hardware name: HARDKERNEL ODROID-H3/ODROID-H3, BIOS 5.19 02/27/2023
Dec 11 06:00:51 backup kernel: RIP: 0010:bch2_fs_btree_iter_exit+0xf7/0x110 [bcachefs]
Dec 11 06:00:51 backup kernel: Code: 24 20 35 00 00 e8 f9 9a 06 d0 49 8d bc 24 d8 34 00 00 5b 5d 41 5c e9 e8 9a 06 d>
Dec 11 06:00:51 backup kernel: RSP: 0018:ffffc90000eb3928 EFLAGS: 00010287
Dec 11 06:00:51 backup kernel: RAX: 0000000000000000 RBX: ffff888107d29a10 RCX: ffff888107d00030
Dec 11 06:00:51 backup kernel: RDX: ffff888107d034c8 RSI: ffffffff9030a9ad RDI: ffff888107d00000
Dec 11 06:00:51 backup kernel: RBP: ffff888107d00040 R08: ffff888100200088 R09: 0000000000000002
Dec 11 06:00:51 backup kernel: R10: ffff88846ffb4350 R11: 0000000000000001 R12: ffff888107d00000
Dec 11 06:00:51 backup kernel: R13: ffff888107d29a10 R14: ffffffffc03dd7e0 R15: ffff888107d00000
Dec 11 06:00:51 backup kernel: FS:  00007f43f9755c80(0000) GS:ffff88846ff80000(0000) knlGS:0000000000000000
Dec 11 06:00:51 backup kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 11 06:00:51 backup kernel: CR2: ffffffffffffffc8 CR3: 00000001042e4000 CR4: 0000000000350ef0
Dec 11 06:00:51 backup kernel: Call Trace:
Dec 11 06:00:51 backup kernel:  <TASK>
Dec 11 06:00:51 backup kernel:  ? __die+0x23/0x70
Dec 11 06:00:51 backup kernel:  ? page_fault_oops+0x171/0x4e0
Dec 11 06:00:51 backup kernel:  ? exc_page_fault+0x175/0x180
Dec 11 06:00:51 backup kernel:  ? asm_exc_page_fault+0x26/0x30
Dec 11 06:00:51 backup kernel:  ? mempool_exit+0x7d/0x90
Dec 11 06:00:51 backup kernel:  ? bch2_fs_btree_iter_exit+0xf7/0x110 [bcachefs 885e832b844ff88c63e56c7b8d8e99d9e736a>
Dec 11 06:00:51 backup kernel:  bch2_fs_release+0xb8/0x280 [bcachefs 885e832b844ff88c63e56c7b8d8e99d9e736a4e1]
Dec 11 06:00:51 backup kernel:  kobject_put+0x78/0x190
Dec 11 06:00:51 backup kernel:  bch2_fs_open+0x10aa/0x15b0 [bcachefs 885e832b844ff88c63e56c7b8d8e99d9e736a4e1]
Dec 11 06:00:51 backup kernel:  ? idr_alloc+0x3a/0x70
Dec 11 06:00:51 backup kernel:  ? __kmem_cache_alloc_node+0x1a0/0x2e0
Dec 11 06:00:51 backup kernel:  ? bch2_mount+0x4e3/0x720 [bcachefs 885e832b844ff88c63e56c7b8d8e99d9e736a4e1]
Dec 11 06:00:51 backup kernel:  bch2_mount+0x4e3/0x720 [bcachefs 885e832b844ff88c63e56c7b8d8e99d9e736a4e1]
Dec 11 06:00:51 backup kernel:  legacy_get_tree+0x28/0x50
Dec 11 06:00:51 backup kernel:  vfs_get_tree+0x26/0xf0
Dec 11 06:00:51 backup kernel:  path_mount+0x4a3/0xae0
Dec 11 06:00:51 backup kernel:  __x64_sys_mount+0x11a/0x150
Dec 11 06:00:51 backup kernel:  do_syscall_64+0x61/0xe0
Dec 11 06:00:51 backup kernel:  ? __count_memcg_events+0x42/0x90
Dec 11 06:00:51 backup kernel:  ? count_memcg_events.constprop.0+0x1a/0x30
Dec 11 06:00:51 backup kernel:  ? handle_mm_fault+0xa2/0x360
Dec 11 06:00:51 backup kernel:  ? do_user_addr_fault+0x30f/0x660
Dec 11 06:00:51 backup kernel:  ? exc_page_fault+0x7f/0x180
Dec 11 06:00:51 backup kernel:  entry_SYSCALL_64_after_hwframe+0x6e/0x76
Dec 11 06:00:51 backup kernel: RIP: 0033:0x7f43f9877a1e
Dec 11 06:00:51 backup kernel: Code: 48 8b 0d 15 63 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 0>
Dec 11 06:00:51 backup kernel: RSP: 002b:00007ffc07bcd218 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
Dec 11 06:00:51 backup kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f43f9877a1e
Dec 11 06:00:51 backup kernel: RDX: 000055dc2268eb60 RSI: 000055dc226ab780 RDI: 000055dc226acd00
Dec 11 06:00:51 backup kernel: RBP: 000055dc2268eb60 R08: 0000000000000000 R09: 0003010003000300
Dec 11 06:00:51 backup kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 000055dc22692100
Dec 11 06:00:51 backup kernel: R13: 000055dc226acd00 R14: 000055dc22690910 R15: 0000000000000003
Dec 11 06:00:51 backup kernel:  </TASK>
Dec 11 06:00:51 backup kernel: Modules linked in: libdes nft_fib_ipv4 algif_skcipher nft_fib cmac nft_chain_nat nf_n>
Dec 11 06:00:51 backup kernel:  sha256_ssse3 btmtk snd_pcm processor_thermal_rapl intel_rapl_msr sha1_ssse3 fat mei_>
Dec 11 06:00:51 backup kernel: CR2: ffffffffffffffc8
Dec 11 06:00:51 backup kernel: ---[ end trace 0000000000000000 ]---
Dec 11 06:00:51 backup kernel: RIP: 0010:bch2_fs_btree_iter_exit+0xf7/0x110 [bcachefs]
Dec 11 06:00:51 backup kernel: Code: 24 20 35 00 00 e8 f9 9a 06 d0 49 8d bc 24 d8 34 00 00 5b 5d 41 5c e9 e8 9a 06 d>
Dec 11 06:00:51 backup kernel: RSP: 0018:ffffc90000eb3928 EFLAGS: 00010287
Dec 11 06:00:51 backup kernel: RAX: 0000000000000000 RBX: ffff888107d29a10 RCX: ffff888107d00030
Dec 11 06:00:51 backup kernel: RDX: ffff888107d034c8 RSI: ffffffff9030a9ad RDI: ffff888107d00000
Dec 11 06:00:51 backup kernel: RBP: ffff888107d00040 R08: ffff888100200088 R09: 0000000000000002
Dec 11 06:00:51 backup kernel: R10: ffff88846ffb4350 R11: 0000000000000001 R12: ffff888107d00000
Dec 11 06:00:51 backup kernel: R13: ffff888107d29a10 R14: ffffffffc03dd7e0 R15: ffff888107d00000
Dec 11 06:00:51 backup kernel: FS:  00007f43f9755c80(0000) GS:ffff88846ff80000(0000) knlGS:0000000000000000
Dec 11 06:00:51 backup kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 11 06:00:51 backup kernel: CR2: ffffffffffffffc8 CR3: 00000001042e4000 CR4: 0000000000350ef0
Dec 11 06:00:51 backup kernel: note: bcachefs[366] exited with irqs disabled
Dec 11 06:00:51 backup systemd[1]: mount-hdd.service: Main process exited, code=killed, status=15/TERM
Dec 11 06:00:51 backup systemd[1]: mount-hdd.service: Failed with result 'signal'.
Dec 11 06:00:51 backup systemd[1]: Stopped Mount /hdd.
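
A note on reading the fault address in the log above: CR2 is ffffffffffffffc8, which is easier to interpret as a signed 64-bit value. The sketch below does that conversion; the helper name is made up for illustration. A small negative address like this often appears when code subtracts a field offset from a null pointer, but that is only one possible explanation and nothing in this thread confirms it.

```python
def as_signed64(addr):
    """Interpret a 64-bit address as a two's-complement signed integer."""
    return addr - (1 << 64) if addr & (1 << 63) else addr


fault_addr = 0xFFFFFFFFFFFFFFC8  # CR2 value from the oops
offset = as_signed64(fault_addr)
print(offset, hex(-offset))  # -56, i.e. 0x38 bytes below address zero
```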

srjek commented 8 months ago

Page fault was fixed by 50a8a73.

The other half of the issue has disappeared from my machine. I believe that happened after I changed a systemd unit from running bcachefs mount UUID=... /mnt to running strace -e 'trace=!all' bcachefs mount /dev/sda1:/dev/sdb1:/dev/sdc1 /mnt.

Since the remainder of this is most likely not a bcachefs issue, I'm gonna close this out.