koverstreet / bcachefs

Other
633 stars 69 forks source link

Bcachefs hangs on intense write operations #673

Open maartenvanmalland opened 2 months ago

maartenvanmalland commented 2 months ago

On my NAS I have the following setup:

- Created a RAM drive: modprobe -s -q brd rd_nr=1 rd_size=54857544 max_part=1
- Created a local loop device as overflow target: losetup /dev/loop667 /opt/overflow
- Formatted it with bcachefs: bcachefs format  --label=ssd.ssd1 /dev/ram0 --label=hdd.hdd1 /dev/loop667 --foreground_target=ssd --promote_target=ssd --background_target=hdd
- Mounted it on my temp download folder: mount -t bcachefs /dev/ram0:/dev/loop667 /tmpfs/

I guess this is one of those setups that you might be thinking, oh dear, what kind of users am I attracting here :). Anyways, this setup performs like a madman, up until the point it crashes/hangs:

[Wed Apr 17 10:58:31 2024] ------------[ cut here ]------------
[Wed Apr 17 10:58:31 2024] btree trans held srcu lock (delaying memory reclaim) for 18 seconds
[Wed Apr 17 10:58:31 2024] WARNING: CPU: 21 PID: 3562983 at fs/bcachefs/btree_iter.c:2871 bch2_trans_srcu_unlock+0x118/0x130
[bcachefs]
[Wed Apr 17 10:58:31 2024] WARNING: CPU: 9 PID: 3042134 at fs/bcachefs/btree_iter.c:2871 bch2_trans_srcu_unlock+0x118/0x130 [
bcachefs]
[Wed Apr 17 10:58:31 2024] WARNING: CPU: 14 PID: 1602374 at fs/bcachefs/btree_iter.c:2871 bch2_trans_put+0x23b/0x260 [bcachef
s]
[Wed Apr 17 10:58:31 2024] Modules linked in: brd bcachefs lz4hc_compress veth xt_MASQUERADE nf_conntrack_netlink
[Wed Apr 17 10:58:31 2024] WARNING: CPU: 17 PID: 3618567 at fs/bcachefs/btree_iter.c:2871 bch2_trans_srcu_unlock+0x118/0x130
[bcachefs]
[Wed Apr 17 10:58:31 2024]  xt_addrtype br_netfilter tcp_diag udp_diag inet_diag vhost_net vhost vhost_iotlb tap tun snd_seq_
dummy snd_hrtimer cpufreq_conservative cpufreq_userspace cpufreq_powersave cpufreq_ondemand xt_conntrack xfrm_user xfrm_algo
rdma_ucm ib_uverbs rdma_cm vmw_vsock_vmci_transport vsock iw_cm vmw_vmci scsi_transport_iscsi
[Wed Apr 17 10:58:31 2024] Modules linked in:

<cut a lot of linked modules, can paste it later if needed> 

[Wed Apr 17 10:58:31 2024] CPU: 21 PID: 3562983 Comm: kworker/u113:2 Tainted: P        W  OE      6.9.0-rc4 #1
[Wed Apr 17 10:58:31 2024]  crc_t10dif
[Wed Apr 17 10:58:31 2024]  ahci
[Wed Apr 17 10:58:31 2024]  blake2b_generic
[Wed Apr 17 10:58:31 2024]  sfc_driverlink(OE)
[Wed Apr 17 10:58:31 2024] Hardware name: System manufacturer System Product Name/WS X299 PRO_SE, BIOS 4101 08/25/2023
[Wed Apr 17 10:58:31 2024]  drm
[Wed Apr 17 10:58:31 2024]  libahci
[Wed Apr 17 10:58:31 2024]  efivarfs
[Wed Apr 17 10:58:31 2024]  igb
[Wed Apr 17 10:58:31 2024]  xhci_pci
[Wed Apr 17 10:58:31 2024]  raid10
[Wed Apr 17 10:58:31 2024]  crc32c_intel
[Wed Apr 17 10:58:31 2024]  t10_pi
[Wed Apr 17 10:58:31 2024]  scsi_mod
[Wed Apr 17 10:58:31 2024]  z3fold
[Wed Apr 17 10:58:31 2024]  crct10dif_generic
[Wed Apr 17 10:58:31 2024]  lz4
[Wed Apr 17 10:58:31 2024]  usbcore
[Wed Apr 17 10:58:31 2024]  sfc(OE)
[Wed Apr 17 10:58:31 2024]  lz4_compress
[Wed Apr 17 10:58:31 2024]  mxm_wmi
[Wed Apr 17 10:58:31 2024]  sr_mod
[Wed Apr 17 10:58:31 2024]  cdrom
[Wed Apr 17 10:58:31 2024]  libata
[Wed Apr 17 10:58:31 2024]  xhci_hcd
[Wed Apr 17 10:58:31 2024]  i915
[Wed Apr 17 10:58:31 2024]  mtd
[Wed Apr 17 10:58:31 2024]  crc64_rocksoft
[Wed Apr 17 10:58:31 2024]  i2c_i801
[Wed Apr 17 10:58:31 2024]  drm_buddy
[Wed Apr 17 10:58:31 2024]  crc32_pclmul
[Wed Apr 17 10:58:31 2024]  crct10dif_pclmul
[Wed Apr 17 10:58:31 2024]  video
[Wed Apr 17 10:58:31 2024]  dca crc64
[Wed Apr 17 10:58:31 2024]  crc_t10dif
[Wed Apr 17 10:58:31 2024]  drm_display_helper
[Wed Apr 17 10:58:31 2024]  sfc_driverlink(OE)
[Wed Apr 17 10:58:31 2024]  cec
[Wed Apr 17 10:58:31 2024]  i2c_smbus i2c_algo_bit
[Wed Apr 17 10:58:31 2024]  drm
[Wed Apr 17 10:58:31 2024]  crct10dif_common
[Wed Apr 17 10:58:31 2024]  rc_core
[Wed Apr 17 10:58:31 2024]  igb
[Wed Apr 17 10:58:31 2024]  ttm
[Wed Apr 17 10:58:31 2024]  scsi_common
[Wed Apr 17 10:58:31 2024]  crc32c_intel
[Wed Apr 17 10:58:31 2024]  dm_snapshot dm_bufio
[Wed Apr 17 10:58:31 2024]  scsi_mod
[Wed Apr 17 10:58:31 2024]  usb_common
[Wed Apr 17 10:58:31 2024]  raid456
[Wed Apr 17 10:58:31 2024]  crct10dif_generic
[Wed Apr 17 10:58:31 2024]  wmi
[Wed Apr 17 10:58:31 2024]  button
[Wed Apr 17 10:58:31 2024]  usbcore
[Wed Apr 17 10:58:31 2024]  async_memcpy async_pq
[Wed Apr 17 10:58:31 2024]  crct10dif_pclmul
[Wed Apr 17 10:58:31 2024]  async_xor async_tx
[Wed Apr 17 10:58:31 2024]  dca
[Wed Apr 17 10:58:31 2024] CPU: 14 PID: 1602374 Comm: sabnzbdplus Tainted: P        W  OE      6.9.0-rc4 #1
[Wed Apr 17 10:58:31 2024]  xor hid_generic
[Wed Apr 17 10:58:31 2024]  crc64
[Wed Apr 17 10:58:31 2024] Hardware name: System manufacturer System Product Name/WS X299 PRO_SE, BIOS 4101 08/25/2023
[Wed Apr 17 10:58:31 2024]  i2c_smbus
[Wed Apr 17 10:58:31 2024]  usbhid
[Wed Apr 17 10:58:31 2024]  i2c_algo_bit
[Wed Apr 17 10:58:31 2024] Workqueue: bcachefs bch2_write_point_do_index_updates [bcachefs]
[Wed Apr 17 10:58:31 2024]  uas hid
[Wed Apr 17 10:58:31 2024]  crct10dif_common scsi_common
[Wed Apr 17 10:58:31 2024]  usb_storage raid6_pq
[Wed Apr 17 10:58:31 2024]  usb_common
[Wed Apr 17 10:58:31 2024]  libcrc32c
[Wed Apr 17 10:58:31 2024]  wmi
[Wed Apr 17 10:58:31 2024]  crc32c_generic
[Wed Apr 17 10:58:31 2024]  button
[Wed Apr 17 10:58:31 2024]  raid0 bcache
[Wed Apr 17 10:58:31 2024]  [last unloaded: brd]
[Wed Apr 17 10:58:31 2024]  dm_mod
[Wed Apr 17 10:58:31 2024]  sd_mod raid1 md_mod ast drm_shmem_helper
[Wed Apr 17 10:58:31 2024] CPU: 9 PID: 3042134 Comm: kworker/9:2H Tainted: P        W  OE      6.9.0-rc4 #1
[Wed Apr 17 10:58:31 2024]  nvme nvme_core drm_kms_helper ahci
[Wed Apr 17 10:58:31 2024] Hardware name: System manufacturer System Product Name/WS X299 PRO_SE, BIOS 4101 08/25/2023
[Wed Apr 17 10:58:31 2024]  libahci xhci_pci t10_pi sfc(OE) mxm_wmi libata xhci_hcd crc64_rocksoft crc32_pclmul crc_t10dif sf
c_driverlink(OE) drm igb crc32c_intel scsi_mod crct10dif_generic usbcore mtd i2c_i801
[Wed Apr 17 10:58:31 2024] RIP: 0010:bch2_trans_put+0x23b/0x260 [bcachefs]
[Wed Apr 17 10:58:31 2024]  crct10dif_pclmul dca crc64 i2c_smbus i2c_algo_bit crct10dif_common scsi_common usb_common wmi
[Wed Apr 17 10:58:31 2024]  button [last unloaded: brd]
[Wed Apr 17 10:58:31 2024] CPU: 17 PID: 3618567 Comm: kworker/17:2H Tainted: P        W  OE      6.9.0-rc4 #1
[Wed Apr 17 10:58:31 2024] Hardware name: System manufacturer System Product Name/WS X299 PRO_SE, BIOS 4101 08/25/2023
[Wed Apr 17 10:58:31 2024] Workqueue: bcachefs_btree_io btree_node_write_work [bcachefs]
[Wed Apr 17 10:58:31 2024] RIP: 0010:bch2_trans_srcu_unlock+0x118/0x130 [bcachefs]
[Wed Apr 17 10:58:31 2024] Code: 0b 95 76 d3 48 c7 c7 70 3d 7f c6 48 b8 cf f7 53 e3 a5 9b c4 20 48 29 ca 48 d1 ea 48 f7 e2 48
 89 d6 48 c1 ee 04 e8 65 1a c2 d1 <0f> 0b 8b b5 a8 00 00 00 49 8d be 68 36 00 00 83 fe 01 77 0a e8 ec
[Wed Apr 17 10:58:31 2024] RSP: 0018:ffff9dfbe32d79e8 EFLAGS: 00010286
[Wed Apr 17 10:58:31 2024] Workqueue: bcachefs_btree_io btree_node_write_work [bcachefs]
[Wed Apr 17 10:58:31 2024] RAX: 0000000000000000 RBX: ffff9dfbe32d7c20 RCX: 0000000000000000
[Wed Apr 17 10:58:31 2024] RDX: 0000000000000002 RSI: ffffffff9967d132 RDI: 00000000ffffffff
[Wed Apr 17 10:58:31 2024] RBP: ffff909cb22fc000 R08: 0000000000000000 R09: 0000000000000003
[Wed Apr 17 10:58:31 2024] R10: ffff9dfbe32d7870 R11: ffff90ae5ff6cc28 R12: ffff909cb22fc000
[Wed Apr 17 10:58:31 2024] R13: ffff90917ba3ca60 R14: ffff908ed4ec0000 R15: 0000000000000000
[Wed Apr 17 10:58:31 2024] FS:  00007fb7fd6006c0(0000) GS:ffff90addfb00000(0000) knlGS:0000000000000000
[Wed Apr 17 10:58:31 2024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Wed Apr 17 10:58:31 2024] CR2: 00000000f2c71000 CR3: 000000011ccf2002 CR4: 00000000003726f0
[Wed Apr 17 10:58:31 2024] Code: ee a1 76 d3 48 c7 c7 70 3d 7f c6 48 b8 cf f7 53 e3 a5 9b c4 20 48 29 ca 48 d1 ea 48 f7 e2 48
 89 d6 48 c1 ee 04 e8 48 27 c2 d1 <0f> 0b e9 64 ff ff ff 0f 0b e9 73 ff ff ff 66 2e 0f 1f 84 00 00 00
[Wed Apr 17 10:58:31 2024] Call Trace:
[Wed Apr 17 10:58:31 2024] RSP: 0018:ffff9dfbce32bc10 EFLAGS: 00010286
[Wed Apr 17 10:58:31 2024]  <TASK>
[Wed Apr 17 10:58:31 2024] RAX: 0000000000000000 RBX: ffff909b00734000 RCX: 0000000000000000
[Wed Apr 17 10:58:31 2024] RIP: 0010:bch2_trans_srcu_unlock+0x118/0x130 [bcachefs]
[Wed Apr 17 10:58:31 2024] RDX: 0000000000000002 RSI: ffffffff9967d132 RDI: 00000000ffffffff
[Wed Apr 17 10:58:31 2024] RBP: ffff908ed4ec0000 R08: 0000000000000000 R09: 0000000000000003
[Wed Apr 17 10:58:31 2024] R13: ffff909b00734000 R14: 0000000000000006 R15: ffff909b00734478
[Wed Apr 17 10:58:31 2024] FS:  0000000000000000(0000) GS:ffff90addfe80000(0000) knlGS:0000000000000000
[Wed Apr 17 10:58:31 2024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Wed Apr 17 10:58:31 2024] CR2: 0000000000f90f80 CR3: 000000156b620002 CR4: 00000000003726f0
[Wed Apr 17 10:58:31 2024] Call Trace:
[Wed Apr 17 10:58:31 2024]  <TASK>
[Wed Apr 17 10:58:31 2024]  ? __warn+0x80/0x120
[Wed Apr 17 10:58:31 2024] Code: ee a1 76 d3 48 c7 c7 70 3d 7f c6 48 b8 cf f7 53 e3 a5 9b c4 20 48 29 ca 48 d1 ea 48 f7 e2 48
 89 d6 48 c1 ee 04 e8 48 27 c2 d1 <0f> 0b e9 64 ff ff ff 0f 0b e9 73 ff ff ff 66 2e 0f 1f 84 00 00 00
[Wed Apr 17 10:58:31 2024] RIP: 0010:bch2_trans_srcu_unlock+0x118/0x130 [bcachefs]
[Wed Apr 17 10:58:31 2024] RSP: 0018:ffff9dfc10cafd70 EFLAGS: 00010286
[Wed Apr 17 10:58:31 2024] RAX: 0000000000000000 RBX: ffff90a5640bc000 RCX: 0000000000000000
[Wed Apr 17 10:58:31 2024] RDX: 0000000000000002 RSI: ffffffff9967d132 RDI: 00000000ffffffff
[Wed Apr 17 10:58:31 2024] RBP: ffff908ed4ec0000 R08: 0000000000000000 R09: 0000000000000003
[Wed Apr 17 10:58:31 2024] R10: ffff9dfc10cafbf8 R11: ffff90ae5ff6cc28 R12: ffff90a5640bc278
[Wed Apr 17 10:58:31 2024] R13: ffff90a5640bc000 R14: 0000000000000002 R15: ffff90a5640bc278
[Wed Apr 17 10:58:31 2024] FS:  0000000000000000(0000) GS:ffff90addf880000(0000) knlGS:0000000000000000
[Wed Apr 17 10:58:31 2024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Wed Apr 17 10:58:31 2024] CR2: 00007f5f38a49480 CR3: 000000156b620006 CR4: 00000000003726f0
[Wed Apr 17 10:58:31 2024]  ? bch2_trans_srcu_unlock+0x118/0x130 [bcachefs]
[Wed Apr 17 10:58:31 2024] Call Trace:
[Wed Apr 17 10:58:31 2024]  <TASK>
[Wed Apr 17 10:58:31 2024]  ? __warn+0x80/0x120
[Wed Apr 17 10:58:31 2024] Code: ee a1 76 d3 48 c7 c7 70 3d 7f c6 48 b8 cf f7 53 e3 a5 9b c4 20 48 29 ca 48 d1 ea 48 f7 e2 48
 89 d6 48 c1 ee 04 e8 48 27 c2 d1 <0f> 0b e9 64 ff ff ff 0f 0b e9 73 ff ff ff 66 2e 0f 1f 84 00 00 00
[Wed Apr 17 10:58:31 2024] RSP: 0018:ffff9dfbe0aa7d70 EFLAGS: 00010286
[Wed Apr 17 10:58:31 2024]  ? bch2_trans_put+0x23b/0x260 [bcachefs]
[Wed Apr 17 10:58:31 2024] RAX: 0000000000000000 RBX: ffff909b84df4000 RCX: 0000000000000000
[Wed Apr 17 10:58:31 2024] RDX: 0000000000000002 RSI: ffffffff9967d132 RDI: 00000000ffffffff
[Wed Apr 17 10:58:31 2024] RBP: ffff908ed4ec0000 R08: 0000000000000000 R09: 0000000000000003
[Wed Apr 17 10:58:31 2024] R10: ffff9dfbe0aa7bf8 R11: ffff90ae5ff6cc28 R12: ffff909b84df4278
[Wed Apr 17 10:58:31 2024] R13: ffff909b84df4000 R14: 0000000000000002 R15: ffff909b84df4278
[Wed Apr 17 10:58:31 2024] FS:  0000000000000000(0000) GS:ffff90addfc80000(0000) knlGS:0000000000000000
[Wed Apr 17 10:58:31 2024]  ? report_bug+0x191/0x1c0
[Wed Apr 17 10:58:31 2024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Wed Apr 17 10:58:31 2024] CR2: 00007feea662f480 CR3: 00000002ca3b4005 CR4: 00000000003726f0
[Wed Apr 17 10:58:31 2024]  ? console_unlock+0x78/0x120
[Wed Apr 17 10:58:31 2024]  ? bch2_trans_srcu_unlock+0x118/0x130 [bcachefs]
[Wed Apr 17 10:58:31 2024] Call Trace:
[Wed Apr 17 10:58:31 2024]  <TASK>
[Wed Apr 17 10:58:31 2024]  ? handle_bug+0x41/0x70
[Wed Apr 17 10:58:31 2024]  ? __warn+0x80/0x120
[Wed Apr 17 10:58:31 2024]  ? exc_invalid_op+0x17/0x70
[Wed Apr 17 10:58:31 2024]  ? asm_exc_invalid_op+0x1a/0x20
[Wed Apr 17 10:58:31 2024]  ? report_bug+0x191/0x1c0
[Wed Apr 17 10:58:31 2024]  ? console_unlock+0x78/0x120
[Wed Apr 17 10:58:31 2024]  ? bch2_trans_srcu_unlock+0x118/0x130 [bcachefs]
[Wed Apr 17 10:58:31 2024]  ? handle_bug+0x41/0x70
[Wed Apr 17 10:58:31 2024]  ? exc_invalid_op+0x17/0x70
[Wed Apr 17 10:58:31 2024]  ? bch2_trans_srcu_unlock+0x118/0x130 [bcachefs]
[Wed Apr 17 10:58:31 2024]  ? report_bug+0x191/0x1c0
[Wed Apr 17 10:58:31 2024]  ? asm_exc_invalid_op+0x1a/0x20
[Wed Apr 17 10:58:31 2024]  ? prb_read_valid+0x1b/0x30
[Wed Apr 17 10:58:31 2024]  ? handle_bug+0x41/0x70
[Wed Apr 17 10:58:31 2024]  ? exc_invalid_op+0x17/0x70
[Wed Apr 17 10:58:31 2024]  ? report_bug+0x191/0x1c0
[Wed Apr 17 10:58:31 2024]  ? asm_exc_invalid_op+0x1a/0x20
[Wed Apr 17 10:58:31 2024]  ? handle_bug+0x41/0x70
[Wed Apr 17 10:58:31 2024]  ? exc_invalid_op+0x17/0x70
[Wed Apr 17 10:58:31 2024]  ? bch2_trans_srcu_unlock+0x118/0x130 [bcachefs]
[Wed Apr 17 10:58:31 2024]  ? asm_exc_invalid_op+0x1a/0x20
[Wed Apr 17 10:58:31 2024]  ? bch2_trans_begin+0xf8/0x600 [bcachefs]
[Wed Apr 17 10:58:31 2024]  ? bch2_trans_put+0x23b/0x260 [bcachefs]
[Wed Apr 17 10:58:31 2024]  ? bch2_trans_srcu_unlock+0x118/0x130 [bcachefs]
[Wed Apr 17 10:58:31 2024]  bch2_trans_begin+0x5a6/0x600 [bcachefs]
[Wed Apr 17 10:58:31 2024]  ? bch2_trans_begin+0xf8/0x600 [bcachefs]
[Wed Apr 17 10:58:31 2024]  ? bch2_trans_put+0x23b/0x260 [bcachefs]
[Wed Apr 17 10:58:31 2024]  ? bch2_trans_begin+0xf8/0x600 [bcachefs]
[Wed Apr 17 10:58:31 2024]  bch2_trans_begin+0x5a6/0x600 [bcachefs]
[Wed Apr 17 10:58:31 2024]  bch2_write_index_default+0xce/0x4c0 [bcachefs]
[Wed Apr 17 10:58:31 2024]  bch2_trans_begin+0x5a6/0x600 [bcachefs]
[Wed Apr 17 10:58:31 2024]  bch2_setattr_nonsize+0x1a4/0x4a0 [bcachefs]
[Wed Apr 17 10:58:31 2024]  ? __mod_node_page_state+0x7a/0xb0
[Wed Apr 17 10:58:31 2024]  btree_node_write_work+0xe3/0x460 [bcachefs]
[Wed Apr 17 10:58:31 2024]  notify_change+0x22c/0x4d0
[Wed Apr 17 10:58:31 2024]  ? chmod_common+0xbb/0x170
[Wed Apr 17 10:58:31 2024]  chmod_common+0xbb/0x170
[Wed Apr 17 10:58:31 2024]  btree_node_write_work+0xe3/0x460 [bcachefs]
[Wed Apr 17 10:58:31 2024]  __x64_sys_chmod+0x6a/0xc0
[Wed Apr 17 10:58:31 2024]  do_syscall_64+0x86/0x1a0
[Wed Apr 17 10:58:31 2024]  ? __bch2_write_index+0x195/0x290 [bcachefs]
[Wed Apr 17 10:58:31 2024]  process_one_work+0x189/0x3b0
[Wed Apr 17 10:58:31 2024]  worker_thread+0x277/0x390
[Wed Apr 17 10:58:31 2024]  ? _raw_spin_lock_irqsave+0x27/0x60
[Wed Apr 17 10:58:31 2024]  ? __pfx_worker_thread+0x10/0x10
[Wed Apr 17 10:58:31 2024]  kthread+0xde/0x110
[Wed Apr 17 10:58:31 2024]  ? __pfx_kthread+0x10/0x10
[Wed Apr 17 10:58:31 2024]  ret_from_fork+0x31/0x50
[Wed Apr 17 10:58:31 2024]  process_one_work+0x189/0x3b0
[Wed Apr 17 10:58:31 2024]  ? __pfx_kthread+0x10/0x10
[Wed Apr 17 10:58:31 2024]  worker_thread+0x277/0x390
[Wed Apr 17 10:58:31 2024]  ret_from_fork_asm+0x1a/0x30
[Wed Apr 17 10:58:31 2024]  ? efx_poll+0x266/0x650 [sfc]
[Wed Apr 17 10:58:31 2024]  ? _raw_spin_lock_irqsave+0x27/0x60
[Wed Apr 17 10:58:31 2024]  </TASK>
[Wed Apr 17 10:58:31 2024]  ? __pfx_worker_thread+0x10/0x10
[Wed Apr 17 10:58:31 2024] ---[ end trace 0000000000000000 ]---
[Wed Apr 17 10:58:31 2024]  kthread+0xde/0x110
[Wed Apr 17 10:58:31 2024]  ? __pfx_kthread+0x10/0x10
[Wed Apr 17 10:58:31 2024]  ret_from_fork+0x31/0x50
[Wed Apr 17 10:58:31 2024]  ? __pfx_kthread+0x10/0x10
[Wed Apr 17 10:58:31 2024]  ret_from_fork_asm+0x1a/0x30
[Wed Apr 17 10:58:31 2024]  </TASK>
[Wed Apr 17 10:58:31 2024] ---[ end trace 0000000000000000 ]---
[Wed Apr 17 10:58:31 2024]  ? __napi_poll+0x28/0x1c0
[Wed Apr 17 10:58:31 2024]  __bch2_write_index+0x195/0x290 [bcachefs]
[Wed Apr 17 10:58:31 2024]  ? net_rx_action+0x2ab/0x390
[Wed Apr 17 10:58:31 2024]  ? __raise_softirq_irqoff+0x1a/0x80
[Wed Apr 17 10:58:31 2024]  ? __do_softirq+0x15d/0x2f8
[Wed Apr 17 10:58:31 2024]  ? fpregs_assert_state_consistent+0x25/0x50
[Wed Apr 17 10:58:31 2024]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[Wed Apr 17 10:58:31 2024] RIP: 0033:0x7fb814b9fc47
[Wed Apr 17 10:58:31 2024] Code: eb d9 e8 9c 04 02 00 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 5f 00 00 00 0f 05 c3 0f 1f 84 00
 00 00 00 00 b8 5a 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 89 a1 0d 00 f7 d8 64 89 01 48
[Wed Apr 17 10:58:31 2024] RSP: 002b:00007fb7fd5ffba8 EFLAGS: 00000246 ORIG_RAX: 000000000000005a
[Wed Apr 17 10:58:31 2024] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fb814b9fc47
[Wed Apr 17 10:58:31 2024] RDX: 0000000000000000 RSI: 00000000000001b6 RDI: 00007fb8116c9700
[Wed Apr 17 10:58:31 2024] RBP: 00000000000001b6 R08: 0000000000a5b8a8 R09: 0000000000a5b8c8
[Wed Apr 17 10:58:31 2024] R10: 0000000000000000 R11: 0000000000000246 R12: 000000003b059c40
[Wed Apr 17 10:58:31 2024] R13: 00000000ffffff9c R14: 00007fb7fd5ffbe0 R15: 000000003b37426e
[Wed Apr 17 10:58:31 2024]  bch2_write_point_do_index_updates+0xb1/0x160 [bcachefs]
[Wed Apr 17 10:58:31 2024]  </TASK>
[Wed Apr 17 10:58:31 2024] ---[ end trace 0000000000000000 ]---
[Wed Apr 17 10:58:31 2024]  process_one_work+0x189/0x3b0
[Wed Apr 17 10:58:31 2024]  worker_thread+0x277/0x390
[Wed Apr 17 10:58:31 2024]  ? __pfx_worker_thread+0x10/0x10
[Wed Apr 17 10:58:31 2024]  kthread+0xde/0x110
[Wed Apr 17 10:58:31 2024]  ? __pfx_kthread+0x10/0x10
[Wed Apr 17 10:58:31 2024]  ret_from_fork+0x31/0x50
[Wed Apr 17 10:58:31 2024]  ? __pfx_kthread+0x10/0x10
[Wed Apr 17 10:58:31 2024]  ret_from_fork_asm+0x1a/0x30
[Wed Apr 17 10:58:31 2024]  </TASK>
[Wed Apr 17 10:58:31 2024] ---[ end trace 0000000000000000 ]---

This only happens when there is intense writing (and I'm guessing when the data begins to overflow to the hdd target).

maartenvanmalland commented 2 months ago

Inspired by https://github.com/koverstreet/bcachefs/issues/672, here some extra info as the effect of this crash is indeed a hanging filesystem as well:

# cat /sys/fs/bcachefs/*/internal/journal_debug
dirty journal entries:  763/32768
seq:                    6003
seq_ondisk:             5999
last_seq:               5241
last_seq_ondisk:        5241
flushed_seq_ondisk:     5890
watermark:              stripe
each entry reserved:    401
nr flush writes:        284
nr noflush writes:      5699
average write size:     246 KiB
nr direct reclaim:      0
nr background reclaim:  544622
reclaim kicked:         0
reclaim runs in:        12 ms
blocked:                0
current entry sectors:  512
current entry error:    ok
current entry:          closed
unwritten entries:
seq:                    6000
  refcount:             0
  size:                 136 KiB
  expires:              -135360 jiffies
  flags:                must_flush separate_flush write_started write allocated
seq:                    6001
  refcount:             0
  size:                 255 KiB
  expires:              -135109 jiffies
  flags:                noflush separate_flush write_started write allocated write done
seq:                    6002
  refcount:             0
  size:                 255 KiB
  expires:              -135109 jiffies
  flags:                noflush separate_flush write_started write allocated write done
seq:                    6003
  refcount:             0
  size:                 255 KiB
  expires:              -135109 jiffies
  flags:                noflush separate_flush write_started write allocated write done
last buf closed
replay done:            1
space:
        discarded       512:4193280
        clean ondisk    512:4193280
        clean           512:4193280
        total           512:4194304
dev 0:
        nr              1582
        bucket size     512
        available       818:0
        discard_idx     307
        dirty_ondisk    307 (seq 5241)
        dirty_idx       307 (seq 5241)
        cur_idx         1069 (seq 6003)
dev 1:
        nr              8192
        bucket size     512
        available       8190:0
        discard_idx     0
        dirty_ondisk    0 (seq 0)
        dirty_idx       0 (seq 0)
        cur_idx         0 (seq 0)

# cat /sys/fs/bcachefs/caa656c1-a03a-4fda-a11d-dd1f0ac8d896/dev-0/alloc_debug
                 buckets         sectors      fragmented
free              129877               0               0
sb                    13            6152             504
journal             1582          809984               0
btree                106           54272               0
user                5898         3013288            6488
cached               801          410912               0
parity                 0               0               0
stripe                 0               0               0
need_gc_gens           0               0               0
need_discard       64291               0               0

reserves:
stripe              6386
normal              3221
copygc                56
btree                 28
btree_copygc           0
reclaim                0
interior_updates       0

freelist_wait           empty
open buckets allocated  107
open buckets this dev   101
open buckets total      1024
open_buckets_wait       empty
open_buckets_btree      28
open_buckets_user       78
buckets_to_invalidate   0
btree reserve cache     28`

# cat /sys/kernel/debug/bcachefs/caa656c1-a03a-4fda-a11d-dd1f0ac8d896/btree_transactions
2075159 bch2_moving_ctxt_init
backtrace:
  [<0>] kthread+0xde/0x110
  [<0>] ret_from_fork+0x31/0x50
  [<0>] ret_from_fork_asm+0x1a/0x30

2075160 bch2_moving_ctxt_init
backtrace:
  [<0>] bch2_rebalance_thread+0x66/0xb0 [bcachefs]
  [<0>] kthread+0xde/0x110
  [<0>] ret_from_fork+0x31/0x50
  [<0>] ret_from_fork_asm+0x1a/0x30

2177571 bch2_do_discards_work
backtrace:
  [<0>] bch2_discard_one_bucket+0x88b/0x8c0 [bcachefs]
  [<0>] bch2_do_discards_work+0x1f5/0x3a0 [bcachefs]
  [<0>] process_one_work+0x189/0x3b0
  [<0>] worker_thread+0x277/0x390
  [<0>] kthread+0xde/0x110
  [<0>] ret_from_fork+0x31/0x50
  [<0>] ret_from_fork_asm+0x1a/0x30

2182976 btree_node_write_work
backtrace:
  [<0>] __bch2_trans_commit+0xd18/0x16f0 [bcachefs]
  [<0>] __bch2_btree_node_update_key+0x28c/0x570 [bcachefs]
  [<0>] bch2_btree_node_update_key+0x35f/0x3a0 [bcachefs]
  [<0>] bch2_btree_node_update_key_get_iter+0x1d5/0x330 [bcachefs]
  [<0>] btree_node_write_work+0x10a/0x460 [bcachefs]
  [<0>] process_one_work+0x189/0x3b0
  [<0>] worker_thread+0x277/0x390
  [<0>] kthread+0xde/0x110
  [<0>] ret_from_fork+0x31/0x50
  [<0>] ret_from_fork_asm+0x1a/0x30

3914811 btree_node_write_work
backtrace:
  [<0>] __bch2_trans_commit+0xd18/0x16f0 [bcachefs]
  [<0>] __bch2_btree_node_update_key+0x28c/0x570 [bcachefs]
  [<0>] bch2_btree_node_update_key+0x35f/0x3a0 [bcachefs]
  [<0>] bch2_btree_node_update_key_get_iter+0x1d5/0x330 [bcachefs]
  [<0>] btree_node_write_work+0x10a/0x460 [bcachefs]
  [<0>] process_one_work+0x189/0x3b0
  [<0>] worker_thread+0x277/0x390
  [<0>] kthread+0xde/0x110
  [<0>] ret_from_fork+0x31/0x50
  [<0>] ret_from_fork_asm+0x1a/0x30

Full dmesg output: http://vpaste.net/FineR

koverstreet commented 2 months ago

Ok, so the journal got stuck. I just started seeing a similar bug in one of the tests - let me see if I can get that one figured out and maybe add some more debug info.

maartenvanmalland commented 2 months ago

That would be great. I can test things for you as well if needed; I can let it crash on command :)

NHOrus commented 2 months ago

I think I had same problem? Nothing stuck, external hdd with bcachefs, deleting many small files


------------[ cut here ]------------
btree trans held srcu lock (delaying memory reclaim) for 18 seconds
WARNING: CPU: 1 PID: 7310 at fs/bcachefs/btree_iter.c:2827 bch2_trans_srcu_unlock+0x126/0x130 [bcachefs]
Modules linked in: wireguard ip6_udp_tunnel udp_tunnel libchacha20poly1305 poly1305_x86_64 chacha_x86_64 curve25519_x86_64 libcurve25519_generic libchacha bcachefs raid6_pq xor uas usb_storage uinput snd_seq_dummy snd_hrtimer snd_seq snd_seq_device rfcomm fuse uhid bnep ccm algif_aead crypto_null des3_ede_x86_64 cbc des_generic libdes algif_skcipher cmac md4 algif_hash af_alg xfs vfat fat hid_logitech_hidpp hid_logitech_dj usbhid uvcvideo uvc videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videodev videobuf2_common mc amdgpu joydev mousedev snd_hda_codec_realtek snd_hda_codec_generic drm_buddy snd_hda_codec_hdmi drm_exec gpu_sched btusb amdxcp crct10dif_pclmul btrtl drm_suballoc_helper crc32_pclmul rtw88_8822ce drm_ttm_helper crc32c_intel btintel rtw88_8822c ttm polyval_clmulni snd_hda_intel snd_pci_ps i2c_algo_bit rtw88_pci polyval_generic snd_intel_dspcfg snd_rpl_pci_acp6x gf128mul drm_display_helper bluetooth ghash_clmulni_intel snd_acp_pci sha512_ssse3 snd_acp_legacy_common snd_hda_codec
 hid_multitouch rtw88_core sha256_ssse3 r8169 cec snd_pci_acp6x sha1_ssse3 serio_raw snd_hwdep ecdh_generic mac80211 aesni_intel drm_kms_helper atkbd snd_hda_core snd_pci_acp5x realtek crypto_simd i2c_hid_acpi vivaldi_fmap libarc4 pkcs8_key_parser i2c_hid libps2 wmi_bmof cryptd cfg80211 snd_pcm mdio_devres drm sp5100_tco snd_timer snd_rn_pci_acp3x snd_acp_config ahci snd_soc_acpi libphy snd_pci_acp3x ideapad_laptop snd xhci_pci libahci platform_profile k10temp i2c_piix4 xhci_pci_renesas sparse_keymap rfkill i2c_designware_platform i8042 serio video tpm_crb ccp tpm_tis wmi tpm_tis_core rng_core tpm i2c_designware_core
CPU: 1 PID: 7310 Comm: bch-reclaim/sdb Not tainted 6.8.9-gentoo #1
Hardware name: LENOVO 82KD/LNVNB161216, BIOS GLCN50WW 07/04/2022
RIP: 0010:bch2_trans_srcu_unlock+0x126/0x130 [bcachefs]
Code: 80 97 b1 d1 48 c7 c7 55 6b be c1 48 29 c2 48 b8 cf f7 53 e3 a5 9b c4 20 48 c1 ea 03 c4 e2 cb f6 f0 48 c1 ee 04 e8 8a 2e 59 d0 <0f> 0b eb a0 00 00 00 00 00 00 66 0f 1f 00 53 48 c7 c0 40 a1 bb c1
RSP: 0018:ffffa42f42dc3b70 EFLAGS: 00010246
RAX: ac8defec8a9a4300 RBX: ffff8c5e4c204000 RCX: 0000000000000027
RDX: c0000000ffffdfff RSI: 0000000000000002 RDI: ffff8c5f3e65bb08
RBP: 0000000000000000 R08: 0000000000001fff R09: ffffffff9364aa70
R10: 0000000000005ffd R11: 0000000000000004 R12: ffff8c5e4c204000
R13: ffff8c5dfe4a6c40 R14: ffff8c5dfe480000 R15: ffff8c5dfe480000
FS:  0000000000000000(0000) GS:ffff8c5f3e640000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa15a7befe0 CR3: 0000000185610000 CR4: 0000000000350ef0
Call Trace:
 <TASK>
 ? __warn+0xca/0x1c0
 ? bch2_trans_srcu_unlock+0x126/0x130 [bcachefs]
 ? report_bug+0x16a/0x210
 ? handle_bug+0x3d/0x80
 ? exc_invalid_op+0x1a/0x50
 ? asm_exc_invalid_op+0x1a/0x20
 ? bch2_trans_srcu_unlock+0x126/0x130 [bcachefs]
 bch2_trans_begin+0x4aa/0x500 [bcachefs]
 bch2_btree_write_buffer_flush_locked+0xbd/0xb00 [bcachefs]
 ? __schedule+0x5a8/0x12e0
 ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
 ? schedule+0x6e/0xc0
 ? schedule_preempt_disabled+0x10/0x20
 ? __mutex_lock+0x1a8/0x650
 btree_write_buffer_flush_seq+0x249/0x2c0 [bcachefs]
 ? local_clock_noinstr+0x3a/0xb0
 ? local_clock+0x10/0x30
 bch2_btree_write_buffer_journal_flush+0x34/0x60 [bcachefs]
 journal_flush_pins+0x1f8/0x310 [bcachefs]
 ? bch2_journal_key_to_wb_slowpath+0x230/0x230 [bcachefs]
 __bch2_journal_reclaim+0x1f6/0x370 [bcachefs]
 bch2_journal_reclaim_thread+0x84/0x170 [bcachefs]
 ? bch2_journal_reclaim_start+0xe0/0xe0 [bcachefs]
 kthread+0xe3/0x100
 ? kthread_blkcg+0x30/0x30
 ret_from_fork+0x32/0x40
 ? kthread_blkcg+0x30/0x30
 ret_from_fork_asm+0x11/0x20
 </TASK>
---[ end trace 0000000000000000 ]---```

Gentoo, kernel 6.8.9
russells-crockpot commented 2 months ago

I have had the same issue as well:

------------[ cut here ]------------
btree trans held srcu lock (delaying memory reclaim) for 10 seconds
WARNING: CPU: 14 PID: 70131 at fs/bcachefs/btree_iter.c:2825 bch2_trans_put+0x250/0x280>
Modules linked in: udf crc_itu_t isofs cdrom bcachefs xor raid6_pq libcrc32c crc32c_gen>
 ip_tables x_tables f2fs crc32_generic lz4hc_compress lz4_compress hid_generic usbhid a>
CPU: 14 PID: 70131 Comm: mk-screens Tainted: G           OE      6.8.9-arch1-1 #1 b1154>
Hardware name: Gigabyte Technology Co., Ltd. B550 AORUS ELITE/B550 AORUS ELITE, BIOS F1>
RIP: 0010:bch2_trans_put+0x250/0x280 [bcachefs]
Code: 48 8b 15 d3 88 42 c7 48 c7 c7 e8 84 12 c2 48 b8 07 3a 6d a0 d3 06 3a 6d 48 29 ca >
RSP: 0018:ffffbe69b14d7658 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff958661450000 RCX: 0000000000000027
RDX: ffff958d5eba19c8 RSI: 0000000000000001 RDI: ffff958d5eba19c0
RBP: ffff95878b580000 R08: 0000000000000000 R09: ffffbe69b14d74e8
R10: ffffbe69b14d74e0 R11: 0000000000000003 R12: ffff958abf1cbfc0
R13: ffff958661450000 R14: ffff95878b5a63a8 R15: ffff95878b580000
FS:  000071db176006c0(0000) GS:ffff958d5eb80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000078c0b91ed7f0 CR3: 00000004ceb36000 CR4: 0000000000350ef0
Call Trace:
 <TASK>
 ? bch2_trans_put+0x250/0x280 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
 ? __warn+0x81/0x130
 ? bch2_trans_put+0x250/0x280 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
 ? report_bug+0x171/0x1a0
 ? prb_read_valid+0x1b/0x30
 ? srso_return_thunk+0x5/0x5f
 ? handle_bug+0x3c/0x80
 ? exc_invalid_op+0x17/0x70
 ? asm_exc_invalid_op+0x1a/0x20
 ? bch2_trans_put+0x250/0x280 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
 ? bch2_trans_put+0x250/0x280 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
 __bch2_create+0x5f8/0x6f0 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
 ? srso_return_thunk+0x5/0x5f
 ? bch2_create+0x2a/0x60 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
 bch2_create+0x2a/0x60 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
 path_openat+0xe9f/0x1190
 do_filp_open+0xb3/0x160
 do_sys_openat2+0xab/0xe0
 __x64_sys_openat+0x57/0xa0
 do_syscall_64+0x83/0x170
 ? srso_return_thunk+0x5/0x5f
 ? get_page_from_freelist+0x18e5/0x1a80
 ? srso_return_thunk+0x5/0x5f
 ? file_tty_write.isra.0+0x219/0x330
 ? srso_return_thunk+0x5/0x5f
 ? srso_return_thunk+0x5/0x5f
 ? __alloc_pages+0x1a3/0x350
 ? alloc_pages_mpol+0x95/0x1f0
 ? srso_return_thunk+0x5/0x5f
 ? __pte_offset_map+0x1b/0x180
 ? srso_return_thunk+0x5/0x5f
 ? __mod_memcg_lruvec_state+0x97/0x110
 ? srso_return_thunk+0x5/0x5f
 ? srso_return_thunk+0x5/0x5f
 ? set_ptes.isra.0+0x28/0x90
 ? srso_return_thunk+0x5/0x5f
 ? do_anonymous_page+0x357/0x6e0
 ? pmdp_collapse_flush+0x10/0x60
 ? srso_return_thunk+0x5/0x5f
 ? __handle_mm_fault+0xb4a/0xe40
 ? do_syscall_64+0x90/0x170
 ? srso_return_thunk+0x5/0x5f
 ? __count_memcg_events+0x4d/0xc0
 ? srso_return_thunk+0x5/0x5f
 ? count_memcg_events.constprop.0+0x1a/0x30
 ? srso_return_thunk+0x5/0x5f
 ? handle_mm_fault+0xa2/0x360
 ? srso_return_thunk+0x5/0x5f
 ? do_user_addr_fault+0x304/0x670
 ? srso_return_thunk+0x5/0x5f
 ? srso_return_thunk+0x5/0x5f
 entry_SYSCALL_64_after_hwframe+0x78/0x80
RIP: 0033:0x71db3280e705
Code: 8b 55 d0 48 89 45 b0 75 a0 44 89 55 9c e8 93 8a f8 ff 44 8b 55 9c 89 da 4c 89 e6 >
RSP: 002b:000071db175fd3e0 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 0000000000080241 RCX: 000071db3280e705
RDX: 0000000000080241 RSI: 000071db175fd4d8 RDI: 00000000ffffff9c
RBP: 000071db175fd450 R08: 0000000000000000 R09: 010f0f010f0f0f07
R10: 00000000000001b6 R11: 0000000000000293 R12: 000071db175fd4d8
R13: 000071db175fd880 R14: 000071db175fd4d8 R15: 00000000000001b6
 </TASK>
---[ end trace 0000000000000000 ]---