koverstreet / bcachefs

Other
643 stars 71 forks source link

WARNING on attempt to unmount due to in-progress journal reclaim #645

Closed hhoffstaette closed 6 months ago

hhoffstaette commented 6 months ago

The journal reclaim symptom described in #639 has another downside: it seems to prevent unmount when reclaim is still running. In my case I didn't notice the background activity and when trying to unmount my test HDD got the following (on 6.7.2 + REQ_OP_FLUSH patch from 6.8 since it's an external USB drive):

Jan 28 20:19:39 tux kernel: ------------[ cut here ]------------
Jan 28 20:19:39 tux kernel: btree trans held srcu lock (delaying memory reclaim) for 10 seconds
Jan 28 20:19:39 tux kernel: WARNING: CPU: 2 PID: 15073 at fs/bcachefs/btree_iter.c:2851 bch2_trans_put+0x1ea/0x210 [bcachefs]
Jan 28 20:19:39 tux kernel: Modules linked in: bcachefs lz4hc_compress crc64 nfsd auth_rpcgss oid_registry lockd grace sunrpc sch_fq_codel nct6775 nct6775_core hwmon_vid i915 btrfs x86_pkg_temp_thermal coretemp crc32_pclmul sha512_ssse3 sha512_generic sha256_ssse3 sha256_generic blake2b_generic libsha256 xor lzo_compress sha1_ssse3 drivetemp lzo_decompress wmi_bmof i2c_algo_bit iosf_mbi drm_buddy raid6_pq sha1_generic intel_gtt zstd_compress aesni_intel drm_display_helper libaes crypto_simd drm_kms_helper cryptd bfq ttm mq_deadline drm i2c_i801 i2c_smbus usbhid drm_panel_orientation_quirks i2c_core atlantic video backlight wmi [last unloaded: kheaders]
Jan 28 20:19:39 tux kernel: CPU: 2 PID: 15073 Comm: kworker/2:2 Not tainted 6.7.2 #1
Jan 28 20:19:39 tux kernel: Hardware name: System manufacturer System Product Name/P8Z68-V LX, BIOS 4105 07/01/2013
Jan 28 20:19:39 tux kernel: Workqueue: bcachefs_btree_io btree_node_write_work [bcachefs]
Jan 28 20:19:39 tux kernel: RIP: 0010:bch2_trans_put+0x1ea/0x210 [bcachefs]
Jan 28 20:19:39 tux kernel: Code: 48 8b 15 19 7f 7b e1 48 c7 c7 e0 0a d5 a0 48 b8 07 3a 6d a0 d3 06 3a 6d 48 29 ca 48 f7 e2 48 89 d6 48 c1 ee 07 e8 96 92 41 e0 <0f> 0b 8b b5 90 00 00 00 49 8d be c8 34 00 00 83 fe 01 77 0a e8 ed
Jan 28 20:19:39 tux kernel: RSP: 0018:ffffc90003c6bde0 EFLAGS: 00010286
Jan 28 20:19:39 tux kernel: RAX: 0000000000000043 RBX: ffff8883643ee600 RCX: 0000000000000027
Jan 28 20:19:39 tux kernel: RDX: ffff8887ff71b448 RSI: 0000000000000001 RDI: ffff8887ff71b440
Jan 28 20:19:39 tux kernel: RBP: ffff888391350000 R08: 00000000ffffbfff R09: 0000000000000001
Jan 28 20:19:39 tux kernel: R10: 0000000000000000 R11: ffff8887ff4a0000 R12: 0000000000000011
Jan 28 20:19:39 tux kernel: R13: ffff888107aeb090 R14: ffff888107ac0000 R15: ffff8883755cbdc0
Jan 28 20:19:39 tux kernel: FS:  0000000000000000(0000) GS:ffff8887ff700000(0000) knlGS:0000000000000000
Jan 28 20:19:39 tux kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 28 20:19:39 tux kernel: CR2: 000056490f6fa668 CR3: 000000000242a002 CR4: 00000000000606f0
Jan 28 20:19:39 tux kernel: Call Trace:
Jan 28 20:19:39 tux kernel:  <TASK>
Jan 28 20:19:39 tux kernel:  ? bch2_trans_put+0x1ea/0x210 [bcachefs]
Jan 28 20:19:39 tux kernel:  ? __warn+0x7d/0x120
Jan 28 20:19:39 tux kernel:  ? bch2_trans_put+0x1ea/0x210 [bcachefs]
Jan 28 20:19:39 tux kernel: ------------[ cut here ]------------
Jan 28 20:19:39 tux kernel: btree trans held srcu lock (delaying memory reclaim) for 10 seconds
Jan 28 20:19:39 tux kernel:  ? report_bug+0x155/0x180
Jan 28 20:19:39 tux kernel:  ? handle_bug+0x36/0x70
Jan 28 20:19:39 tux kernel:  ? exc_invalid_op+0x13/0x60
Jan 28 20:19:39 tux kernel:  ? asm_exc_invalid_op+0x16/0x20
Jan 28 20:19:39 tux kernel: WARNING: CPU: 1 PID: 16009 at fs/bcachefs/btree_iter.c:2851 bch2_trans_srcu_unlock+0x101/0x110 [bcachefs]
Jan 28 20:19:39 tux kernel:  ? bch2_trans_put+0x1ea/0x210 [bcachefs]
Jan 28 20:19:39 tux kernel: Modules linked in: bcachefs lz4hc_compress crc64 nfsd auth_rpcgss oid_registry lockd grace sunrpc sch_fq_codel nct6775 nct6775_core hwmon_vid i915 btrfs x86_pkg_temp_thermal coretemp crc32_pclmul sha512_ssse3 sha512_generic sha256_ssse3 sha256_generic blake2b_generic libsha256 xor lzo_compress sha1_ssse3 drivetemp lzo_decompress wmi_bmof i2c_algo_bit iosf_mbi drm_buddy raid6_pq sha1_generic intel_gtt zstd_compress aesni_intel drm_display_helper libaes crypto_simd drm_kms_helper cryptd bfq ttm mq_deadline drm i2c_i801 i2c_smbus usbhid drm_panel_orientation_quirks
Jan 28 20:19:39 tux kernel:  btree_node_write_work+0x134/0x420 [bcachefs]
Jan 28 20:19:39 tux kernel:  i2c_core atlantic video backlight wmi [last unloaded: kheaders]
Jan 28 20:19:39 tux kernel: CPU: 1 PID: 16009 Comm: kworker/1:0 Not tainted 6.7.2 #1
Jan 28 20:19:39 tux kernel: Hardware name: System manufacturer System Product Name/P8Z68-V LX, BIOS 4105 07/01/2013
Jan 28 20:19:39 tux kernel:  process_one_work+0x131/0x2b0
Jan 28 20:19:39 tux kernel:  worker_thread+0x2e5/0x400
Jan 28 20:19:39 tux kernel: Workqueue: bcachefs_btree_io btree_node_write_work [bcachefs]
Jan 28 20:19:39 tux kernel:  ? flush_work+0x10/0x10
Jan 28 20:19:39 tux kernel:  kthread+0xc9/0xf0
Jan 28 20:19:39 tux kernel:  ? kthread_complete_and_exit+0x20/0x20
Jan 28 20:19:39 tux kernel:  ret_from_fork+0x2d/0x50
Jan 28 20:19:39 tux kernel:  ? kthread_complete_and_exit+0x20/0x20
Jan 28 20:19:39 tux kernel:  ret_from_fork_asm+0x11/0x20
Jan 28 20:19:39 tux kernel:  </TASK>
Jan 28 20:19:39 tux kernel: ---[ end trace 0000000000000000 ]---
Jan 28 20:19:39 tux kernel: 
Jan 28 20:19:39 tux kernel: RIP: 0010:bch2_trans_srcu_unlock+0x101/0x110 [bcachefs]
Jan 28 20:19:39 tux kernel: Code: 48 8b 15 32 8b 7b e1 48 c7 c7 e0 0a d5 a0 48 b8 07 3a 6d a0 d3 06 3a 6d 48 29 ca 48 f7 e2 48 c1 ea 07 48 89 d6 e8 af 9e 41 e0 <0f> 0b eb 94 0f 0b eb a2 0f 1f 80 00 00 00 00 0f 1f 44 00 00 f6 87
Jan 28 20:19:39 tux kernel: RSP: 0018:ffffc9000930fd80 EFLAGS: 00010282
Jan 28 20:19:39 tux kernel: RAX: 0000000000000043 RBX: ffff8887e0914000 RCX: 0000000000000027
Jan 28 20:19:39 tux kernel: RDX: ffff8887ff69b448 RSI: 0000000000000001 RDI: ffff8887ff69b440
Jan 28 20:19:39 tux kernel: RBP: ffff888107ac0000 R08: 00000000ffffbfff R09: 0000000000000001
Jan 28 20:19:39 tux kernel: R10: 0000000000000000 R11: ffff8887ff4a0000 R12: ffff8887e09141b8
Jan 28 20:19:39 tux kernel: R13: ffff8887e0916330 R14: ffff8887e0914000 R15: ffff8883755cbc00
Jan 28 20:19:39 tux kernel: FS:  0000000000000000(0000) GS:ffff8887ff680000(0000) knlGS:0000000000000000
Jan 28 20:19:39 tux kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 28 20:19:39 tux kernel: CR2: 000055f78a96d000 CR3: 000000000242a001 CR4: 00000000000606f0
Jan 28 20:19:39 tux kernel: Call Trace:
Jan 28 20:19:39 tux kernel:  <TASK>
Jan 28 20:19:39 tux kernel:  ? bch2_trans_srcu_unlock+0x101/0x110 [bcachefs]
Jan 28 20:19:39 tux kernel:  ? __warn+0x7d/0x120
Jan 28 20:19:39 tux kernel:  ? bch2_trans_srcu_unlock+0x101/0x110 [bcachefs]
Jan 28 20:19:39 tux kernel:  ? report_bug+0x155/0x180
Jan 28 20:19:39 tux kernel:  ? handle_bug+0x36/0x70
Jan 28 20:19:39 tux kernel:  ? exc_invalid_op+0x13/0x60
Jan 28 20:19:39 tux kernel:  ? asm_exc_invalid_op+0x16/0x20
Jan 28 20:19:39 tux kernel:  ? bch2_trans_srcu_unlock+0x101/0x110 [bcachefs]
Jan 28 20:19:39 tux kernel:  ? bch2_trans_begin+0x12f/0x5b0 [bcachefs]
Jan 28 20:19:39 tux kernel:  bch2_trans_begin+0x55b/0x5b0 [bcachefs]
Jan 28 20:19:39 tux kernel:  btree_node_write_work+0xdd/0x420 [bcachefs]
Jan 28 20:19:39 tux kernel:  process_one_work+0x131/0x2b0
Jan 28 20:19:39 tux kernel:  worker_thread+0x2e5/0x400
Jan 28 20:19:39 tux kernel:  ? flush_work+0x10/0x10
Jan 28 20:19:39 tux kernel:  kthread+0xc9/0xf0
Jan 28 20:19:39 tux kernel:  ? kthread_complete_and_exit+0x20/0x20
Jan 28 20:19:39 tux kernel:  ret_from_fork+0x2d/0x50
Jan 28 20:19:39 tux kernel:  ? kthread_complete_and_exit+0x20/0x20
Jan 28 20:19:39 tux kernel:  ret_from_fork_asm+0x11/0x20
Jan 28 20:19:39 tux kernel:  </TASK>
Jan 28 20:19:39 tux kernel: ---[ end trace 0000000000000000 ]---

Should unmount notify the reclaim thread to exit?

hhoffstaette commented 6 months ago

Argh - sorry for the duplicate, this seems to be a known problem (#605, #636).