koverstreet / bcachefs

Installing OS to a VM on a bcachefs raid1 array leads to hang #661

Closed reivanen closed 6 months ago

reivanen commented 6 months ago

I have been trying to install Ubuntu 22.04 to a Linux 6.8 KVM qcow2 file (nocow) on a bcachefs replicas=2 filesystem, but it freezes mid-install (this has happened twice).

The host journal shows messages like these:

17:19:13 kernel: ? bch2_btree_iter_peek_slot+0x18b/0x740 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:19:13 kernel: bch2_bucket_nocow_lock+0xb7/0x120 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:19:13 kernel: bch2_nocow_write+0x8c3/0x11c0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:19:13 kernel: ? __bch2_write+0xdaf/0x13b0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:19:13 kernel: bch2_write+0xdaf/0x13b0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:19:13 kernel: ? __bch2_increment_clock+0x2d/0x140 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:19:13 kernel: ? bch2_direct_write+0x6f9/0xce0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:19:13 kernel: bch2_direct_write+0x6f9/0xce0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:19:13 kernel: bch2_write_iter+0x142/0xc70 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]

And when I try to kill -9 the stuck virtual machine process, nothing happens. It seems the kernel has been messed up so badly that it is unable to kill the process.
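A process that ignores kill -9 is often blocked in uninterruptible sleep (state D) inside the kernel, where signals are not acted on until the blocking call returns. Whether that is what happened here is an assumption; the thread does not confirm it. As a minimal sketch, the state can be read from /proc/<pid>/status on Linux. The helper names parse_state and task_state are made up for illustration.

```python
import os
from pathlib import Path


def parse_state(status_text: str) -> str:
    """Return the one-letter task state from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            # The line looks like "State:\tD (disk sleep)"; the letter is field 2.
            return line.split()[1]
    raise ValueError("no State: line found")


def task_state(pid: int) -> str:
    """Read the state of a live process (hypothetical helper, Linux only)."""
    return parse_state(Path(f"/proc/{pid}/status").read_text())


if __name__ == "__main__":
    # Inspect this script's own process as a harmless demonstration.
    print(task_state(os.getpid()))
```

A result of "D" for the stuck VM process would fit the symptom described above, since a task in that state cannot be terminated until the kernel operation it is waiting on completes.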

reivanen commented 6 months ago

Not only did the install lead to a hang, but it also seems probable that a couple of previous VM crashes are related to this.

Sad that no engagement has happened.

Best way to have no problems is to ignore all problems.

koverstreet commented 6 months ago

It's only been 2 days and I've been busy with helping a user recover from actual data loss.

I'll get to this, be patient.

reivanen commented 6 months ago

Thank you for acknowledging. There is probably little more to be had on this, though, because I had to reformat to ext4 to keep things going, and no important data was at stake. Gotta love the modern world, where 1 TB can just be considered ephemeral ;)

reivanen commented 6 months ago

Here is a full log of the last boot I have access to; hopefully it helps:

13:49:40 -pc kernel: bcachefs (30a75aa0-2fdd-435c-9b90-5f53275a4f2e): going read-write
17:19:13 -pc kernel:  ? bch2_btree_iter_peek_slot+0x18b/0x740 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:19:13 -pc kernel:  bch2_bucket_nocow_lock+0xb7/0x120 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:19:13 -pc kernel:  bch2_nocow_write+0x8c3/0x11c0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:19:13 -pc kernel:  ? __bch2_write+0xdaf/0x13b0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:19:13 -pc kernel:  bch2_write+0xdaf/0x13b0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:19:13 -pc kernel:  ? bch2_increment_clock+0x2d/0x140 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:19:13 -pc kernel:  ? bch2_direct_write+0x6f9/0xce0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:19:13 -pc kernel:  bch2_direct_write+0x6f9/0xce0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:19:13 -pc kernel:  bch2_write_iter+0x142/0xc70 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:21:16 -pc kernel:  ? bch2_btree_iter_peek_slot+0x18b/0x740 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:21:16 -pc kernel:  bch2_bucket_nocow_lock+0xb7/0x120 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:21:16 -pc kernel:  bch2_nocow_write+0x8c3/0x11c0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:21:16 -pc kernel:  ? bch2_write+0xdaf/0x13b0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:21:16 -pc kernel:  __bch2_write+0xdaf/0x13b0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:21:16 -pc kernel:  ? bch2_increment_clock+0x2d/0x140 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:21:16 -pc kernel:  ? bch2_direct_write+0x6f9/0xce0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:21:16 -pc kernel:  bch2_direct_write+0x6f9/0xce0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:21:16 -pc kernel:  bch2_write_iter+0x142/0xc70 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:23:19 -pc kernel:  ? bch2_btree_iter_peek_slot+0x18b/0x740 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:23:19 -pc kernel:  bch2_bucket_nocow_lock+0xb7/0x120 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:23:19 -pc kernel:  bch2_nocow_write+0x8c3/0x11c0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:23:19 -pc kernel:  ? __bch2_write+0xdaf/0x13b0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:23:19 -pc kernel:  bch2_write+0xdaf/0x13b0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:23:19 -pc kernel:  ? bch2_increment_clock+0x2d/0x140 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:23:19 -pc kernel:  ? bch2_direct_write+0x6f9/0xce0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:23:19 -pc kernel:  bch2_direct_write+0x6f9/0xce0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:23:19 -pc kernel:  bch2_write_iter+0x142/0xc70 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:25:22 -pc kernel:  ? bch2_btree_iter_peek_slot+0x18b/0x740 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:25:22 -pc kernel:  bch2_bucket_nocow_lock+0xb7/0x120 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:25:22 -pc kernel:  bch2_nocow_write+0x8c3/0x11c0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:25:22 -pc kernel:  ? bch2_write+0xdaf/0x13b0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:25:22 -pc kernel:  __bch2_write+0xdaf/0x13b0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:25:22 -pc kernel:  ? bch2_increment_clock+0x2d/0x140 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:25:22 -pc kernel:  ? bch2_direct_write+0x6f9/0xce0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:25:22 -pc kernel:  bch2_direct_write+0x6f9/0xce0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:25:22 -pc kernel:  bch2_write_iter+0x142/0xc70 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:27:25 -pc kernel:  ? bch2_btree_iter_peek_slot+0x18b/0x740 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:27:25 -pc kernel:  bch2_bucket_nocow_lock+0xb7/0x120 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:27:25 -pc kernel:  bch2_nocow_write+0x8c3/0x11c0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:27:25 -pc kernel:  ? __bch2_write+0xdaf/0x13b0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:27:25 -pc kernel:  bch2_write+0xdaf/0x13b0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:27:25 -pc kernel:  ? bch2_increment_clock+0x2d/0x140 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:27:25 -pc kernel:  ? bch2_direct_write+0x6f9/0xce0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:27:25 -pc kernel:  bch2_direct_write+0x6f9/0xce0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:27:25 -pc kernel:  bch2_write_iter+0x142/0xc70 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:29:28 -pc kernel:  ? bch2_btree_iter_peek_slot+0x18b/0x740 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:29:28 -pc kernel:  bch2_bucket_nocow_lock+0xb7/0x120 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:29:28 -pc kernel:  bch2_nocow_write+0x8c3/0x11c0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:29:28 -pc kernel:  ? bch2_write+0xdaf/0x13b0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:29:28 -pc kernel:  __bch2_write+0xdaf/0x13b0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:29:28 -pc kernel:  ? bch2_increment_clock+0x2d/0x140 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:29:28 -pc kernel:  ? bch2_direct_write+0x6f9/0xce0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:29:28 -pc kernel:  bch2_direct_write+0x6f9/0xce0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:29:28 -pc kernel:  bch2_write_iter+0x142/0xc70 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:31:31 -pc kernel:  ? bch2_btree_iter_peek_slot+0x18b/0x740 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:31:31 -pc kernel:  bch2_bucket_nocow_lock+0xb7/0x120 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:31:31 -pc kernel:  bch2_nocow_write+0x8c3/0x11c0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:31:31 -pc kernel:  ? __bch2_write+0xdaf/0x13b0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:31:31 -pc kernel:  bch2_write+0xdaf/0x13b0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:31:31 -pc kernel:  ? bch2_increment_clock+0x2d/0x140 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:31:31 -pc kernel:  ? bch2_direct_write+0x6f9/0xce0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:31:31 -pc kernel:  bch2_direct_write+0x6f9/0xce0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:31:31 -pc kernel:  bch2_write_iter+0x142/0xc70 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:33:34 -pc kernel:  ? bch2_btree_iter_peek_slot+0x18b/0x740 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:33:34 -pc kernel:  bch2_bucket_nocow_lock+0xb7/0x120 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:33:34 -pc kernel:  bch2_nocow_write+0x8c3/0x11c0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:33:34 -pc kernel:  ? bch2_write+0xdaf/0x13b0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:33:34 -pc kernel:  __bch2_write+0xdaf/0x13b0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:33:34 -pc kernel:  ? bch2_increment_clock+0x2d/0x140 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:33:34 -pc kernel:  ? bch2_direct_write+0x6f9/0xce0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:33:34 -pc kernel:  bch2_direct_write+0x6f9/0xce0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:33:34 -pc kernel:  bch2_write_iter+0x142/0xc70 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:35:36 -pc kernel:  ? bch2_btree_iter_peek_slot+0x18b/0x740 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:35:36 -pc kernel:  bch2_bucket_nocow_lock+0xb7/0x120 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:35:36 -pc kernel:  bch2_nocow_write+0x8c3/0x11c0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:35:36 -pc kernel:  ? __bch2_write+0xdaf/0x13b0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:35:36 -pc kernel:  bch2_write+0xdaf/0x13b0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:35:36 -pc kernel:  ? bch2_increment_clock+0x2d/0x140 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:35:36 -pc kernel:  ? bch2_direct_write+0x6f9/0xce0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:35:36 -pc kernel:  bch2_direct_write+0x6f9/0xce0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:35:36 -pc kernel:  bch2_write_iter+0x142/0xc70 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:37:39 -pc kernel:  ? bch2_btree_iter_peek_slot+0x18b/0x740 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:37:39 -pc kernel:  bch2_bucket_nocow_lock+0xb7/0x120 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:37:39 -pc kernel:  bch2_nocow_write+0x8c3/0x11c0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:37:39 -pc kernel:  ? bch2_write+0xdaf/0x13b0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:37:39 -pc kernel:  __bch2_write+0xdaf/0x13b0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:37:39 -pc kernel:  ? bch2_increment_clock+0x2d/0x140 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:37:39 -pc kernel:  ? bch2_direct_write+0x6f9/0xce0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:37:39 -pc kernel:  bch2_direct_write+0x6f9/0xce0 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
17:37:39 -pc kernel:  bch2_write_iter+0x142/0xc70 [bcachefs f393ee8538d649baab95f28547fea17a03dfef18]
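The log above is easier to read once its entries are grouped by timestamp: it is the same nine-frame stack printed ten times, roughly 123 seconds apart. That spacing would be consistent with repeated reports from the kernel's hung-task detector, whose default timeout is 120 seconds, but that reading is an inference and not something the log itself states. A minimal sketch of the grouping follows; the function names and the regular expression are illustrative and only tuned to the line shape seen in this thread.

```python
import re

# Matches entries such as
#   "17:19:13 -pc kernel:  ? bch2_write+0xdaf/0x13b0 [bcachefs <hex id>]"
# The host field is optional, and "?" marks a frame the unwinder was unsure of.
ENTRY = re.compile(
    r"(\d{2}:\d{2}:\d{2}) (?:\S+ )?kernel:\s+(\?\s+)?(\w+)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+ \[bcachefs [0-9a-f]+\]"
)


def group_traces(log_text: str) -> dict:
    """Map each timestamp to the list of frames logged at that time."""
    traces = {}
    for match in ENTRY.finditer(log_text):
        timestamp, unsure, func = match.groups()
        frame = ("? " if unsure else "") + func
        traces.setdefault(timestamp, []).append(frame)
    return traces


def gaps_seconds(timestamps: list) -> list:
    """Seconds between consecutive HH:MM:SS timestamps on the same day."""
    secs = [
        int(h) * 3600 + int(m) * 60 + int(s)
        for h, m, s in (t.split(":") for t in timestamps)
    ]
    return [later - earlier for earlier, later in zip(secs, secs[1:])]
```

Passing the pasted journal text to group_traces and then list(traces) to gaps_seconds gives the interval between reports. Because the pattern tolerates whitespace after the "?", it also copes with entries that were wrapped across lines when pasted.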

reivanen commented 6 months ago

Then the VM hung and the system was unrecoverable; even when I tried to shut the host down, it hung indefinitely while trying to kill the VM process.

reivanen commented 6 months ago

Notice that the error messages printed do not start with "bcachefs:"; I only found them because they contain "bcachefs" before the UUID. Shouldn't all errors emitted by bcachefs start with "bcachefs:"?

Also, it's confusing that the first message and the rest have different UUIDs. I can only assume this is what I have read about internal and external UUIDs...
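The two identifiers differ in shape as well as value. The one in the "going read-write" message has the 8-4-4-4-12 layout of a UUID, while the one in the trace lines is 40 hex digits with no dashes. That second shape matches a module build ID, which kernels built with CONFIG_STACKTRACE_BUILD_ID print next to the module name in stack traces. This is inferred from the format alone and is not confirmed anywhere in this thread. A small sketch that tells the two shapes apart (classify_id is a made-up name):

```python
import re

UUID_RE = re.compile(r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}")
HEX40_RE = re.compile(r"[0-9a-f]{40}")


def classify_id(token: str) -> str:
    """Classify an identifier by shape only; this says nothing about its origin."""
    token = token.lower()
    if UUID_RE.fullmatch(token):
        return "uuid"
    if HEX40_RE.fullmatch(token):
        # 40 hex digits is the length of a SHA-1 digest, a common build ID size.
        return "hex40"
    return "unknown"


if __name__ == "__main__":
    for token in (
        "30a75aa0-2fdd-435c-9b90-5f53275a4f2e",
        "f393ee8538d649baab95f28547fea17a03dfef18",
    ):
        print(token, classify_id(token))
```

If the build-ID reading is right, the trace lines carry no filesystem UUID at all, which would explain why the two values never match.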

colttt commented 6 months ago

Can you please post how you created your bcachefs filesystem, and also the superblock?

reivanen commented 6 months ago

I obviously cannot post the superblock, as it no longer exists. How did I create the filesystem? As every guide tells you: aside from a label for each disk, I only specified --replicas=2, and then set the --nocow attribute on the virtual image file.

andy-amii commented 6 months ago

I suspect this and #662 might be closely related.

koverstreet commented 6 months ago

Sorry for the long delay in getting to this; I'm going to close this since you said you reformatted, but please reopen if you're still able to hit this and willing to test a fix.

The first thing I'll need is a better backtrace - I'm not sure what that is from your kernel log, but it doesn't look like a legit backtrace. Make sure you're compiling your kernel with frame pointers to get that.
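One way to check which unwinder a kernel was built with is to look at its config file. The sketch below assumes an x86 kernel, where the relevant Kconfig symbols are CONFIG_FRAME_POINTER, CONFIG_UNWINDER_FRAME_POINTER and CONFIG_UNWINDER_ORC, and assumes the config text is available, for example from /boot/config-<release> on distributions that ship it. The function name unwinder_settings is illustrative.

```python
WANTED = (
    "CONFIG_FRAME_POINTER",
    "CONFIG_UNWINDER_FRAME_POINTER",
    "CONFIG_UNWINDER_ORC",
)


def unwinder_settings(config_text: str) -> dict:
    """Return the value of each unwinder-related symbol, or "unset"."""
    found = {name: "unset" for name in WANTED}
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as "# CONFIG_X is not set" comments.
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key in found:
            found[key] = value
    return found


if __name__ == "__main__":
    import platform
    from pathlib import Path

    # This path is an assumption; not every distribution installs it.
    path = Path(f"/boot/config-{platform.release()}")
    if path.exists():
        print(unwinder_settings(path.read_text()))
    else:
        print(f"{path} not found")
```

A kernel showing CONFIG_UNWINDER_FRAME_POINTER=y and CONFIG_FRAME_POINTER=y is built with frame pointers, which is what the request above asks for.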