codebam closed this issue 5 months ago
I can't do anything without logs.
How can I get the logs, other than by capturing an image of the boot screen? The system does not boot, so journalctl -b 1 isn't helpful.
Capture the boot screen; use a camera if you have to.
Here is a video in HEVC format https://r2.seanbehan.ca/b8a1e
Relevant frame: https://r2.seanbehan.ca/8208d
Edit: in H264: https://r2.seanbehan.ca/a1953
Hi, I've tried to increase verbosity but without much luck. Here's my transcript:
unlocking successful.
mounting UUID=1bb5d54a-xxxx-xxxx-xxxx-xxxxxxxxxx on /...
(block of raid6 algorithm speeds)
[ 53.2xxxx] xor: automatically using best checksumming function avx
[ 53.3xxxx] bcachefs (UUID=1bb5d54a-xxxx-xxxx-xxxxxxxxxx): error reading superblock: (null)
ERROR - bcachefs::commands::cmd_mount: Fatal error: No such device or address (os error 6)
(errors about reading default superblock messages)
mount: mounting UUID=1bb5d54a-xxxx-xxxx-xxxx-xxxxxxxxxx on /mnt-root/ failed: No such file or directory
(NixOS; the flake update included a new kernel, 6.8.8 -> 6.8.9, and bcachefs-tools 1.4.1 -> 1.7.0; rootfs with encryption enabled)
I have the same or at least a similar problem with two of my systems, also from the nixpkgs upgrade of bcachefs-tools from 1.4.1 to 1.7.0. I also have encryption enabled but not compression. I think it might have something to do with the version of the superblock?
This is the superblock from one system with a single NVMe, the other system has two NVMes with 1 data replica.
However, I had an additional problem: I could no longer boot with bcachefs-tools 1.4.1 when using any of the 6.9-rc kernels. Kernel 6.8.9 still works fine, but 6.9 wouldn't mount, with a message like this:
... error reading superblock: error opening UUID=...: ENOENT
That just magically fixed itself on one of the systems now, after trying again with 6.9-rc7 to get the exact phrasing of this error message right. Previously this had even caused an error to be recorded in the superblock, fs_usage_nr_inodes_wrong.
The boot problem with bcachefs-tools 1.7.0 persists however, even though I am now using linux 6.9-rc7 and the superblock has been upgraded to bcachefs 1.7.
A weird thing is that I cannot reproduce this anymore with a fresh installation where the rootfs was created with bcachefs-tools 1.4.1. So only the two systems I created around the middle of January have these issues.
Sorry if my description is a bit chaotic. :sweat_smile:
@chaosbiber @tmuehlbacher What are you using in your fstab? device node, UUID=? Also it sounds like mostly single block device file system, not multi-device right?
Update: Also what version of blkid are you using (check util-linux)
What are you using in your fstab? device node, UUID=?
UUID=1bb5d54a-cafa-4012-9831-12f31cb8a8fa / bcachefs x-initrd.mount,compression=lz4 0 0
generated from nixos-config
fileSystems."/" =
  { device = "UUID=...";
    fsType = "bcachefs";
    options = [ "compression=lz4" ];
  };
Also it sounds like mostly single block device file system, not multi-device right?
correct
Update: Also what version of blkid are you using (check util-linux)
blkid from util-linux 2.39.3 (libblkid 2.39.3, 04-Dec-2023)
(did not change during the update)
This is my fstab line, generated by NixOS:
UUID=314767f3-de4e-4d10-8e38-6b0ba123313e /nix bcachefs x-initrd.mount,defaults,lazytime 0 0
blkid --version:
blkid from util-linux 2.39.3 (libblkid 2.39.3, 04-Dec-2023)
Actually, one of my systems is multi-device. But both have the same problem. The superblock info for the single-device fs is already in my previous comment. Here is the superblock info for the multi-device fs:
Can you try 'bcachefs mount' with -v?
Here is the terminal output from a separate live system, using the same nixpkgs revisions as the actual systems.
In the first command it didn't ask for a password because the key was still in the kernel keyring from a prior mount.
It works here in a full system, just not in the initramfs, apparently.
So that looks like a keyring issue
I've created a nix installer based on nix-unstable so it should use most of the same package versions, and ran it on the system where booting fails. If I haven't misspelled my passphrase in all 6 or 7 tries, then yes, it looks like the password doesn't reach its destination.
Edit: using kernel 6.8.9 with bcachefs-tools 1.7.0
Sorry for the dust, it's on my laptop, not your displays...
Deleted because I just reproduced the issue mentioned also in the nixos wiki.
@chaosbiber I can confirm I have the same issue. If you mount with bcachefs unlock -k session /dev/sda it works.
@codebam Ah, right, I just reproduced an older issue for the 23.11 installer. But using 1.7.0 again: with both the unlock -k session and the keyctl link @u @s plus unlock commands I can mount the volume, but it asks for the passphrase twice (unlock and mount)!
I can confirm hitting this issue when trying to upgrade bcachefs-tools to 1.6.x a month ago on NixOS. bcachefs-tools has always worked perfectly fine for me when run interactively from a rescue USB. (NixOS skipped bcachefs-tools 1.6.x entirely, so if anyone's looking for root causes they're probably further back in time.)
If I were to be suspicious of a particular commit here in bcachefs-tools, it'd probably be: https://github.com/koverstreet/bcachefs-tools/commit/0a284fc4ffcbb46f0a4b921415ef12a9c75fa05c (and possibly, subsequent changes to mount.rs)?
Could some kind of subtle "bug" have been introduced with the conversion of main to Rust? For example, something subtle like changing the return code on a successful mount when executed via the mount.bcachefs symlink would be enough to break NixOS's stage1 script (I believe).
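To make the exit-status concern concrete, here is a minimal Rust sketch of how a caller observes a child's exit status. `exit_code_of` and `looks_successful` are hypothetical helpers written for illustration; they are not part of bcachefs-tools or of NixOS's stage-1 script, and the sketch only assumes the standard library's `std::process::Command`.

```rust
use std::io;
use std::process::Command;

/// Hypothetical helper: run a program and return its exit code.
/// Returns `None` if the child was terminated by a signal.
fn exit_code_of(program: &str, args: &[&str]) -> io::Result<Option<i32>> {
    let status = Command::new(program).args(args).status()?;
    Ok(status.code())
}

/// A caller in the style of an init script only sees the exit code:
/// anything other than 0 is treated as a failed mount, regardless of
/// whether the filesystem actually ended up mounted.
fn looks_successful(code: Option<i32>) -> bool {
    code == Some(0)
}
```

For example, `looks_successful(exit_code_of("sh", &["-c", "exit 3"])?)` is `false`, so a mount helper that mounted the filesystem but returned a non-zero status would still be reported as a failure by such a caller.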
(NixOS skipped bcachefs-tools 1.6.x entirely, so if anyone's looking for root causes they're probably further back in time.)
If you are interested, you can probably run git bisect to find the actual commit that did this. This repo also provides a flake, so you can easily update to an intermediate version like 1.5 or 1.6 using a specific tag as flake input. Then overlay the package on top of the system bcachefs-tools.
If I do run a bisect - do y'all consider it safe to use intermediate versions? Or should I restrict to tagged releases?
I'd recommend staying on the safe side and testing tagged releases first, then we can move on from there. @koverstreet can better confirm whether it's safe to use intermediate commits.
Alright; ran a bisect on my machine (restricting to tagged releases). The last good release was v1.6.3 and the first bad release is v1.6.4.
git log --bisect --oneline --decorate --graph:
I also added a little extra logging in stage-1-init.sh, and can tell you a little more precisely what regressed:
Calling mount "/mnt-root/persist" returns a failure status code (I haven't yet been able to confirm whether the mounted fs was actually present afterwards), and /etc/fstab contains the line UUID=<insert bcachefs uuid> /mnt-root/persist bcachefs verbose,fix_errors (no doubt with different whitespace; I had to copy from a photo).
If it's relevant: I'm not running compression but am running encryption, and have a multi-device setup.
@marcin-github would it be possible for you to test if you can reproduce this, on a NixOS machine? Not sure if it would be relevant in Fedora or Debian.
@marcin-github would it be possible for you to test if you can reproduce this, on a NixOS machine? Not sure if it would be relevant in Fedora or Debian.
Meseems you want to mention someone else :)
Having the same issue. NixOS with Linux 6.8.9, bcachefs-tools 1.7.0, configuration:
fileSystems."/" = {
  device = "UUID=5f910790-3f93-4e9e-baf4-13b69719dc6a";
  fsType = "bcachefs";
  options = [
    "compression=lz4"
    "fix_errors=yes"
    "nojournal_transaction_names"
    "relatime"
    "discard"
    "background_compression=lz4"
  ];
};
After typing my password on boot, getting an error:
bcachefs (UUID=5f910790-3f93-4e9e-baf4-13b69719dc6a): error reading superblock: (null)
bcachefs::commands::cmd_mount: Fatal error: No such device or address (os error 6)
Edit: using encryption together with compression.
@marcin-github would it be possible for you to test if you can reproduce this, on a NixOS machine? Not sure if it would be relevant in Fedora or Debian.
Meseems you want to mention someone else :)
Well, you seem to be quite active in this repo testing bcachefs along with @tasleson. This needs an urgent patch, as v1.7.0 is supposed to ship with the upcoming NixOS 24.05 release at the end of May.
Well, I'm not using NixOS, I'm not using bcachefs as root fs and I'm not using encryption. What I can say is that -tools on the latest commits works on my host. I'm not sure how I can help you or what you expect from me :(
Got confirmation from @koverstreet on IRC that he thinks it should be reasonably safe to bisect further between v1.6.3 and v1.6.4; will hopefully have time to do so later tonight.
@marcin-github: Sadly, the tools mostly work fine for me as well - just not in initramfs.
@JohnRTitor: What's the precise reason for wanting to ship bcachefs-tools-1.7.0 with NixOS 24.05? Is 24.05 shipping with a 6.9 kernel? If not - I'd actually suggest that bcachefs-tools-1.6.x (specifically 1.6.3 because I believe it works) would be more appropriate because it matches the disk format in the 6.8 kernel?
The 6.9 kernel release is due in a week, and the NixOS 24.05 release will definitely ship with it (release schedule). The ISOs, however, will by default stick to the 6.6 LTS kernel, so this issue will not affect most users, except those who are actively using a bcachefs filesystem and choose to upgrade their system.
The 6.9 kernel uses bcachefs version 1.7.0, so obviously it would have been better to ship bcachefs-tools 1.7.0 with it. The kernel can automatically downgrade the bcachefs version to better match the system though, so I am not too worried about a version mismatch. As things stand, I'll probably downgrade the package to 1.6.3.
I tried building and booting some of the commits between v1.6.3 and v1.6.4.
So it looks like the issue is with 86049a1641535f451fdd5a8bf885ecb925adbf1e ?
That diff doesn't have any obvious issues in it, unless it's the reordering of this check from before the password prompt to after it and inside of the decryption logic? I won't be able to try it myself for at least several hours, but would be curious whether moving it back to the start of ask_for_key resolves things...
@reedriley Yes actually, moving it back does in fact boot: 12164c97df4c936d137d556ea6a49d809163f66f
Yeah, I believe it makes a lot of sense.
Not sure about rpassword, but stdin.read_line() will definitely block forever if nothing comes in on stdin. So we never reach the point of being able to check the kernel keyring with check_for_key().
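The ordering problem described above can be sketched in a few lines of Rust. This is only an illustration under stated assumptions, not the actual bcachefs-tools code: `resolve_key` and `KeySource` are hypothetical names, and the boolean `key_in_keyring` stands in for whatever the real `check_for_key()` reports. The point is simply that the keyring check has to happen before any blocking read from stdin.

```rust
use std::io::{self, BufRead};

/// Where the key ended up coming from (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum KeySource {
    /// The key was already present in the kernel keyring.
    Keyring,
    /// The passphrase had to be read from the input stream.
    Prompt(String),
}

/// Hypothetical sketch: consult the keyring result first, and only fall
/// back to reading a passphrase when no key is available.
fn resolve_key<R: BufRead>(key_in_keyring: bool, mut input: R) -> io::Result<KeySource> {
    // Check first: if the key is already loaded, stdin is never touched,
    // so a read that would block forever is never issued.
    if key_in_keyring {
        return Ok(KeySource::Keyring);
    }

    // Only now do the potentially blocking read. `read_line` waits until
    // a newline or EOF arrives on the input.
    let mut line = String::new();
    input.read_line(&mut line)?;
    let passphrase = line.trim_end_matches(|c| c == '\r' || c == '\n');
    Ok(KeySource::Prompt(passphrase.to_string()))
}
```

Called as `resolve_key(true, io::stdin().lock())`, the function returns without reading anything; with the two steps swapped, the same call would wait on stdin even though the key is already available.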
After rebasing 12164c97df4c936d137d556ea6a49d809163f66f onto 477670f48167cac1b871b061713cc1b594a2a941 (https://github.com/codebam/bcachefs-tools/commits/try-fix/ , https://github.com/codebam/bcachefs-tools/commits/fix-nixos-stage1), the password prompt doesn't read in passwords properly. It prompts, but pressing <CR> twice puts you on a new line.
Edit: tried with @tmuehlbacher's commit as well and no dice, at least not on master
I didn't really test that commit yet. I will have to look into how to best test this (i.e. is master currently safe to use or do I have to rebase onto 1.7.0 to test). If someone would be willing to try, that would also be cool but no pressure. 🙂
@codebam thanks, so still some problems on master? I tested it by branching off from v1.7.0 and cherry-picking the commit from my PR and that boots now for me on kernel 6.9-rc7
@codebam thanks, so still some problems on master? I tested it by branching off from v1.7.0 and cherry-picking the commit from my PR and that boots now for me on kernel 6.9-rc7
Yeah, but it's a different problem on master, where you can't enter the passphrase. I'm on this commit with your commit and it boots fine: 5531accc97da082b7a102240e34fdf15c68a8991 (also on 6.9-rc7).
I pushed another commit to #263 that makes mounting work on master now. :)
Please test the latest changes so that this gets fixed upstream. NixOS users can easily do so as shown in https://github.com/JohnRTitor/nix-conf/commit/b60df8a18feb8c9e6e4edc16fb62fe2a5ad0449b
Tested and working on rc7
Hey I'm using NixOS, and I can't boot after updating to 1.7.0.
I use both encryption and compression.
This is my configuration: https://github.com/codebam/nixos (I've pinned 1.4.0 for the time being).
What I've tried is here: https://github.com/NixOS/nixpkgs/issues/309388