nix-community / disko

Declarative disk partitioning and formatting using nix [maintainers=@Lassulus @Enzime @iFreilicht]
MIT License

issue with eMMC #844

Open cowboyai opened 1 month ago

cowboyai commented 1 month ago

I have eMMC booting, as in it boots and starts Stage 1, but it fails to mount /dev/disk/by-partlabel/disk-sysboot-root when targeting /dev/mmcblk0.

If I target /dev/sda and put it on a USB drive, it works fine and boots correctly.

I have verified I am booting the correct device because I pull out the usb and it has to be booting from emmc. I am deploying via nixos-anywhere with a disko configuration.

Here is my config:

{
  # This file is ONLY the system drive
  # NAS Settings are in disko-nas.nix

  # eMMC boots here, not the boot0/1 partitions, but we do need initial mbr offset for it to boot correctly, otherwise /boot doesn't mount either
  disko.devices = {
    disk = {
      # the bootable drive...
      sysboot = {
        # this is the emmc device
        device = "/dev/mmcblk0";
        type = "disk";
        content = {
          type = "gpt";
          partitions = {
            MBR = {
              type = "EF02"; # BIOS boot partition (for legacy, MBR-style boot on GPT)
              size = "1M";
              priority = 1; # Needs to be first partition
            };
            ESP = {
              type = "EF00";
              size = "500M";
              content = {
                type = "filesystem";
                format = "vfat";
                mountOptions = [ "umask=0077" ]; # make it NOT world readable
                mountpoint = "/boot";
              };
            };

            root = {
              size = "100%";
              content = {
                type = "filesystem";
                format = "ext4";
                mountpoint = "/";
              };
            };
          };
        };
      };
    };
  };
}

full configs are available at: https://github.com/TheCowboyAI/nixos-flashstor

Am I not setting something correctly for eMMC? As I understand it, getting the offset right to initiate boot was the only difference.

iFreilicht commented 1 month ago

What's the error message?

cowboyai commented 1 month ago

waiting for device /dev/disk/by-partlabel/disk-sysboot-root to appear.........
Timed out waiting for device /dev/disk/by-partlabel/disk-sysboot-root trying anyway....
mounting /dev/disk/by-partlabel/disk-sysboot-root on / ...
/dev/disk/by-partlabel/disk-sysboot-root: can't lookup block device

cowboyai commented 1 month ago

Is this helpful?

This is from the nixos-anywhere output while writing to the eMMC:

/dev/mmcblk0: 8 bytes were erased at offset 0x00000200 (gpt): 45 46 49 20 50 41 52 54
/dev/mmcblk0: 8 bytes were erased at offset 0x1d1fffe00 (gpt): 45 46 49 20 50 41 52 54
/dev/mmcblk0: 2 bytes were erased at offset 0x000001fe (PMBR): 55 aa
+ dd if=/dev/zero of=/dev/mmcblk0 bs=440 count=1
Scudo ERROR: invalid alignment requested in aligned_alloc: 4096, alignment must be a power of two and the requested size 0x1b8 must be a multiple of alignment
++ realpath /dev/nvme0n1
+ disk=/dev/nvme0n1
+ lsblk -a -f

cowboyai commented 1 month ago

Full log: deploy-emmc-20241022-152917.txt. The only errors I see are the alignment errors.

iFreilicht commented 1 month ago

So the installation logs look good. These alignment errors are a little suspicious, not sure where the weird blocksize of 440 bytes is coming from, but probably not the root of the issue.

I'm also suspicious of the size of the MBR partition, maybe the small size causes the alignment issues?

Does Linux offer you to enter a rescue prompt after failing to boot? If so, can you enter it and check the presence of the eMMC device and its by-partlabel symlinks?

I'm also wondering if you maybe have to add additional drivers at boot for eMMC.

cowboyai commented 1 month ago

I have "mmc-block" in kernelmodules... probably an overloaded set actually...

If I boot from USB, I can fully access the system including turning write mode on and off for the eMMC.

I can see the labels applied, but when booting from eMMC my only options are reboot or continue, and continue just fails again with "can't find /mnt-root/ in /proc/mounts".

booting from usb diags:

root@minio:~/ > lsblk -f
NAME    FSTYPE FSVER LABEL UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
sda                                                                            
├─sda1  vfat   FAT16       607B-BC2B                             467.5M     6% /boot
└─sda2  ext4   1.0         f84d47cc-742d-48ea-bc46-5f49b6f016d7   49.6G     6% /nix/store
                                                                               /
mmcblk0                                                                        
├─mmcblk0p1
│                                                                              
├─mmcblk0p2
│       vfat   FAT32       97F3-1B49                                           
└─mmcblk0p3
        ext4   1.0         cc740e21-1191-4731-b100-5d6e987cc9ac                
mmcblk0boot0

mmcblk0boot1

zram0                                                                          [SWAP]
nvme3n1                                                                        
└─nvme3n1p1
        zfs_me 5000  zroot 6066253292192136017                                 
nvme1n1                                                                        
└─nvme1n1p1
        zfs_me 5000  zroot 6066253292192136017                                 
nvme0n1                                                                        
└─nvme0n1p1
        zfs_me 5000  zroot 6066253292192136017                                 
nvme2n1                                                                        
└─nvme2n1p1
        zfs_me 5000  zroot 6066253292192136017                                 
nvme4n1                                                                        
└─nvme4n1p1
        zfs_me 5000  zroot 6066253292192136017                                 
nvme6n1                                                                        
└─nvme6n1p1
        zfs_me 5000  zroot 6066253292192136017                                 
nvme7n1                                                                        
└─nvme7n1p1
        zfs_me 5000  zroot 6066253292192136017                                 
nvme5n1                                                                        
└─nvme5n1p1
        zfs_me 5000  zroot 6066253292192136017      
 root@minio:~/ > ls -l /dev/disk/by-partlabel 
total 0
lrwxrwxrwx 1 root root 15 Oct 23 16:30 disk-emmcboot-ESP -> ../../mmcblk0p2
lrwxrwxrwx 1 root root 15 Oct 23 16:30 disk-emmcboot-MBR -> ../../mmcblk0p1
lrwxrwxrwx 1 root root 15 Oct 23 16:30 disk-emmcboot-root -> ../../mmcblk0p3
lrwxrwxrwx 1 root root 15 Oct 23 16:30 disk-nvme0n1-zfs -> ../../nvme0n1p1
lrwxrwxrwx 1 root root 15 Oct 23 16:30 disk-nvme1n1-zfs -> ../../nvme1n1p1
lrwxrwxrwx 1 root root 15 Oct 23 16:30 disk-nvme2n1-zfs -> ../../nvme2n1p1
lrwxrwxrwx 1 root root 15 Oct 23 16:30 disk-nvme3n1-zfs -> ../../nvme3n1p1
lrwxrwxrwx 1 root root 15 Oct 23 16:30 disk-nvme4n1-zfs -> ../../nvme4n1p1
lrwxrwxrwx 1 root root 15 Oct 23 16:30 disk-nvme5n1-zfs -> ../../nvme5n1p1
lrwxrwxrwx 1 root root 15 Oct 23 16:30 disk-nvme6n1-zfs -> ../../nvme6n1p1
lrwxrwxrwx 1 root root 15 Oct 23 16:30 disk-nvme7n1-zfs -> ../../nvme7n1p1
lrwxrwxrwx 1 root root 10 Oct 23 16:30 disk-sysboot-ESP -> ../../sda1
lrwxrwxrwx 1 root root 10 Oct 23 16:30 disk-sysboot-root -> ../../sda2
root@minio:~/ > 

the part-labels appear to be set correctly

cowboyai commented 1 month ago

I have removed the MBR partition and I am still able to boot from eMMC, but it stalls at the same place, mounting root. I have ensured that every partlabel is unique. Since disko is controlling all this, I can't affect much else. Since the same configuration boots from /dev/sda or /dev/nvme0n1 if I target them, this is definitely unique to eMMC.

The device itself is working and passing all tests; there is just something wrong with mounting /. Booting seems to be working correctly and we get to device mapper and LVM, then it dies mounting /. The partlabel on the disk is correct and unique.

I tried inspecting /var/log/journal on the eMMC, but so far it's not revealing anything.

iFreilicht commented 1 month ago

The by-partlabel output contains this line:

lrwxrwxrwx 1 root root 10 Oct 23 16:30 disk-sysboot-root -> ../../sda2

I assume sda is your USB drive, so that makes sense.

I can also see in your config that you named the main disk sysboot in the config you can boot from usb, but emmcboot in the config you boot from emmc.

This makes sense, but it seems fstab isn't updated before the reboot. Are you doing a nixos-rebuild switch or using nixos-anywhere to switch to the nixos-flashstor-emmc config before rebooting? Or are you installing the

Also, the by-partlabel output makes it seem like you're somehow including both disko-sysboot.nix and disko-emmcboot.nix at the same time, but maybe that was just for testing?

cowboyai commented 1 month ago

Yes, sda is the USB boot device.

The reason I made them different is so the partlabels would be unique while troubleshooting. I have previously had merge issues where non-unique partlabels were loading twice and overriding the first setting, but I think that has been fixed. After the nixos-anywhere install and failed boot, I reboot with USB, and I can then mount the eMMC at /mnt/mmc.

Inspecting fstab yields:

root@minio:/mnt/mmc/ > ls -la etc/static
lrwxrwxrwx 1 root root 51 Oct 23 19:15 etc/static -> /nix/store/hx2kr2yhmsbis00y1fwnjjk82jwas7yk-etc/etc
root@minio:/mnt/mmc/ > ls -la /mnt/mmc/nix/store/hx2kr2yhmsbis00y1fwnjjk82jwas7yk-etc/etc/fstab
lrwxrwxrwx 1 root root 53 Jan  1  1970 /mnt/mmc/nix/store/hx2kr2yhmsbis00y1fwnjjk82jwas7yk-etc/etc/fstab -> /nix/store/a7pjg8xbdqb15r0x703408wmrmr2xjyw-etc-fstab
root@minio:/mnt/mmc/ > cat /mnt/mmc/nix/store/a7pjg8xbdqb15r0x703408wmrmr2xjyw-etc-fstab
# This is a generated file.  Do not edit!
#
# To make changes, edit the fileSystems and swapDevices NixOS options
# in your /etc/nixos/configuration.nix file.
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>

# Filesystems.
/dev/disk/by-partlabel/disk-emmcboot-root / ext4 x-initrd.mount,defaults 0 1
/dev/disk/by-partlabel/disk-emmcboot-ESP /boot vfat umask=0077 0 2
zroot/zfs_fs /zfs_fs zfs defaults,zfsutil 0 0
zroot /zroot zfs defaults,zfsutil 0 0

# Swap devices.

root@minio:/mnt/mmc/ > 

So fstab looks right, but mounting is somehow failing.

I can't read /mnt/mmc/dev, obviously, because it's dynamic, but lsblk does show the partition and I can mount it, so the partition is valid.

root@minio:/mnt/mmc/ > lsblk -f
NAME         FSTYPE     FSVER LABEL UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
sda                                                                                     
├─sda1       vfat       FAT16       607B-BC2B                             467.5M     6% /boot
└─sda2       ext4       1.0         f84d47cc-742d-48ea-bc46-5f49b6f016d7   49.5G     6% /nix/store
                                                                                        /
mmcblk0                                                                                 
├─mmcblk0p1  vfat       FAT32       A9C6-BD1C                                           
└─mmcblk0p2  ext4       1.0         60ea0b94-381b-4f24-be17-27fe70ce5d5b    2.9G    50% /mnt/mmc
mmcblk0boot0                                                                            
mmcblk0boot1                                                                            
zram0                                                                                   [SWAP]

iFreilicht commented 1 month ago

Yeah so that fstab looks good indeed, but somehow during boot the old one is still used, as the message is waiting for /dev/disk/by-partlabel/disk-sysboot-root to appear, right? So are you sure you're booting the correct generation of your system?

And what exact command are you using to install this configuration?

cowboyai commented 1 month ago

no, when booting from emmc under this config, it says: Waiting for device /dev/disk/by-partlabel/disk-emmcboot-root to appear................

cowboyai commented 1 month ago

I am installing with this command:

nix run "github:nix-community/nixos-anywhere" -- --flake .#nixos-flashstor-emmc root@172.16.0.2 2>&1 | tee logs/deploy-emmc-$(date +%Y%m%d-%H%M%S).txt

from the flake root.

which is invoking this:

      # for nixos-anywhere to the emmc, emmc has quirks
      nixos-flashstor-emmc = nixpkgs.lib.nixosSystem {
        system = "x86_64-linux";
        specialArgs = { inherit self; };
        modules = [
          disko.nixosModules.disko 
          ./modules/disko-emmcboot.nix
          ./modules/disko-nas.nix
          ./configuration.nix
        ];
      };

cowboyai commented 1 month ago

[image attachment]

iFreilicht commented 3 weeks ago

I have "mmc-block" in kernelmodules...

So there is another option: https://github.com/TheCowboyAI/nixos-flashstor/blob/b9e626adee71739df18d25a653d034327099e950/modules/hardware-configuration.nix#L16

Could you try adding "mmc_block" there?

The "Timed out waiting for device /dev/disk/by-partlabel/disk-emmcboot-root, trying to mount anyway" message is usually caused by a missing kernel module. I know it's in availableKernelModules, but maybe it needs to be force loaded.

I really don't have any other ideas than that.
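
A minimal sketch of what force-loading could look like, assuming the option behind the link above is boot.initrd.kernelModules (the thread only confirms that the module is already listed in availableKernelModules). The "sdhci_pci" entry is an assumption about the eMMC host controller on this board and may not be needed:

```nix
{
  # Modules listed here are always loaded by the initrd,
  # whereas availableKernelModules are only loaded when needed.
  boot.initrd.kernelModules = [
    "mmc_block" # MMC/SD block device driver
    # Assumption: PCI-attached SD/eMMC host controller; adjust to the actual hardware.
    "sdhci_pci"
  ];
}
```

If the by-partlabel symlink appears with this change, a module load ordering problem in stage 1 becomes the likely cause.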

Mic92 commented 3 weeks ago

@cowboyai Have you tried adding all of this: https://github.com/NixOS/nixpkgs/blob/master/nixos/modules/profiles/all-hardware.nix ?
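
For reference, a sketch of one way to pull that profile into the configuration, assuming it is imported through the standard modulesPath module argument:

```nix
{ modulesPath, ... }:
{
  imports = [
    # Adds a broad set of storage and controller modules to the initrd.
    (modulesPath + "/profiles/all-hardware.nix")
  ];
}
```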

Mic92 commented 3 weeks ago

For debugging, do this: enable systemd in the initrd with boot.initrd.systemd.enable = true; and then enable emergency mode with boot.initrd.systemd.emergencyAccess = "$6$he2fblfl/H7I.kvz$WbSCMXu8ztmqfj5jG4czqvu/rkMHxufxqHgy1urzXFSN.jZB4QiW5lOjR08vk8pZTyim3TT1wFkMaNE9zZ3sc1";

Replace this password hash with your own, e.g. generated using mkpasswd. This will give you an emergency console if you press Enter. You then have access to sysfs in /sys, so it might be easier to check for loaded kernel modules and which block devices exist. I don't know if we also have lsblk inside this, but there is potentially a way to add it to the initrd otherwise.
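
Put together, the debugging setup described above might look like the following sketch. The hash string is a placeholder to be replaced with your own mkpasswd output, and adding util-linux through boot.initrd.systemd.initrdBin to get lsblk in the emergency shell is an assumption, not something confirmed in this thread:

```nix
{ pkgs, ... }:
{
  # Use systemd in stage 1 so the emergency shell is available.
  boot.initrd.systemd.enable = true;

  # Placeholder: replace with your own hash, e.g. from `mkpasswd -m sha-512`.
  boot.initrd.systemd.emergencyAccess = "<your password hash>";

  # Assumption: puts lsblk and related tools into the stage 1 shell.
  boot.initrd.systemd.initrdBin = [ pkgs.util-linux ];
}
```

From the emergency console, checking /sys/block for mmcblk0 and /dev/disk/by-partlabel for the expected symlink should show whether the device is visible at that point in boot.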