Closed a1j closed 11 months ago
The same happened to me. I realized that after a clean install, HAOS cannot migrate to the SSD data disk, because the disk is not mounted on restart.
Same issue here.
The machine boots and says it is waiting for the HA CLI to start. On further inspection, it seems to be expecting a disk partition with the label hassos-data. However, the partition that should have that label is labeled hassos-data-dis.
When I proceed to manually re-label the partition, it goes off and does its thing, sort of. After a reboot it still doesn't appear to be happy with my data-disk. I ended up doing a clean install and restoring my most recent backup.
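For anyone who wants to check whether they are hitting this label mismatch, here is a minimal sketch. It assumes the usual udev symlink directory /dev/disk/by-label; the function names find_label_device and check_data_label are made up for illustration and are not part of HAOS.

```shell
# Sketch: report which device carries a given filesystem label.
# $2 is the label directory; it defaults to the udev symlink directory.
find_label_device() {
    label="$1"
    label_dir="${2:-/dev/disk/by-label}"
    if [ -L "$label_dir/$label" ]; then
        # Print the symlink target, e.g. ../../nvme0n1p1
        readlink "$label_dir/$label"
    else
        return 1
    fi
}

# Hypothetical helper: summarize the data-partition state described above.
check_data_label() {
    label_dir="${1:-/dev/disk/by-label}"
    if dev=$(find_label_device hassos-data "$label_dir"); then
        echo "ok: hassos-data -> $dev"
    elif dev=$(find_label_device hassos-data-dis "$label_dir"); then
        echo "disabled: hassos-data-dis -> $dev"
    else
        echo "missing: no hassos-data partition found"
    fi
}
```

Running check_data_label with no arguments on the OS shell inspects the live system; passing a directory makes it easy to try out against a test folder first.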
I've hit the same issue. Noticed the same behaviour as @matjahs with the partition label mismatch. Happy to collect logs from my system if that helps get the root of this, just let me know what to grab.
I took this manual workaround:
root
mount /dev/nvme0n1p1 /mnt/data
login
With this I was able to get the system running again, but the update was not installed.
Have you all been using the data disk feature when this happened on upgrade?
The relabel to hassos-data-dis should only happen when multiple disks are attached with the same label. It seems that this misfired in your cases (initiated by this script).
The fact that this happened even after restart @matjahs seems to indicate that this is reproducible. I am trying to reproduce it here now.
I've installed HAOS 10.5 from scratch, added an NVMe, and then upgraded to 11.0. In my case things worked out fine.
However, I do have a suspicion about what the problem is: the script above is triggered by the haos-data-disk-detach.service service. This service should only run once, on the very first boot (as mandated by ConditionFirstBoot=yes).
Now for some reason, in your cases, the system thought the OS is booting for the first time :thinking:
First boot is determined by the U-Boot boot loader on startup: it is assumed if the machine id is not set. I am currently unclear how this could fail in some situations. Ideally I'd need boot logs captured via a serial console. @regan-a is that something you have the tools for?
Also, to all, do you use an eMMC or SD card device?
Hey @agners, thanks for looking into this! I'm running an SD + NVMe for data. Here is a dump of journalctl after boot. Let me know if you need anything else.
Hello @agners! I also use an Odroid M1 with an SD card and an NVMe SSD. I've already rolled back to 10.5. I couldn't do the data migration after a clean 11.0 install; it only worked for me on 10.5. I saw the same mislabeling issue (hassos-data-dis).
From the logs I can confirm my suspicion: Your boot loader triggers first boot mode.
... systemd.machine_id= fsck.repair=yes systemd.condition-first-boot=true ..
The question is why exactly. Unfortunately the boot loader output isn't available through logs, only through a serial console.
Can you run the following command on the OS shell?
fw_printenv
Also can you check the SHA256 of the boot script:
sha256sum /mnt/boot/boot.scr
If anyone has access to the serial console of that board which is suffering from the problem, the capture of the boot loader phase would be helpful.
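The first-boot flags quoted from the kernel command line above can also be checked on a running system. The sketch below assumes that either systemd.condition-first-boot=true or an empty systemd.machine_id= means the boot loader selected first-boot mode, which is my reading of the excerpt in this thread; cmdline_first_boot is a made-up helper name.

```shell
# Sketch: decide whether a kernel command line indicates first-boot mode.
# Pass the command line as $1; defaults to the running kernel's /proc/cmdline.
cmdline_first_boot() {
    cmdline="${1:-$(cat /proc/cmdline)}"
    # Intentionally unquoted so the command line is split into arguments.
    for arg in $cmdline; do
        case "$arg" in
            systemd.condition-first-boot=true) return 0 ;;
            systemd.machine_id=) return 0 ;;   # assumption: empty machine id
        esac
    done
    return 1
}
```

On the OS shell, something like `cmdline_first_boot && echo "booted in first-boot mode"` would then tell you whether the current boot was affected.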
My apologies, I misunderstood the request. I've attached the outputs of the two commands, plus a serial dump of the boot.
I'm having this issue too, OS installed on SD Card and data on NVMe.
Everything was working perfectly in 10.5, but as soon as I installed 11.0 it didn't even come back from the mandatory reboot.
Now every time I reboot I have to run e2label /dev/nvme0n1p1 hassos-data and systemctl start hassos-supervisor.
So I guess my headless setup just turned into a headache.
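If you have to repeat this after every reboot, a small dry-run helper may reduce typos. This is only a sketch: plan_relabel is a hypothetical name, it assumes the by-label symlink points at a device directly under /dev, and it only prints the commands mentioned in this thread instead of running them.

```shell
# Sketch: print (do not run) the commands needed to re-enable a data
# partition that was relabeled to hassos-data-dis.
plan_relabel() {
    label_dir="${1:-/dev/disk/by-label}"
    link="$label_dir/hassos-data-dis"
    if [ ! -L "$link" ]; then
        echo "nothing to do"
        return 0
    fi
    # Turn a target like ../../nvme0n1p1 into /dev/nvme0n1p1
    dev="/dev/$(basename "$(readlink "$link")")"
    echo "e2label $dev hassos-data"
    echo "systemctl start hassos-supervisor"
}
```

Review the printed commands before running them by hand as root; the helper deliberately changes nothing itself.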
Hm, it seems the system did write a new machine id, and at least the running OS is able to read the U-Boot environment :thinking:
This is the correct hash of the boot script in HAOS 11.0, so your boot script doesn't seem corrupted or anything.
This really does show the problem:
** Booting bootflow 'mmc@fe2b0000.bootdev.part_2' with script
loading env...
Card did not respond to voltage select! : -110
## Error: bad CRC, import failed
0 bytes read in 1 ms (0 Bytes/s)
0 bytes read in 1 ms (0 Bytes/s)
Loading standard device tree rk3568-odroid-m1.dtb
116634 bytes read in 13 ms (8.6 MiB/s)
Working FDT set to a100000
Trying to boot slot A, 2 attempts remaining. Loading kernel ...
29901312 bytes read in 1564 ms (18.2 MiB/s)
storing env...
Card did not respond to voltage select! : -110
Starting kernel
Moving Image from 0x2080000 to 0x2200000, end=3f30000
## Flattened Device Tree blob at 0a100000
Booting using the fdt blob at 0xa100000
Working FDT set to a100000
Loading Device Tree to 00000000ede61000, end 00000000edee5fff ... OK
Working FDT set to ede61000
It seems that the U-Boot bootloader is not able to read the environment. :cry:
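For others capturing a serial console log, a quick way to see whether you hit the same environment import failure is to search the capture for the error line quoted above. A minimal sketch, assuming the capture was saved to a text file; find_env_crc_error is a made-up helper name.

```shell
# Sketch: look for the U-Boot environment import failure in a saved
# serial capture. Prints matching lines with line numbers and returns
# non-zero when the message is absent.
find_env_crc_error() {
    log="$1"
    grep -n 'bad CRC, import failed' "$log"
}
```

A match only tells you that the environment import failed during that boot; it does not explain why the card was unreadable at that point.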
I am having this exact problem. I upgraded to HAOS 11.0 and it never came back. I had a backup on Google Drive, so I installed HAOS 11.0 on the SD card and restored the backup; running from the SD card, everything is fine. I just cannot move the data disk to the SSD (SATA). If I try to move it, it gets stuck waiting for the CLI to get ready.
Exact same problem too.
For me, the Web UI Upgrade process from 10.5 to 11 did not result in a reboot (tried it several times). However, after resetting the device days later I had the same issues as described here.
On each boot now, the M.2 SSD partition is disabled and booting is only possible when re-labeling it and restarting the docker containers.
However, it seems that I am still on 10.5 according to the web interface.
Sadly same issue also. Switched for now to a VM version in Proxmox and restored a backup. Waiting for a future fix or I will stay on the VM and use the Odroid for other purpose.
same issue here... cannot move data to NVMe.
I ran into the same issue on my Odroid-M1
Same thing here. After each system reboot, I have a partition labeled hassos-data-old on my SD card and one labeled hassos-data-dis on the NVMe SSD. If I then relabel the disabled one to hassos-data, it continues booting as I would normally expect.
The Settings > About page reports the following:
Home Assistant 2023.10.3
Supervisor 2023.10.0
Operating System 11.0
Frontend 20231005.0 - latest
Could anybody with a build setup provide a boot.scr with a commented-out first-boot check? Then we could at least use the m.2 for data without systemd invoking the rename script, right? Correct, @agners? :-)
Thank you very much.
@cryptoluks that would be a possible workaround; I should be able to create such a script.
Are you using an SD card?
Same issue here. I just did not notice any problem after the upgrade until a hard reboot caused by a power outage. Since then it gets stuck forever waiting for the CLI to get ready. Mine boots from SD and the data is on an SSD.
@agners I just created a new boot.scr using mkimage with your change from https://github.com/home-assistant/operating-system/pull/2856.
I can now observe on the hypervisor:
# journalctl -b | grep -i first
Oct 23 18:48:41 homeassistant systemd[1]: HAOS data disk detach was skipped because of an unmet condition check (ConditionFirstBoot=yes).
Oct 23 18:48:43 homeassistant systemd[1]: First Boot Complete was skipped because of an unmet condition check (ConditionFirstBoot=yes).
# dmesg | grep -i "kernel command line:"
[ 0.000000] Kernel command line: zram.enabled=1 zram.num_devices=3 systemd.machine_id=[snip] fsck.repair=yes root=PARTUUID=[snip] rootfstype=squashfs ro rootwait rauc.slot=A
I have not yet moved the data to the m.2 again, but it seems to work now after adding mmc dev ${devnum}. The machine_id in the kernel command line was populated and therefore the systemd units were skipped. Awesome!
Here are my steps:
apt-get -y install u-boot-tools
mkimage -T script -A arm64 -C none -n 'Fixed Boot' -d uboot-boot.ush boot.scr
The new binary boot.scr, built from uboot-boot.ush, has to be placed at /mnt/boot/boot.scr.
I also created a base64-encoded version of it here for the adventurous. Simply decode it with:
curl -s https://gist.githubusercontent.com/cryptoluks/82e2b1c3105c85d91e1c225b8938eca0/raw/0fcfeb5afcf14b8c55fc9ede796d321a40bdea00/uboot-odroid-m1-ha11-fixed.scr.base64 | base64 -d > boot.scr
# md5sum /mnt/boot/boot.scr
bc45f05698d4fefecbba711da0a71052 /mnt/boot/boot.scr
Update: Works now also with data on the m.2 SSD without any issues.
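Before copying a downloaded boot.scr into place, it may be worth comparing its checksum with the one posted above. A minimal sketch; verify_md5 is a made-up helper name and relies only on md5sum and cut.

```shell
# Sketch: compare a file's MD5 against an expected value.
# Returns 0 on match, non-zero otherwise.
verify_md5() {
    file="$1"
    expected="$2"
    actual=$(md5sum "$file" | cut -d' ' -f1)
    [ "$actual" = "$expected" ]
}
```

For example, `verify_md5 boot.scr bc45f05698d4fefecbba711da0a71052 && echo "checksum matches"` uses the hash from this comment; only copy the file to /mnt/boot/boot.scr if it matches.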
This will be addressed with HAOS 11.1. You can test it already by updating to 11.1.rc1 on the beta channel.
So, I have managed to replace boot.scr and upgrade to HAOS 11.0, but sda1 is still not auto-mounted. Any ideas?
Was your disk renamed before to the disabled label? If yes, you probably have to manually rename it back.
Or try the latest Beta with the fixes, maybe this works better for you.
Edit: Ah, if you replaced it and then upgraded, I think the boot.scr was simply replaced with the bugged one from 11.0.
No, the md5sum looks right for the new boot.scr.
Yes it had a label: hassos-data-dis. Finally, I could solve this issue with: e2label /dev/sda1 hassos-data
Tested the HAOS 11.1.rc1, I can confirm that it fixed the issue and upgraded successfully on ODROID-M1 with SSD on nvme.
Before the upgrade to 11.1.rc1, when I tried to upgrade to 11.0, I had to change the label after each reboot:
System information
OS Version: Home Assistant OS 10.5
Home Assistant Core: 2023.10.3
~ # ls -l /dev/disk/by-label/
total 0
lrwxrwxrwx 1 root root 15 Oct 26 01:09 hassos-boot -> ../../mmcblk1p2
lrwxrwxrwx 1 root root 15 Oct 26 01:12 hassos-data-dis -> ../../nvme0n1p1
lrwxrwxrwx 1 root root 15 Oct 26 01:09 hassos-data-old -> ../../mmcblk1p9
lrwxrwxrwx 1 root root 15 Oct 26 01:09 hassos-overlay -> ../../mmcblk1p8
~ # e2label /dev/nvme0n1p1 hassos-data
~ # ls -l /dev/disk/by-label/
total 0
lrwxrwxrwx 1 root root 15 Oct 26 01:09 hassos-boot -> ../../mmcblk1p2
lrwxrwxrwx 1 root root 15 Oct 26 01:12 hassos-data -> ../../nvme0n1p1
lrwxrwxrwx 1 root root 15 Oct 26 01:09 hassos-data-old -> ../../mmcblk1p9
lrwxrwxrwx 1 root root 15 Oct 26 01:09 hassos-overlay -> ../../mmcblk1p8
After the upgrade
System information
OS Version: Home Assistant OS 11.1.rc1
Home Assistant Core: 2023.10.3
~ # ls -l /dev/disk/by-label/
total 0
lrwxrwxrwx 1 root root 15 Oct 26 01:29 hassos-boot -> ../../mmcblk1p2
lrwxrwxrwx 1 root root 15 Oct 26 01:29 hassos-data -> ../../nvme0n1p1
lrwxrwxrwx 1 root root 15 Oct 26 01:29 hassos-data-old -> ../../mmcblk1p9
lrwxrwxrwx 1 root root 15 Oct 26 01:29 hassos-overlay -> ../../mmcblk1p8
In case it helps anyone else, here are the (unoptimized) steps I took to fix my install. I strongly suspect there are some unnecessary steps in here, but this fixed mine, so I can't easily go back and try to shorten them. Normally my setup is headless, so I had to hook up a keyboard, monitor, and ethernet:
root
mount /dev/nvme0n1p1 /mnt/data
login
e2label /dev/nvme0n1p1 hassos-data
and systemctl start hassos-supervisor
os update
to install HASSOS 11.1
reboot
After the reboot everything worked like a charm.
Just to make sure, does that already mean HassOS can be booted from nvme directly, without an sd card or an emmc module?
The official docs still say it's impossible, but this repo seems to have gotten it working.
Describe the issue you are experiencing
There are multiple reports of this issue, please read here.
I
What operating system image do you use?
odroid-m1 (Hardkernel ODROID-M1)
What version of Home Assistant Operating System is installed?
11.0
Did you upgrade the Operating System.
Yes
Steps to reproduce the issue
If you reboot the system you would have to repeat mount procedure again.
I looked at the journalctl logs and the only error I found is that the "hass filesystem mount by label" job times out and restarts.
BUT this may not mean anything, since this is a boot of the old hass version (the upgrade failed at some point).
When the UI comes up, it asks you to update to version 11.0 again.
It would be nice to know how to fix hass from this half-broken state.
Anything in the Supervisor logs that might be useful for us?
Anything in the Host logs that might be useful for us?
System information
Odroid M1, SD card, dedicated data disk (ssd). upgrade from 10.5 to 11.0 using Hass Web UI.
Additional information
No response