home-assistant / operating-system


Stability problems since updating to 10 and 10.1 Pi4 8GB NVMe SSD via USB adapter #2536

Closed zillion42 closed 8 months ago

zillion42 commented 1 year ago

Describe the issue you are experiencing

Hi there,

I just want to add to this. I have a Pi 4B 8GB that USB-boots from an ORICO Portable External 128GB Mini M.2 NVMe SSD.

I updated from HA OS 9.5 to 10.0 the day it was released, and it has been a nightmare since. I read that some people with a similar Pi 4 NVMe SSD hardware configuration were not even able to boot after updating: https://github.com/home-assistant/operating-system/issues/2479

Luckily mine did boot; it just kept crashing every 5 hours or so. I connected HDMI and saw that it was the SQUASHFS becoming read-only, along with journald errors.

[Photo of the HDMI console output: 20230427_204604]

compare: https://community.home-assistant.io/t/squashfs-error-ext4-fs-error/293167

Since then I have changed the power supply from a 20W 4 A unit to a MacBook USB-C charger and updated to HA OS 10.1, which brought some stability improvement. It still crashed, though, about every other day.

Today I rolled back to HA OS 9.5:

ha os update --version 9.5
ha core update --version=2023.1.7

and it's currently migrating my DB back:

Database is about to upgrade from schema version: 41 to: 30

so it's still very busy. It remains to be seen whether I get back my old regular uptime of a month or more without crashes. I really hope so.

This is not OKAY!

I suspect it has something to do with the following 'features', from release notes:

  • zswap instead of swap in zram is used. This should allow to use Home Assistant OS on systems with lower amounts of RAM with the trade-off of slightly higher storage wear.
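For anyone who wants to confirm whether zswap is actually active before blaming it, here is a minimal sketch. It assumes the kernel exposes the usual `/sys/module/zswap/parameters/enabled` file (containing `Y` or `N`); the function name `zswap_state` and its optional root-directory argument are made up for illustration and are not part of any HA OS tooling.

```shell
# Report whether zswap is enabled by reading the kernel module parameter.
# $1 (optional): sysfs root, defaults to /sys. Only there so the function
# can be pointed at a test directory.
zswap_state() {
    f="${1:-/sys}/module/zswap/parameters/enabled"
    if [ -r "$f" ]; then
        case "$(cat "$f")" in
            Y|1) echo enabled ;;
            *)   echo disabled ;;
        esac
    else
        # Parameter file missing: zswap is not built in or not loaded.
        echo unavailable
    fi
}

zswap_state
```

This only reads state; it does not change anything on the system.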

What operating system image do you use?

rpi4-64 (Raspberry Pi 4/400 64-bit OS)

What version of Home Assistant Operating System is installed?

10.1

Did you upgrade the Operating System.

Yes

Steps to reproduce the issue

1. Upgrade from 9.5 to 10.0
2. Upgrade from 10.0 to 10.1

Anything in the Supervisor logs that might be useful for us?

can't read relevant logs since I downgraded

Anything in the Host logs that might be useful for us?

can't read relevant logs since I downgraded

System information


version core-2023.1.7
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.10.7
os_name Linux
os_version 5.15.84-v8
arch aarch64
timezone Europe/Berlin
config_dir /config
Home Assistant Community Store

GitHub API | ok
-- | --
GitHub Content | ok
GitHub Web | ok
GitHub API Calls Remaining | 5000
Installed Version | 1.32.1
Stage | running
Available Repositories | 1280
Downloaded Repositories | 18

Home Assistant Cloud

logged_in | false
-- | --
can_reach_cert_server | ok
can_reach_cloud_auth | ok
can_reach_cloud | ok

Home Assistant Supervisor

host_os | Home Assistant OS 9.5
-- | --
update_channel | stable
supervisor_version | supervisor-2023.04.1
agent_version | 1.4.1
docker_version | 20.10.22
disk_total | 116.7 GB
disk_used | 33.4 GB
healthy | true
supported | true
board | rpi4-64
supervisor_api | ok
version_api | ok
installed_addons | Samba share (10.0.1), SSH & Web Terminal (13.1.0), Duck DNS (1.15.0), File editor (5.6.0), Mosquitto broker (6.2.1), ESPHome (2023.4.4), SQLite Web (3.7.1)

Dashboards

dashboards | 1
-- | --
resources | 11
views | 6
mode | storage

Recorder

oldest_recorder_run | 7 November 2022 at 21:22
-- | --
current_recorder_run | 5 May 2023 at 00:06
estimated_db_size | 8282.91 MiB
database_engine | sqlite
database_version | 3.38.5

Additional information

I downgraded to 9.5 today

I also posted this on the forum: https://community.home-assistant.io/t/home-assistant-os-10-update-has-broken-my-pi-4b-4gb/561918/24

I hope you are aware that many Pi4b users have a very unstable system at the moment.

fromNL commented 1 year ago

I just installed my new Raspberry Pi 4 (8GB) and flashed an SSD (a cheap brand, LITEON or something like that, data-disk: USB3.0-Super-Speed-DD564198838B0) with balenaEtcher. It would not boot; it got stuck at SquashFS errors.

Then I wrote an SD card and tested: the desktop runs just fine, so the SBC is okay.

After reading all of the above (to the letter) and rewriting the SSD 3 times, trying again and again while reading, I ended up with my own solution: I plugged the USB connector into a USB 2.0 port of my Raspberry Pi instead of the USB 3 (blue) port. Now it boots! I got past those SquashFS problems (they did not show up), I am now at the ha > prompt, and I can even open http://homeassistant.local:8123/

I do not think I need to update more here, I did a temporary registration, and I solved the problem for myself. Note: I did have at hand several other (AT) SSD to USB enclosures/cables lying around for further testing, but I am okay with USB 2.0 (unless it is too slow in the future, after setting up my h.a.).

Note: my H.A. version is 10.4, downloaded and flashed it today.

If changing the USB cable had not worked, I would have gone for the docker solution. I would boot from SD and from there on use docker on the SSD (if that would work, I have not tested it and leave it here for others as an idea).

zillion42 commented 1 year ago

Just to report back,

my system has been running stable as well since I blacklisted UAS in /mnt/boot/cmdline.txt. Not my favorite solution, but better than no solution:

@fromNL you can use USB3
@HumanSkunk I don't know how to restore your data
@Baxxy13 ^yes, that seems the best solution so far. Reliability OVER Speed
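Since cmdline.txt has to stay a single line, here is a minimal sketch of adding a parameter to it without duplicating it on a second run. The helper name `append_param` is hypothetical. The example token assumes the usual kernel form for disabling UAS on one adapter, `usb-storage.quirks=VendorID:ProductID:u`, and borrows the ID 0bda:9210 from the ORICO dmesg output later in this thread; your adapter's ID will probably differ.

```shell
# Append one kernel parameter to a single-line cmdline string,
# unless exactly that parameter is already present.
# $1: current cmdline text, $2: parameter to add
append_param() {
    cmdline=$1
    param=$2
    case " $cmdline " in
        *" $param "*) printf '%s\n' "$cmdline" ;;            # already there
        *)            printf '%s %s\n' "$cmdline" "$param" ;;
    esac
}

# Works on a string only; nothing on disk is touched here.
append_param 'zram.enabled=1 rootwait console=tty1' 'usb-storage.quirks=0bda:9210:u'
# prints: zram.enabled=1 rootwait console=tty1 usb-storage.quirks=0bda:9210:u
```

The sketch deliberately works on a string rather than on /mnt/boot/cmdline.txt itself, so the result can be checked before anything is written back. Keep a copy of the original file either way.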

EDIT: Does anyone think it would help to update my pi4 firmware?

DaniEll-AT commented 1 year ago

I would like to report my experience:

After I found this issue on GitHub, I checked whether UAS was in use. With HA OS 9.5 my adapter was using usb-storage, and with 10.5 it does so too. Because this is my live system, I am not going to test 10.1 again.
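One way to make the same check is to look at which kernel driver the adapter's interface is bound to. This is a minimal sketch, assuming the standard sysfs layout where bound interfaces show up as entries such as `2-2:1.0` under `/sys/bus/usb/drivers/<driver>/`. The function name `bound_interfaces` and the optional sysfs-root argument are made up for illustration.

```shell
# List USB interfaces currently bound to a driver by reading sysfs.
# $1: driver name (uas or usb-storage)
# $2 (optional): sysfs root, defaults to /sys
bound_interfaces() {
    driver_dir="${2:-/sys}/bus/usb/drivers/$1"
    [ -d "$driver_dir" ] || return 0
    for entry in "$driver_dir"/*; do
        name=${entry##*/}
        case "$name" in
            # Interface names look like <bus>-<port>:<config>.<interface>;
            # control files such as bind/unbind are skipped.
            [0-9]*-[0-9]*:*) echo "$name" ;;
        esac
    done
}

echo "uas:         $(bound_interfaces uas)"
echo "usb-storage: $(bound_interfaces usb-storage)"
```

If the adapter's interface is listed under uas, it is using UAS; if it is listed under usb-storage, UAS is not in use for it.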

ZdenekM commented 1 year ago

@DaniEll-AT I'm on 10.5 and it still happens... Maybe it's a different issue but the symptoms seem to be the same.

ZdenekM commented 1 year ago

Downgraded to 9.5, and so far, so good...

danir-de commented 1 year ago

Otherwise (not blacklisted) you can blacklist your adapter by yourself by editing /mnt/boot/cmdline.txt and adding your adapter's VendorID:ProductID at the end of the line. (see cmdline.txt)

This also worked for me; it has been rock solid for over a month now. It's not my favorite solution either: it seems slower on bigger operations like updating and creating backups, but everything else runs just as before.
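As a rough sketch of what "adding your adapter's VendorID:ProductID" amounts to: the kernel parameter normally used for this is `usb-storage.quirks=VendorID:ProductID:u`, where the `u` flag tells the kernel to ignore UAS for that device. The helper `make_quirk` below is hypothetical; it only validates the ID format and prints the token, and 0bda:9210 is just the ORICO ID that appears in a dmesg excerpt in this thread.

```shell
# Build the kernel parameter that disables UAS for one USB device.
# $1: VendorID:ProductID, four hex digits each (e.g. 0bda:9210)
make_quirk() {
    id=$1
    if printf '%s' "$id" | grep -Eq '^[0-9a-fA-F]{4}:[0-9a-fA-F]{4}$'; then
        printf 'usb-storage.quirks=%s:u\n' "$id"
    else
        echo "expected VendorID:ProductID as four hex digits each, got: $id" >&2
        return 1
    fi
}

make_quirk 0bda:9210
# prints: usb-storage.quirks=0bda:9210:u
```

The printed token goes at the end of the existing single line in cmdline.txt, separated by a space.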

For everyone still suffering, try this: https://github.com/home-assistant/operating-system/issues/2536#issuecomment-1656806299

If it's not helping, maybe you have another problem?

ZdenekM commented 1 year ago

Anyone tried with 11.x?

TheOriginalMrWolf commented 1 year ago

I am running

Survived several core & OS upgrades, working perfectly so far for around 3 months (touch wood 🤞 - hope I haven't cursed myself by mentioning this...).

markusmauch commented 11 months ago
[    2.016595] usb 2-2: New USB device found, idVendor=0bda, idProduct=9210, bcdDevice=30.00
[    2.016638] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[    2.016670] usb 2-2: Product: GV100- 128G
[    2.016697] usb 2-2: Manufacturer: ORICO

Does someone know if it's still necessary to blacklist the Orico SSD with Home Assistant OS 11.1?
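The VendorID:ProductID pair that a cmdline.txt quirk needs can be read straight off the "New USB device found" line above. A minimal sketch, with a hypothetical helper name `usb_id_from_dmesg`, that pulls it out of dmesg-style text on stdin:

```shell
# Extract VendorID:ProductID from "New USB device found" kernel log lines.
# Reads log text on stdin, prints one ID pair per matching line.
usb_id_from_dmesg() {
    sed -n 's/.*idVendor=\([0-9a-fA-F]\{4\}\), idProduct=\([0-9a-fA-F]\{4\}\).*/\1:\2/p'
}

printf '%s\n' \
  '[    2.016595] usb 2-2: New USB device found, idVendor=0bda, idProduct=9210, bcdDevice=30.00' \
  '[    2.016697] usb 2-2: Manufacturer: ORICO' \
  | usb_id_from_dmesg
# prints: 0bda:9210
```

On a live system the input would come from `dmesg` instead of the sample lines.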

github-actions[bot] commented 8 months ago

There hasn't been any activity on this issue recently. To keep our backlog manageable we have to clean old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant OS version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

zillion42 commented 6 months ago

@agners I'm facing the exact same problem again, only this time it's a Pi 5 with a PCIe SSD HAT.

It crashes far too often, but this time I can't blacklist any USB adapter, since the SSD is attached via PCI Express.

[Photo: 20240501_181707]

I'm pretty sure I should open a new thread for this, could you point me in the right direction?

# cat /mnt/boot/cmdline.txt

zram.enabled=1 zram.num_devices=3 rootwait cgroup_enable=memory fsck.repair=yes console=tty1 root=PARTUUID=8d3d53e3-6d49-4c38-8349-aff6859e82fd rootfstype=squashfs ro rauc.slot=A systemd.machine_id=3514abb663c6460392093ebcb3449919

# lsusb
Bus 003 Device 001: ID 1d6b:0002
Bus 001 Device 001: ID 1d6b:0002
Bus 004 Device 001: ID 1d6b:0003
Bus 004 Device 002: ID 346d:5678
Bus 002 Device 001: ID 1d6b:0003
# lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda           8:0    1  58.6G  0 disk
`-sda1        8:1    1  58.6G  0 part
zram0       254:0    0     0B  0 disk
zram1       254:1    0    32M  0 disk
zram2       254:2    0    16M  0 disk /tmp
nvme0n1     259:0    0 119.2G  0 disk
|-nvme0n1p1 259:1    0    64M  0 part /mnt/boot
|-nvme0n1p2 259:2    0    24M  0 part
|-nvme0n1p3 259:3    0   256M  0 part /
|-nvme0n1p4 259:4    0    24M  0 part
|-nvme0n1p5 259:5    0   256M  0 part
|-nvme0n1p6 259:6    0     8M  0 part
|-nvme0n1p7 259:7    0    96M  0 part /var/lib/systemd
|                                     /var/lib/bluetooth
|                                     /var/lib/NetworkManager
|                                     /etc/systemd/timesyncd.conf
|                                     /etc/hosts
|                                     /etc/hostname
|                                     /etc/NetworkManager/system-connections
|                                     /root/.ssh
|                                     /root/.docker
|                                     /etc/udev/rules.d
|                                     /etc/modules-load.d
|                                     /etc/modprobe.d
|                                     /etc/dropbear
|                                     /mnt/overlay
`-nvme0n1p8 259:8    0 118.5G  0 part /var/log/journal
                                      /var/lib/docker
                                      /mnt/data
#
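The cmdline.txt shown above contains no USB quirk at all, which fits the point that there is no USB adapter to blacklist on this setup. A minimal sketch of checking a cmdline for a given parameter; the helper name `has_param` is hypothetical and the sample string is the cmdline.txt content posted above.

```shell
# Return success if a kernel cmdline contains the named parameter,
# either bare (e.g. rootwait) or with a value (e.g. root=...).
# $1: cmdline text, $2: parameter name without the "=value" part
has_param() {
    case " $1 " in
        *" $2="*|*" $2 "*) return 0 ;;
        *)                 return 1 ;;
    esac
}

cmdline='zram.enabled=1 zram.num_devices=3 rootwait cgroup_enable=memory fsck.repair=yes console=tty1 root=PARTUUID=8d3d53e3-6d49-4c38-8349-aff6859e82fd rootfstype=squashfs ro rauc.slot=A systemd.machine_id=3514abb663c6460392093ebcb3449919'

if has_param "$cmdline" usb-storage.quirks; then
    echo "usb-storage quirk present"
else
    echo "no usb-storage quirk set"
fi
# prints: no usb-storage quirk set
```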
zillion42 commented 5 months ago

https://github.com/home-assistant/operating-system/issues/3188