home-assistant / operating-system

Stability problems since updating to 10 and 10.1 Pi4 8GB NVMe SSD via USB adapter #2536

Closed zillion42 closed 8 months ago

zillion42 commented 1 year ago

Describe the issue you are experiencing

Hi there,

I just want to add to this. I have a: Pi 4b 8GB USB Boot to ORICO SSD Portable External 128GB Mini M.2 NVME

I updated from HA OS 9.5 to 10.0, the day it was released and it has been a nightmare since. I read that some people were not even able to boot when they updated with a similar NVME SSD Pi4 hardware configuration. https://github.com/home-assistant/operating-system/issues/2479

Luckily mine did, it just kept crashing every 5 hours or so. I connected the HDMI and saw that it was the SQUASHFS becoming read only and journald errors. 20230427_204604

compare: https://community.home-assistant.io/t/squashfs-error-ext4-fs-error/293167

I have since changed the power supply from a 20W 4 Ampere unit to a MacBook USB-C charger and updated to HA OS 10.1, which brought some stability improvement. But it still crashed, then about every other day.

Today I rolled back to HA OS 9.5:

ha os update --version 9.5
ha core update --version=2023.1.7

and it's currently migrating my DB back

Database is about to upgrade from schema version: 41 to: 30

so it's still very busy. It yet remains to be seen whether I get my old regular 1 month or more uptime without crashes. I really hope so.

This is not OKAY!

I suspect it has something to do with the following 'features', from release notes:

  • zswap instead of swap in zram is used. This should allow to use Home Assistant OS on systems with lower amounts of RAM with the trade-off of slightly higher storage wear.

What operating system image do you use?

rpi4-64 (Raspberry Pi 4/400 64-bit OS)

What version of Home Assistant Operating System is installed?

10.1

Did you upgrade the Operating System.

Yes

Steps to reproduce the issue

1. Upgrade from 9.5 to 10.0
2. Upgrade from 10.0 to 10.1

Anything in the Supervisor logs that might be useful for us?

can't read relevant logs since I downgraded

Anything in the Host logs that might be useful for us?

can't read relevant logs since I downgraded

System information

System Information

version core-2023.1.7
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.10.7
os_name Linux
os_version 5.15.84-v8
arch aarch64
timezone Europe/Berlin
config_dir /config
Home Assistant Community Store

GitHub API | ok
GitHub Content | ok
GitHub Web | ok
GitHub API Calls Remaining | 5000
Installed Version | 1.32.1
Stage | running
Available Repositories | 1280
Downloaded Repositories | 18

Home Assistant Cloud

logged_in | false
can_reach_cert_server | ok
can_reach_cloud_auth | ok
can_reach_cloud | ok

Home Assistant Supervisor

host_os | Home Assistant OS 9.5
update_channel | stable
supervisor_version | supervisor-2023.04.1
agent_version | 1.4.1
docker_version | 20.10.22
disk_total | 116.7 GB
disk_used | 33.4 GB
healthy | true
supported | true
board | rpi4-64
supervisor_api | ok
version_api | ok
installed_addons | Samba share (10.0.1), SSH & Web Terminal (13.1.0), Duck DNS (1.15.0), File editor (5.6.0), Mosquitto broker (6.2.1), ESPHome (2023.4.4), SQLite Web (3.7.1)

Dashboards

dashboards | 1
resources | 11
views | 6
mode | storage

Recorder

oldest_recorder_run | 7 November 2022 at 21:22
current_recorder_run | 5 May 2023 at 00:06
estimated_db_size | 8282.91 MiB
database_engine | sqlite
database_version | 3.38.5

Additional information

I downgraded to 9.5 today

I also posted this on the forum: https://community.home-assistant.io/t/home-assistant-os-10-update-has-broken-my-pi-4b-4gb/561918/24

I hope you are aware that many Pi4b users have a very unstable system at the moment.

Stooovie commented 1 year ago

Similar issues, catastrophic issues with OS 10.1 on Rpi4 never encountered before

EDIT: this made me give up on Rpi4, went to HAOS VM on Proxmox, 100% stable. Cheaper, much more robust with builtin NVMe, using the same power.

starryalley commented 1 year ago

Similar issues. I happened to be upgrading from SD card to SSD during this OS 10.0 release and it was a nightmare. I also saw the similar SQUASHFS error. My HA ran smoothly for about a year without any issue. It has crashed a few times already during the past week. I'm not sure if it is OS 10's issue or if it is because of this new SSD hardware. Let's see if I have more information in the coming weeks.

Edit: I'm on OS 10.1 and it has still crashed once.

markusmauch commented 1 year ago

Have exactly the same issue and obviously lots of people do. Since 10.0 my HA crashes at least every second day. Before the Update it ran seamlessly on a Raspberry Pi 4b with an external SSD.

zillion42 commented 1 year ago

Hi,

I just wanted to update on this issue. I have had no crashes since I downgraded to HA OS 9.5, uptime is now since 9 May 2023 at 18:17, which was a normal host reboot. I also updated to core 2023.5.2 again, that does not seem to cause any problems.

danir-de commented 1 year ago

I'm having similar issues starting with Home Assistant OS 10.0: it behaves like a system whose hard drive was removed while running. Things keep running, but the longer they do, the less works.

The status page on port 4357 isn't available, the dashboard loads, but everything on it fails to display, error while loading setting pages,...

"Error while loading page hardware"

the app failing to connect locally, automations stop running, Addons crashed, you get the picture.

And since there are no logs kept after a restart, it's impossible to get an idea why the system went into this state.

I'm running the system on a Raspberry Pi 4B+, with an external SSD from the supported RPi SSD-Adapter list and I've even reflashed the OS 2 times already, before restoring my backup - it still keeps happening every 48 hours.

I've attached a capture card, to be able to see what the system prints out, when it crashes again.

mundschenk-at commented 1 year ago

I have experienced the same thing with my Yellow/NVMe combo and symptoms persisted after an in-place downgrade to 9.5 until I did a full reinstall of HAOS from scratch (i.e. wiping the NVMe and reinstalling using USB mass storage mode, as well as uploading a known stable firmware).

I have some serial logs showing that journald can't access its log directory, squashfs errors etc., but they don't show the initial stages of the problem. Once the connection to the drive is lost, there appears to be no way to get a root shell or access kernel logs even with a serial connection. A reboot fixes the issue for some time (a few hours to days), but all logs from the beginning of the fault are wiped.

Since this persisted after the in-place downgrade, I think that the latest firmware may be to blame, but I don't have anything approaching "proof" for this hypothesis.

Hardware:

danir-de commented 1 year ago

It happened again today, but the screen wasn't on, so I couldn't capture any screenshots etc. Will try to see if I can gather logs any other way.

zillion42 commented 1 year ago

Since we're all speculating, maybe 10.1 just uses more power. I used to have this power supply: 20W4A

I now changed to this power supply: macbookCharger

This seems to support this hypothesis: https://github.com/home-assistant/operating-system/issues/2513 External usb3 NVMe also uses a lot of power. Maybe you guys can give it a shot with a good power supply and report back.

from here https://community.home-assistant.io/t/installing-home-assistant-on-a-rpi-4b-with-ssd-boot/230948 :

The second most important factor for your success is to use a power supply that is capable of driving your Pi 4 and (!) your SSD. Nothing will give you more headaches than an insufficient power supply. Your system will stall when you don’t expect it and you will not understand why! At least use the original Pi 4 power plug with 3 Ampere. If you can get a good quality supply with 3.5 Amps or more: Use it! Alternatively you can use a powered USB hub or casing to give your SSD a dedicated power supply. I personally don’t like that idea because a second power plug creates additional energy losses and is another item that can break. But it is still much better than an unstable supply.

EDIT: Still rock solid, no crashes, since I downgraded to HA OS 9.5

Stooovie commented 1 year ago

I did, 3A PS for the Pi 4 PLUS externally powered SSD. Didn't make a difference.

zillion42 commented 1 year ago

@danir-de

I've attached a capture card, to be able to see what the system prints out, when it crashes again.

Would be nice to have rsyslog as an addon maybe. sudo apt install rsyslog

danir-de commented 1 year ago

I'm using a POE adapter, that gives up to 15W in 5V mode (= 5V 3A) and I don't think the upgrade resulted in a higher power draw, especially, since the drive is still signaling some activity even after it crashed.

I haven't researched on how to access the hypervisor directly via SSH without any Add-on, in order to be able to access logs directly, maybe I'll have some time next week to look into it. I've tried to connect via port 22222 before, but the sshd daemon seemed unresponsive, once the system crashes.

markusmauch commented 1 year ago

The first thing I did once the system started to crash was to replace my old 15 W power supply with one that has 20 W. I think this rules out any concerns regarding the power consumption.

I can also confirm that it is no longer possible to SSH into the host (admin via port 22222) once the system becomes unresponsive.

I did, however, notice that recently one of my custom integrations stopped polling. I have a Riemann sum sensor based on one of the integration’s native sensors which then had really strange values. I replaced the custom integration with self-configured REST entities two days ago and since then the system seems to be stable. Not sure though if this is related in any way…

mundschenk-at commented 1 year ago

I am pretty confident it is not the power supply. After the first incident, I replaced the stock power supply that came with the Yellow with a 36 W battery-buffered Eaton 3S Mini and it still happened again.

agners commented 1 year ago

@mundschenk-at unfortunately there are several reports of Samsung 980 1TB models not behaving well with Yellow. But since that was also in 9.5, and on Yellow, this is not related to the topic at hand. The symptoms might look similar, but the cause is different: On Yellow the NVMe is directly attached. On RPi 4 a USB to NVMe adapter is being used.

Unfortunately the exact cause for Samsung 980 Pro misbehaving on Yellow is unknown. At this point I can only suggest using a different NVMe (note also that the Samsung 980 Pro is a bit overkill for Yellow: The CM4/BCM2711 only uses PCIe Gen 2 x1).

See also: https://github.com/home-assistant/operating-system/discussions/2235#discussioncomment-5271165

danir-de commented 1 year ago

This isn't related to that Samsung SSD though, I'm using an Intenso M.2 SSD TOP 128GB as well as this SSD Adapter from SSK.

Also in addition to the POE power supply I'm using, my Pi also is mounted inside a PI-TOP [4], which provides battery power to the Pi 4, so undervoltage or an unreliable power supply shouldn't be any problem here.

The problem is persisting with the latest version 10.2 btw and I haven't been able to extract any useful logs afterwards.. :/

agners commented 1 year ago

@danir-de right, this Thread is not related to Samsung NVMe SSDs or Yellow.

@mundschenk-at your case is really off-topic here. This thread is about issues with NVMe and USB adapters connected to a Raspberry Pi.

agners commented 1 year ago

The problem is persisting with the latest version 10.2 btw and I haven't been able to extract any useful logs afterwards.. :/

This is most likely related to Raspberry Pi's Linux kernel and/or firmware. There hasn't been an update to them for a while, so this is kinda expected.

Are you using USB boot? Can you try to use SD-card boot along with the data disk feature to see if that works better?

markusmauch commented 1 year ago

After the latest 'incident' I accessed the OS via SSH (admin / port 22222) and did some research in journalctl. I found that there are no entries for almost two hours until the time I noticed that the system had become unresponsive and restarted the hardware:

Jun 06 06:07:52 homeassistant addon_77113f40_powerbox-mqtt[567]: END Reading value of ...
Jun 06 06:07:52 homeassistant addon_45df7312_zigbee2mqtt[567]: Zigbee2MQTT:info 2023-06-06 08:07:52: MQTT ...
Jun 06 06:07:52 homeassistant addon_45df7312_zigbee2mqtt[567]: Zigbee2MQTT:info 2023-06-06 08:07:52: MQTT ...
-- Boot 53ee02a2a235499faf6b7779d455b507 --
Jun 06 08:03:16 homeassistant systemd[1]: Starting HassOS AppArmor...

There is also nothing before that event in the logs that would look suspicious. The system just suddenly hangs and does not even write logs anymore.
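For anyone who wants to measure such a silent gap rather than eyeball it, here is a minimal sketch. It is not from the thread: `journal_gap_seconds` is a hypothetical helper, and it assumes the default short `journalctl` output format and that both timestamps fall on the same calendar day.

```shell
#!/bin/sh
# journal_gap_seconds (hypothetical helper): read short-format journal lines on
# stdin and print the seconds between the last entry before the first
# "-- Boot" marker and the first entry after it.
# Assumption: both entries are on the same calendar day.
journal_gap_seconds() {
    awk '
        function secs(t,  p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }
        /^-- Boot / { seen_boot = 1; next }
        $3 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ {
            if (!seen_boot) { last = secs($3) }                          # keep newest pre-boot time
            else if (!done) { print secs($3) - last; done = 1 }          # first post-boot entry
        }
    '
}

# Sample input taken from the excerpt above.
journal_gap_seconds <<'EOF'
Jun 06 06:07:52 homeassistant addon_45df7312_zigbee2mqtt[567]: Zigbee2MQTT:info ...
-- Boot 53ee02a2a235499faf6b7779d455b507 --
Jun 06 08:03:16 homeassistant systemd[1]: Starting HassOS AppArmor...
EOF
# prints 6924 (about 1 h 55 min)
```

On a live system the input would presumably come from piping `journalctl` into the function, but that only helps if the journal survived the hang.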

I updated to 10.2 a few days ago and this was the first crash since the update.

I'm using this external SSD: https://www.amazon.de/dp/B085TL8W6V?psc=1&ref=ppx_yo2ov_dt_b_product_details

zillion42 commented 1 year ago

After the latest 'incident' I accessed the OS via SSH

I can't SSH into my pi anymore after it hangs. Sending logs to a remote host with Rsyslog would really be helpful.

markusmauch commented 1 year ago

I can't SSH into my pi anymore after it hangs. Sending logs to a remote host with Rsyslog would really be helpful.

Me neither but after a reboot you can still see the old logs using journalctl.

zillion42 commented 1 year ago

I know this is not very helpful for solving the problem, but downgrading to 9.5 really helps, just in case you need HA to be running stable again. image

ha os update --version 9.5

mundschenk-at commented 1 year ago

@mundschenk-at your case is really off-topic here. This thread is about issues with NVMe and USB adapters connected to a Raspberry Pi.

That was not clear before. I'll create a separate ticket.

mundschenk-at commented 1 year ago

I can't SSH into my pi anymore after it hangs. Sending logs to a remote host with Rsyslog would really be helpful.

Me neither but after a reboot you can still see the old logs using journalctl.

That unfortunately does not help much when the disk holding those logs going offline is the issue at hand.

HumanSkunk commented 1 year ago

Same issue here when using 10.1. I have a usb adapter to a 2.5 SSD (not an NVMe). I went back to 9.5 a month or so ago and it’s been solid. I came to check the bug reports to see if others had the same issue and it seems so. This reminds me of an issue back in late 2020 early 2021 where it came down to an issue with the Pi firmware after months of people debugging differences.

markusmauch commented 1 year ago

Okay, went back to 9.5 a week ago and it's running stable ever since. Do you see a problem staying there for a longer period of time? Doesn't look like there will be a fix in the near future?

danir-de commented 1 year ago

This still is a problem with the latest 10.3 release. I've upgraded back to the latest version, after rolling back for a few weeks, since I want to use some of the new features. I'm trying a workaround by restarting the host twice a day via an automation.

zillion42 commented 1 year ago

This is most likely related to Raspberry Pi's Linux kernel and/or firmware. There hasn't been an update to them for a while, so this is kinda expected.

Are you using USB boot? Can you try to use SD-card boot along with the data disk feature to see if that works better?

@agners ^This is really sad, Raspberry Pi USB SSD support is still broken. My system is a sinking ship, are you guys planning to fix this in the foreseeable future ? Meanwhile I'm starting to migrate my installation back to docker container on my qnap. Thx in advance for the effort.

Baxxy13 commented 1 year ago

I am using... 174c:55aa ASMedia Technology Inc. ASM1051E SATA 6Gb/s bridge, ASM1053E SATA 6Gb/s bridge, ASM1153 SATA 3Gb/s bridge, ASM1153E SATA 6Gb/s bridge ... Adapter on the USB3-Port of my Pi4B without problems.

The chipset is blacklisted for UAS mode via cmdline.txt by HA-OS.

Another older Adapter (not UAS capable, not Blacklisted): ID 174c:5136 ASMedia Technology Inc. ASM1053 SATA 3Gb/s bridge works also without problems.

zillion42 commented 1 year ago

@Baxxy13 From what I can gather I am using 0bda:9210 Realtek Semiconductor Corp. RTL9210 M.2 NVME Adapter inside my ORICO GV100 SSD Portable External 128GB Mini M.2 NVME. I did not blacklist any usb to scsi functionality, but that's all irrelevant here. The issue at hand is, that it works totally reliable and stable with HA OS 9.5, but keeps crashing once I update to HA OS 10.0+

markusmauch commented 1 year ago

@Baxxy13 From what I can gather I am using 0bda:9210 Realtek Semiconductor Corp. RTL9210 M.2 NVME Adapter inside my ORICO GV100 SSD Portable External 128GB Mini M.2 NVME. I did not blacklist any usb to scsi functionality, but that's all irrelevant here. The issue at hand is, that it works totally reliable and stable with HA OS 9.5, but keeps crashing once I update to HA OS 10.0+

I have the same hardware and exactly the same problem. I bought the SSD one month before 10.0 came out and now I'm stuck with 9.5 forever? This really sucks even more since I'm paying for the Nabu Casa subscription...

Baxxy13 commented 1 year ago

HA-OS 10.x uses Linux-Kernel 6.1, HA-OS 9.5 uses Linux-Kernel 5.15. Maybe there are issues in the 6.1 Kernel with your RTL9210 Adapter.

As far as I know, most problems come from the UAS mode of the USB driver. Therefore HA-OS has a "blacklist" which contains a list of reported problematic adapters in UAS mode.

The function is simple: if the adapter is identified via VendorID:ProductID and blacklisted, then the USB driver uses usb-storage instead of uas.

Lower "performance" of course, but better than crashing.

What i would do: Check which mode the adapter uses: dmesg | grep usb should show...

[  1.926422] usb X-X: New USB device strings: Mfr=2, Product=3, SerialNumber=1
[  1.926448] usb X-X: Product: Hersteller Storage Device
[  1.926472] usb X-X: Manufacturer: Hersteller
[  1.926494] usb X-X: SerialNumber: XXXXXXXXXXX
[  1.929365] usb X-X: UAS is ignored for this device, using usb-storage instead
[  1.929495] usb X-X: UAS is ignored for this device, using usb-storage instead

... if your adapter is blacklisted.

If that is the case, then I have no more ideas.

Otherwise (not blacklisted) you can blacklist your adapter by yourself by editing /mnt/boot/cmdline.txt and adding your adapter's VendorID:ProductID at the end of the line. (see cmdline.txt) Then save and (re)boot your system.

If that works (i hope so), then the adapter could/should be blacklisted by HA-OS-Team too.
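The dmesg check described above can be sketched as a small filter. This is only an illustration: `uas_fallback_reported` is a hypothetical name, and it matches on nothing more than the kernel message quoted in this comment.

```shell
#!/bin/sh
# uas_fallback_reported (hypothetical helper): read kernel log text on stdin and
# succeed (exit 0) if the kernel reported falling back from UAS to usb-storage
# for at least one device.
uas_fallback_reported() {
    grep -q 'UAS is ignored for this device, using usb-storage instead'
}

# Sample input; on a real system you would pipe `dmesg` into the function.
uas_fallback_reported <<'EOF' && echo "adapter is blacklisted (usb-storage in use)" || echo "no UAS fallback message found"
[  1.929365] usb X-X: UAS is ignored for this device, using usb-storage instead
EOF
```

Note that a missing message does not prove UAS is active: on a system with long uptime the boot-time lines may already have dropped out of the kernel ring buffer.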

zillion42 commented 1 year ago

@Baxxy13

[    2.016595] usb 2-2: New USB device found, idVendor=0bda, idProduct=9210, bcdDevice=30.00
[    2.016638] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[    2.016670] usb 2-2: Product: GV100- 128G
[    2.016697] usb 2-2: Manufacturer: ORICO
[    2.016724] usb 2-2: SerialNumber: 012345678936
[    2.020055] usb 2-2: Enable of device-initiated U1 failed.
[    2.020983] usb 2-2: Enable of device-initiated U2 failed.

I was searching for the /mnt/boot/cmdline.txt earlier.

➜  ~ ls -la /mnt
total 8
drwxr-xr-x    2 root     root          4096 Feb 10 17:48 .
drwxr-xr-x    1 root     root          4096 Jul 27 03:12 ..

I think I need to mount /dev/sda1 to access it, but that only works by accessing the machine via admin ssh on port 22222, maybe @markusmauch can help me with that? So far I always use the ssh addon or putty from windows on regular port 22

➜  ~ mount -t vfat /dev/sda1 /mnt
mount: permission denied (are you root?)
➜  ~ df -h
Filesystem                Size      Used Available Use% Mounted on
overlay                 116.7G     46.2G     65.7G  41% /
/dev/sda8               116.7G     46.2G     65.7G  41% /addons
/dev/sda8               116.7G     46.2G     65.7G  41% /ssl
/dev/sda8               116.7G     46.2G     65.7G  41% /media
/dev/sda8               116.7G     46.2G     65.7G  41% /backup
/dev/sda8               116.7G     46.2G     65.7G  41% /data
/dev/sda8               116.7G     46.2G     65.7G  41% /config
/dev/sda8               116.7G     46.2G     65.7G  41% /share
devtmpfs                  3.7G         0      3.7G   0% /dev
tmpfs                     3.8G         0      3.8G   0% /dev/shm
/dev/sda8               116.7G     46.2G     65.7G  41% /etc/asound.conf
/dev/sda8               116.7G     46.2G     65.7G  41% /run/audio
tmpfs                     1.5G      1.3M      1.5G   0% /run/dbus
/dev/sda8               116.7G     46.2G     65.7G  41% /etc/resolv.conf
/dev/sda8               116.7G     46.2G     65.7G  41% /etc/hostname
/dev/sda8               116.7G     46.2G     65.7G  41% /etc/hosts
tmpfs                     3.8G         0      3.8G   0% /dev/shm
/dev/sda8               116.7G     46.2G     65.7G  41% /var/log/journal
tmpfs                     1.5G      1.3M      1.5G   0% /run/log/journal
/dev/sda8               116.7G     46.2G     65.7G  41% /etc/pulse/client.conf
tmpfs                     3.8G         0      3.8G   0% /proc/asound
devtmpfs                  3.7G         0      3.7G   0% /proc/keys
devtmpfs                  3.7G         0      3.7G   0% /proc/latency_stats
devtmpfs                  3.7G         0      3.7G   0% /proc/timer_list
tmpfs                     3.8G         0      3.8G   0% /sys/firmware

EDIT: And even if all that works... Why would I buy a usb3 to NVME adapter, to then blacklist the uas functionality and use my SSD as a USB-Storage filesystem.

Might as well use a sdcard. Not very satisfying.

EDIT2: Still, if it works, you are right: "If the adapter is identified via VendorID:ProductID and blacklisted, then the usb-driver uses usb-storage instead of uas. Lower "performance" of course, but better than crashing."

EDIT3: I already installed Debian 12 bookworm supervised HA on my wife's old MacBook 4,1 (Intel Core Duo 2.4 GHz, 2.5" Hitachi/HGST Travelstar Z5K320). I really think I'm going there, and will finally say bye bye to my Pi4.

markusmauch commented 1 year ago

I think I need to mount /dev/sda1 to access it, but that only works by accessing the machine via admin ssh on port 22222, maybe @markusmauch can help me with that? So far I always use the ssh addon or putty from windows on regular port 22

I'm glad to help but I think I need slightly more detailed instructions ... dmesg | grep usb does not seem to show the SSD at all:

[4075069.211423] usb 1-1.3: new full-speed USB device number 8 using xhci_hcd
[4075069.324907] usb 1-1.3: New USB device found, idVendor=10c4, idProduct=ea60, bcdDevice= 1.00
[4075069.324930] usb 1-1.3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[4075069.324936] usb 1-1.3: Product: CP2102 USB to UART Bridge Controller
[4075069.324942] usb 1-1.3: Manufacturer: Silicon Labs
[4075069.324946] usb 1-1.3: SerialNumber: 0033
[4075069.366451] usbcore: registered new interface driver cp210x
[4075069.366585] usbserial: USB Serial support registered for cp210x
[4075069.372884] usb 1-1.3: cp210x converter now attached to ttyUSB0
[4245991.878035] cp210x ttyUSB0: usb_serial_generic_read_bulk_callback - urb stopped: -32
[4245991.878641] cp210x ttyUSB0: usb_serial_generic_read_bulk_callback - urb stopped: -32
[4245991.935958] usb 1-1.3: USB disconnect, device number 8
[4246709.066412] usb 1-1.3: new full-speed USB device number 9 using xhci_hcd
[4246709.171893] usb 1-1.3: New USB device found, idVendor=10c4, idProduct=ea60, bcdDevice= 1.00
[4246709.171924] usb 1-1.3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[4246709.171937] usb 1-1.3: Product: CP2102 USB to UART Bridge Controller
[4246709.171949] usb 1-1.3: Manufacturer: Silicon Labs
[4246709.171960] usb 1-1.3: SerialNumber: 0033
[4246709.185336] usb 1-1.3: cp210x converter now attached to ttyUSB0
[4246722.404795] cp210x ttyUSB0: usb_serial_generic_read_bulk_callback - urb stopped: -32
[4246722.405267] cp210x ttyUSB0: usb_serial_generic_read_bulk_callback - urb stopped: -32
[4246722.412389] usb 1-1.3: USB disconnect, device number 9
[4246724.170612] usb 1-1.3: new full-speed USB device number 10 using xhci_hcd
[4246724.276121] usb 1-1.3: New USB device found, idVendor=10c4, idProduct=ea60, bcdDevice= 1.00
[4246724.276153] usb 1-1.3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[4246724.276166] usb 1-1.3: Product: CP2102 USB to UART Bridge Controller
[4246724.276178] usb 1-1.3: Manufacturer: Silicon Labs
[4246724.276189] usb 1-1.3: SerialNumber: 0033
[4246724.287794] usb 1-1.3: cp210x converter now attached to ttyUSB0

I'm logged into the system using the admin SSH via port 22222 and the system is currently running 9.5 (stable).

I think the SSD is /dev/sda8 on my system since that's the only device that would be large enough:

NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda      8:0    0 119.2G  0 disk
|-sda1   8:1    0    32M  0 part /mnt/boot
|-sda2   8:2    0    24M  0 part
|-sda3   8:3    0   256M  0 part /
|-sda4   8:4    0    24M  0 part
|-sda5   8:5    0   256M  0 part
|-sda6   8:6    0     8M  0 part
|-sda7   8:7    0    96M  0 part /var/lib/NetworkManager
|                                /etc/systemd/timesyncd.conf
|                                /etc/hosts
|                                /etc/hostname
|                                /etc/NetworkManager/system-connections
|                                /var/lib/systemd
|                                /var/lib/bluetooth
|                                /root/.ssh
|                                /root/.docker
|                                /etc/udev/rules.d
|                                /etc/modules-load.d
|                                /etc/modprobe.d
|                                /etc/dropbear
|                                /mnt/overlay
`-sda8   8:8    0 118.6G  0 part /var/log/journal
                                 /var/lib/docker
                                 /mnt/data
zram0  254:0    0   1.9G  0 disk [SWAP]
zram1  254:1    0    32M  0 disk /var
zram2  254:2    0    16M  0 disk /tmp

zillion42 commented 1 year ago

Hi,

my HA OS updated itself to 10.1 today, I didn't do anything, how is that even possible? Of course it crashed my SSD again and lights were not working, in the whole house, all of them... I'm really {{seriously, annoyingly, angrily}} getting tired of this...

@markusmauch @Baxxy13 (@frenck (I one time sponsored you the day before yesterday))

markusmauch commented 1 year ago

  • I thought maybe one of you might be able to give me a rundown how to enable ssh access on 22222

Just follow the procedure described here:

https://developers.home-assistant.io/docs/operating-system/debugging/

It looks worse than it is and is very useful in many occasions...

zillion42 commented 1 year ago

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXsruscncaScl6oRh92F0Htrq+1urxUo2ySoHV90zvqu7ZJekgLrKnAVh1A8WupXhOxSIApllUERGtMNfW+F9tfXnsqkwK93AuQ4n6bkb4fWf1hiSfM1jfvGLlwxjmrYcttJSGUxBLPEgmsGGkmmcy6+2S73iuqKecAVuYwZn5aSSaOvJHuAkKBdcyXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXyWQdHDZDCP0qWcsS6ec9HcVyF+sMwTqItqHj8K8rZkTefxwGg5A+OlW3n4tcmteq0bjSqSQrRCNqg3N+xu1JPVW9LOHWHKujVP2Ttyr7+xOwb3V7AlOq7tc155qY2TkSOrP2eLBApJz/zVcr/aCO5aYXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

zillion42 commented 1 year ago

tobi@introvision2:~$ sudo lsblk
[sudo] password for tobi:
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
...
(blah)
...
sda           8:0    0 119,2G  0 disk
├─sda1        8:1    0    32M  0 part
├─sda2        8:2    0    24M  0 part /media/tobi/disk
├─sda3        8:3    0   256M  0 part /media/tobi/disk3
├─sda4        8:4    0    24M  0 part /media/tobi/disk1
├─sda5        8:5    0   256M  0 part /media/tobi/disk2
├─sda6        8:6    0     8M  0 part
├─sda7        8:7    0    96M  0 part /media/tobi/hassos-overlay
└─sda8        8:8    0 118,6G  0 part /media/tobi/hassos-data
nvme0n1     259:0    0 931,5G  0 disk
├─nvme0n1p1 259:1    0   100M  0 part /boot/efi
├─nvme0n1p2 259:2    0    16M  0 part
├─nvme0n1p3 259:3    0   465G  0 part /media/win10
├─nvme0n1p4 259:4    0   614M  0 part
└─nvme0n1p5 259:5    0 465,8G  0 part /

tobi@introvision2:~$ sudo mount -t vfat /dev/sda1 /mnt

tobi@introvision2:~$ ls -l /mnt/
total 6652
-rwxr-xr-x 1 root root   52656 Aug  1 19:29 bcm2711-rpi-400.dtb
-rwxr-xr-x 1 root root   52524 Aug  1 19:29 bcm2711-rpi-4-b.dtb
-rwxr-xr-x 1 root root   53265 Aug  1 19:29 bcm2711-rpi-cm4.dtb
-rwxr-xr-x 1 root root    2411 Aug  1 19:29 boot.scr
-rwxr-xr-x 1 root root     137 Aug  1 19:29 cmdline.txt
-rwxr-xr-x 1 root root    2160 Aug  1 19:29 config.txt
-rwxr-xr-x 1 root root    3170 Aug  1 19:29 fixup4cd.dat
-rwxr-xr-x 1 root root    5398 Aug  1 19:29 fixup4.dat
-rwxr-xr-x 1 root root    8386 Aug  1 19:29 fixup4x.dat
drwxr-xr-x 2 root root   24576 Aug  1 19:29 overlays
-rwxr-xr-x 1 root root  805436 Aug  1 19:29 start4cd.elf
-rwxr-xr-x 1 root root 2250848 Aug  1 19:29 start4.elf
-rwxr-xr-x 1 root root 2998344 Aug  1 19:29 start4x.elf
-rwxr-xr-x 1 root root  533432 Aug  1 19:29 u-boot.bin

tobi@introvision2:~$ sudo vim /mnt/cmdline.txt
dwc_otg.lpm_enable=0 console=tty1 usb-storage.quirks=174c:55aa:u,2109:0715:u,152d:0578:u,152d:0579:u,152d:1561:u,174c:0829:u,14b0:0206:u,0bda:9210:u
:x

tobi@introvision2:~$ sudo umount /dev/sda1

zillion42 commented 1 year ago

image

Success for blacklisting UAS.

I will now update to 10.3 and report stability.

zillion42 commented 1 year ago

I'm going to stay, optimistic here. If I come home, back from work, tomorrow evening, and all lights wont work, again, because my SSD became unresponsive again...

How hard can it be to blacklist: 0bda:9210 Realtek Semiconductor Corp. RTL9210 M.2 NVME Adapter for the @HA_dev_Team? I've been struggling since May 5th.

EDIT: Pi4 SSD USB Boot support is STILL broken with kernel 6.1 EDIT2: THX @Baxxy13

zillion42 commented 1 year ago

@markusmauch if you really have the same hardware, try: lsusb That should say 0bda:9210 somewhere

Add: ,0bda:9210:u to the end of your /mnt/boot/cmdline.txt. Add the comma, add the :u
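The edit can also be sketched as a small function, for those who prefer not to hand-edit the line. This is a hedged illustration, not an official HA OS tool: `add_uas_quirk` is a hypothetical helper that takes the single line from cmdline.txt and a VendorID:ProductID and prints the updated line.

```shell
#!/bin/sh
# add_uas_quirk CMDLINE VID:PID   (hypothetical helper, not part of HA OS)
# Print CMDLINE with "VID:PID:u" added to usb-storage.quirks; the ":u" flag
# tells usb-storage to ignore UAS for that device.
# The line is printed unchanged if the entry is already present.
add_uas_quirk() {
    line=$1
    id=$2
    case "$line" in
        *"${id}:u"*)
            printf '%s\n' "$line" ;;
        *usb-storage.quirks=*)
            # Append to the existing comma-separated list.
            printf '%s\n' "$line" | sed "s/\(usb-storage\.quirks=[^ ]*\)/\1,${id}:u/" ;;
        *)
            # No quirks parameter yet: add one at the end of the line.
            printf '%s\n' "$line usb-storage.quirks=${id}:u" ;;
    esac
}

add_uas_quirk 'dwc_otg.lpm_enable=0 console=tty1 usb-storage.quirks=174c:55aa:u' 0bda:9210
# prints: dwc_otg.lpm_enable=0 console=tty1 usb-storage.quirks=174c:55aa:u,0bda:9210:u
```

cmdline.txt has to stay a single line, so it is probably safest to write the result to a temporary file and compare it with the original before replacing /mnt/boot/cmdline.txt.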

markusmauch commented 1 year ago

if you really have the same hardware, try: lsusb That should say 0bda:9210 somewhere

I can confirm this. I changed the /mnt/boot/cmdline.txt accordingly and updated to 10.3. I'll keep you informed...

Baxxy13 commented 1 year ago

because my SSD became unresponsive again...

Sad to hear that. I really hoped blacklisting the UAS mode would work. But it seems that the active UAS mode is not the problem.

Searching for...

[    2.020055] usb 2-2: Enable of device-initiated U1 failed.
[    2.020983] usb 2-2: Enable of device-initiated U2 failed.

leads to articles about problematic U1/U2 implementations which had to do with LPM (Link Power Management) of USB3 devices. e.g. here

Disabling LPM for your USB device might be worth a try. From what I have read, this could also be done via quirks in cmdline.txt (see here in the answer). But I have never used or tested this and I don't know if it's supported by HA-OS.
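To see which ports are affected before trying anything, the U1/U2 messages can be pulled out of the kernel log. A minimal sketch, with `lpm_failed_ports` as a hypothetical helper name; it relies only on the message text quoted above.

```shell
#!/bin/sh
# lpm_failed_ports (hypothetical helper): read kernel log text on stdin and
# print each USB port (e.g. "2-2") that logged a failed U1/U2 enable,
# once per port.
lpm_failed_ports() {
    sed -n 's/.*usb \([0-9][0-9.-]*\): Enable of device-initiated U[12] failed\..*/\1/p' | sort -u
}

# Sample input from this thread; on a real system pipe `dmesg` into it.
lpm_failed_ports <<'EOF'
[    2.020055] usb 2-2: Enable of device-initiated U1 failed.
[    2.020983] usb 2-2: Enable of device-initiated U2 failed.
EOF
# prints: 2-2
```

As for the quirk itself: the Linux kernel parameter documentation lists `usbcore.quirks=VID:PID:k`, where `k` stands for USB_QUIRK_NO_LPM. Whether adding that to cmdline.txt works on HA-OS, or helps with this adapter, is an untested assumption here.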

zillion42 commented 1 year ago

@Baxxy13, I think you got me wrong. I said, "if"

I'm going to stay, optimistic here. If I come home, back from work, tomorrow evening, and all lights wont work, again, because my SSD became unresponsive again...

All is peachy so far: image

SamuelCarson commented 1 year ago

this is really frustrating.. I can't use network storage while on 9.5, I need to be on 10.0+. No one is assigned, is anyone from the development team even looking at this?

nheuermann commented 1 year ago

I had the same problem, wasn't even able to startup HA on Raspi4, first it keeps checking the file system and then SquashFS errors flood the logs. After taking my Kingston SSD from the ICY BOX IB-AC703-U3 USB adapter and putting it into my older HDD/SSD adapter (one to put them in vertically, Sharkoon QuickPort USB 3.1 Type C) everything worked fine immediately. Latest HA version. I guess it's a (Linux Kernel?) incompatibility with certain SSD USB bridges.

zillion42 commented 1 year ago

I unplugged my SSD and ran my old SD card with Raspbian for deconz to check my current firmware version. It said this:

pi@phoscon:~ $ sudo vcgencmd version
Dec  1 2021 15:01:54
Copyright (c) 2012 Broadcom
version 71bd3109023a0c8575585ba87cbb374d2eeb038f (clean) (release) (start)

Maybe I should update the firmware ?

pi@phoscon:~ $ sudo rpi-eeprom-update
*** UPDATE AVAILABLE ***
BOOTLOADER: update available
   CURRENT: Tue 26 Apr 2022 10:24:28 AM UTC (1650968668)
    LATEST: Wed 11 Jan 2023 05:40:52 PM UTC (1673458852)
   RELEASE: default (/lib/firmware/raspberrypi/bootloader/default)
            Use raspi-config to change the release.

  VL805_FW: Using bootloader EEPROM
     VL805: up to date
   CURRENT: 000138a1
    LATEST: 000138a1

If so:

Or might I run into similar problems as this guy? https://community.home-assistant.io/t/raspberry-firmware/466510

Thx for letting me know, I will now boot back into my SSD and await your reply

P.S. Might this eventually help with SSD issues for kernel 6.1 on raspberry pi ?
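As a quick way to read the rpi-eeprom-update output above, here is a hedged Python sketch that pulls the two bootloader epoch values out of the text and compares them. It assumes the CURRENT and LATEST bootloader lines carry the epoch in parentheses, as in the output pasted above; the function name is hypothetical.

```python
import re


def bootloader_epochs(output: str) -> tuple[int, int]:
    """Return (current, latest) epoch seconds from the bootloader lines."""
    # The VL805 lines have no "(epoch)" suffix, so they are not matched.
    current = re.search(r"CURRENT:.*\((\d+)\)", output)
    latest = re.search(r"LATEST:.*\((\d+)\)", output)
    if not current or not latest:
        raise ValueError("no bootloader timestamps found")
    return int(current.group(1)), int(latest.group(1))


sample = """\
BOOTLOADER: update available
   CURRENT: Tue 26 Apr 2022 10:24:28 AM UTC (1650968668)
    LATEST: Wed 11 Jan 2023 05:40:52 PM UTC (1673458852)
  VL805_FW: Using bootloader EEPROM
     VL805: up to date
   CURRENT: 000138a1
    LATEST: 000138a1
"""
current, latest = bootloader_epochs(sample)
print("update available" if latest > current else "up to date")
```

This only tells you that a newer bootloader exists; whether updating helps with the SSD issue is still an open question.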

markusmauch commented 1 year ago

Add ,0bda:9210:u to the end of your /mnt/boot/cmdline.txt.

This did do the trick. My system ran stable on 10.3 for a whole week. I updated to 10.4 yesterday and so far it looks good.

Thanks for the assistance!

agners commented 1 year ago

this is really frustrating..

Tell me about it, it really is! Just search this bug tracker for issues with the label usb-ssd. Just crazy, how many issues piled up until today.

We've been tempted to declare USB SSD unsupported entirely. But on the other hand, there are configurations of USB SSD adapters + disk + Raspberry Pis which do work really reliably. It just seems to be a huuuge hit and miss. :cry:

I can't use network storage while on 9.5, I need to be on 10.0+. No one is assigned, is anyone from the development team even looking at this?

I follow it loosely.

In the end, Raspberry Pi and USB SSD has been a painful journey since.. forever, essentially. It was one reason we created Yellow: a proper NVMe SSD using M.2/PCIe did help to alleviate a lot of the problems. This is the technology PCs and notebooks are using, and it sidesteps all the USB powering and USB UMS protocol issues. Granted, it seems that the Raspberry Pi SoC also has trouble talking to some high-end NVMe drives such as the Samsung 980 Pro and WD_BLACK NVMe SSD, unfortunately :cry: , but pretty much every other NVMe really works rock solid.

That said, I'd really hope that USB SSD support gets more stable as time progresses. But we rely on the progress of the Raspberry Pi kernel. I am waiting for a new Raspberry Pi Linux kernel release, but it seems they stopped releasing regularly, their last release was in April, see https://github.com/raspberrypi/linux/tags :weary:

Baxxy13 commented 1 year ago

Just thinking... Most Pi4B USB boot problems come from the uas mode of the USB driver. Blacklisting uas mode for known problematic adapters via /mnt/boot/cmdline.txt works. But the "blacklist" isn't updated frequently, and even when it is, an HA-OS update doesn't update /mnt/boot/cmdline.txt as far as I know. So only fresh installations of HA-OS would benefit from an up-to-date "blacklist".

Why not disable uas mode entirely for the Pi4B HA-OS? I have never heard of problems with USB boot when the USB driver uses usb-storage instead of uas (other issues like insufficient power adapters, bad cables etc. not included).

OK, throughput is a bit lower with usb-storage, but this isn't really noticeable in a running HA-OS. As said earlier, better a slightly slower but stable system than a crashing, unstable one.

Side note from my last adapter testing: if the same adapter (with the same SSD) is running in usb-storage mode (instead of uas), the whole system consumes slightly less power.
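To see which of the two drivers an adapter is actually bound to, the `Driver=` field in `lsusb -t` output can be read. Below is a rough Python sketch that extracts it from saved output. The sample tree is illustrative, not from a real system in this thread, and the function name is made up.

```python
import re


def storage_drivers(lsusb_tree: str) -> list[str]:
    """Return the Driver= value of every Mass Storage interface."""
    drivers = []
    for line in lsusb_tree.splitlines():
        if "Class=Mass Storage" in line:
            match = re.search(r"Driver=([\w-]+)", line)
            if match:
                drivers.append(match.group(1))
    return drivers


# Illustrative `lsusb -t` output; real port and device numbers will differ.
sample = """\
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 5000M
    |__ Port 2: Dev 2, If 0, Class=Mass Storage, Driver=uas, 5000M
"""
print(storage_drivers(sample))
```

If the quirk in cmdline.txt took effect, the same line should report usb-storage instead of uas after a reboot.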

HumanSkunk commented 1 year ago

For the sake of reliability, stability and my sanity: is there an easy migration path to the other supported SSD solution, with HAOS installed on the SD card and the HA data on the SSD? Would it be a case of creating a backup of my current all-on-SSD configuration, installing a new instance directly on an SD card, restoring the backup to the SD card and then migrating the data disk back to the SSD? Or would it be better to migrate the data disk first and then restore the backup? The backup is universally supported as long as it's a supervised instance of HA, right, regardless of what hardware and configuration I put it on?