Closed: bschatzow closed this issue 2 years ago
> Last time I checked, 32-bit worked on the devices I usually test with. With OS release 6 we get a new U-Boot, and there have been fixes which might improve the situation. Just tested a local build with U-Boot 2021.04-rc4 now with my Samsung T1 USB 3.0 SSD, and 32-bit seems to work well. I'll let you know once I have some builds online.

I will test this as soon as you tell me it is available. I believe the memory handling or timing of 64-bit vs. 32-bit may be the issue. If you look at the people who are having no issues, most are using the 8GB RPI4, which is at PCB 1.4 vs 1.2 on the 4GB. I have been in contact with some people who have hardware identical to mine (same StarTech controller) and they have no issues. I have left messages in the Raspberry forums to try to get clarification. I will pass on any useful information I receive back to you. I did test the 64-bit SD/SSD setup and this failed as well.
Just Googled quickly, have you noticed this thread? https://www.raspberrypi.org/forums/viewtopic.php?t=297562
I saw this previously, and I have updated to the new firmware and it made no difference. I also have the older 3.0 adapter and it also fails the same way and supposedly there is no issue. If you feel it is the controller, what controller should I use?
> To me it seems that this adapter is unstable in general, so this does not really seem to be HAOS related.

If it is not HAOS related, why does it only fail on OS versions above 5.4?
I have disabled the Home Assistant Google Drive Backup addon and haven't had a crash in over a week. This is definitely a new record.
Not sure how this would be connected but maybe someone else can give it a try.
@tobias-kuendig,
I would think that this is a coincidence. I checked the sheet that is linked in issue 1119
https://docs.google.com/spreadsheets/d/1iHTVvaNlTUqwFUgsUhUNws2Sw115INIx5ChEgTnIfoc/edit#gid=0
and many of the people having issues do not have this add-on. You may want to look at this sheet and add your configuration to it.
I do use google drive add on and it is set to run daily at 3AM. I don't see how it could crash the system when not running? I will test it Monday or Tuesday and get back to you.
@agners @pvizeli
I am trying to understand why most people using the 8G PI4 boards with the same hardware I have are not seeing any issues. I have asked in the raspberry forums
https://www.raspberrypi.org/forums/viewtopic.php?f=63&t=307132&e=1&view=unread#unread
And one of the responses seems to make sense
"even if your only using 30% of the 4gig, linux may randomly use all of the 8gig, in any order, and may run into compatibility problems with 64bit addressing"
Since I can't get the 32-bit version to boot from the SSD, is there any way to test this idea?
@agners I have not heard back from you on the 32-bit OS that can boot off the SSD. I tried the split SD/SSD setup with the 64-bit OS and it failed. I am going to try the 32-bit split SD/SSD setup today and let you know if this works. Can't do this, as import with USB is broken, so I can't run SSH on port 22222.
Just to report back on my Home Assistant Google Drive Backup tests: I did disable the plugin and had no issues for over two weeks. I did re-enable the addon last Monday and had my first crash a little more than 30 hours later.
> I do use google drive add on and it is set to run daily at 3AM. I don't see how it could crash the system when not running?

I run the same config, and the crash does not happen at the time of the backup or after a specific interval. It seems to happen randomly and it does not happen every time a backup is made.
But I wonder if the backup job might overwhelm the Pi in some way, which causes this issue later on?
After all, my snapshots are around 800 MB in size with a 4 GB `home-assistant.db` to compress. But it's true, I don't have any idea what the exact issue would be, especially as it manifests at random times, so the Backup plugin as a culprit sounds wrong.
I looked at the 1119 issue again with the spreadsheet mentioned
https://docs.google.com/spreadsheets/d/1iHTVvaNlTUqwFUgsUhUNws2Sw115INIx5ChEgTnIfoc/edit#gid=0
And many do not use the google drive add on and have the same issue. It would be interesting to see why it works for you? Maybe some feedback from @agners can give us some understanding on this?
Since the upgrade to core-2021.3.4 I never experienced a hung state again :-) Before that daily hung states.
Edit: Crashed again :-(
@lwolfs Had no effect on me.
@bschatzow
> I have not heard back from you on the 32bit OS that can boot off of the SSD. I tried the split SD / SSD with the 64 bit OS and it failed. I am going to try today the 32bit split SD / SSD and let you know if this works. Can't do this as import with USB is broke so I can't run SSH 22222.

I had it booting off a USB 3.0 flash stick (I think I used a Kingston model I have here on my table) as well as my Samsung T1 250GB SSD. Both worked fine with 32-bit OS using USB 3.0. But as we know, this doesn't mean it works for your combination of controllers/disk etc. in the world of USB boot :cry:
@bschatzow @tobias-kuendig
> But I wonder if the backup job might overwhelm the Pi in some way, which causes this issue later on?

Quite possible. If memory gets scarce, all kinds of things can happen, and a system hang is a very likely outcome.
I think quite a few of the freeze issues are just related to memory. It can be that the new OS release has a slightly different memory profile (read: needs slightly more memory when using xy), which causes problems in different setups.
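One way to check the memory theory is to log the available memory at intervals to storage that survives the hang, and then look at the last entries after a freeze. The following is only a minimal sketch, assuming a shell on the host and the usual `/proc/meminfo` format; the helper name `mem_available_mib` is made up for illustration and is not part of HAOS.

```shell
#!/bin/sh
# Hypothetical helper: print MemAvailable in MiB from a meminfo-style file.
# Defaults to /proc/meminfo; a different file can be passed for testing.
mem_available_mib() {
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}
```

It could be called from a loop such as `while true; do echo "$(date) $(mem_available_mib) MiB"; sleep 60; done >> mem.log` (the log file name is an assumption). A steady drop right before a freeze would support the memory explanation; a flat line would argue against it.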
Not seeing any memory issue in any of the Home Assistant sensors I set up. Also I see someone used the identical system that was freezing with HA OS get fixed by using the Debian install and Home Assistant on top. I still would like to try the 32 bit SSD version. None of the releases have worked for me.
I am now trying the split again, this time using the 32-bit version. I had two issues with the restore (MariaDB was corrupted, not sure how as it is stopped beforehand, and my Wyze Sense was not working). Both are fixed and I have been up since 5:45 EST on 4/1. I'll let you know if this goes down and when. If it does, let me know whether I should capture new logs.
@agners I used `echo 7 > /proc/sys/kernel/printk` to capture the logs on the screen and this is nothing like before. Before, it was a constant scroll where everything showed. Any way to get that back?
@agners @pvizeli Just did the split system again. This time I used the 32 bit OS. System worked from 5:45 AM until it crashed at 12:58 PM EST. No memory / power / Heat/ CPU, etc issues showed in the sensors. OS 5.4 continues to work with no issue. Any ideas on what to try next? Debian?
@agners Just tried 5.13. Failed in less than 10 hours. Do you want any logs? Also, is there an option to turn the logs on the monitor back on?
Added to the spreadsheet. Is anyone investigating this? Do you need help reproducing? I'll mail you my SSD/adapter if that helps!
@bschatzow as I wrote above, kernel logs are active if they are of significant importance. What is disabled should not really matter. If you like the verbose output still, you can enable it manually: https://github.com/home-assistant/operating-system/issues/1256#issuecomment-788907136
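For reference, the first number in `/proc/sys/kernel/printk` is the console log level: kernel messages with a priority value below it are printed to the console, so writing 7 lets everything up to informational messages through. A small sketch for checking the current value before and after changing it (the helper name `console_loglevel` is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: print the console log level, i.e. the first of the
# four values in the printk sysctl file. A different file can be passed
# for testing.
console_loglevel() {
    awk '{ print $1 }' "${1:-/proc/sys/kernel/printk}"
}
```

If `console_loglevel` still prints a low number after the `echo 7` command, the write did not take effect (for example because it was not run as root).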
@jherby2k I am currently not actively investigating, as USB SSD boot is discouraged.
@agners, @jherby2k I have the exact same issue with the freeze using the SD boot and everything else on SSD. I provided logs and did not see anything different than the straight SSD boot? I was told this method is supported? I have been seeing people starting to switch to either the Debian 64 bit or the PI OS 64 bit and using Home Assistant Supervised. None have reported any system freeze. I tried enabling the verbose output and it does not work. The 5.4 shows everything. The 5.10 and above does not. If my system worked correctly I would not need any log or the monitor attached for that matter. I am enabling it based on your comments to me last year asking me if there is any useful information on the monitor?
So i tried switching to the Debian boot today. Surprisingly, the same issue occurred - things slowed down to a crawl shortly after installing HA in docker, and I had the same weird side effect of being unable to configure Wifi, at least intermittently. This happened on hassos as well. Strangely the OS itself seemed usable. No kernel errors, and even a performance benchmark ran quick and smooth while HA was all hung up.
This led to more fiddling (easier on Debian) and i've tentatively concluded that at least for me, this is an issue with my StarTech adapter needing quirks mode to disable UASP (added `usb-storage.quirks=174c:55aa:u` to `cmdline.txt` as per https://www.raspberrypi.org/forums/viewtopic.php?t=245931). A few hours in and everything is working well - i'll update if it crashes again.
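For anyone who wants to try the same workaround, the edit can be scripted. This is a sketch only: the function name `add_usb_quirk` is made up, the location of `cmdline.txt` depends on the install (often the boot partition), and the ID `174c:55aa` is the one from the comment above, so check `lsusb` for your own adapter. The `u` flag tells the kernel not to bind the device to the `uas` driver.

```shell
#!/bin/sh
# Hypothetical helper: append a usb-storage quirk to a kernel command line file.
# Usage: add_usb_quirk <cmdline-file> <vid:pid>
add_usb_quirk() {
    file="$1"
    quirk="usb-storage.quirks=$2:u"   # "u" = ignore UAS for this device
    # Nothing to do if the exact parameter is already present.
    if grep -qF -- "$quirk" "$file"; then
        return 0
    fi
    # cmdline.txt must stay a single line, so append to the end of line 1.
    # Limitation: an existing usb-storage.quirks= entry with other devices
    # is not merged; that case needs a manual, comma-separated edit.
    sed "1s/\$/ $quirk/" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

A call would look like `add_usb_quirk /boot/cmdline.txt 174c:55aa` (path assumed), followed by a reboot. Keep a copy of the original file, since a broken command line can prevent booting.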
@jherby2k yeah that is a common issue with UAS. But usually that should lead to `uas_eh_abort_handler` and similar errors on the console.
@bschatzow the continuous logs might help sometimes to see if the system actually froze or if it's still halfway alive, so selectively enabling them might make sense. But often it's just confusing.
In your tests, do you restore your setup (e.g. addons etc) or did you also try running a fresh & empty installation?
There are definitely installations out there which run stable with latest HAOS and RPi4 on a USB SSD, so it must be some software and/or hardware configuration. On the development builds for HA 6.0 I updated to the latest Raspberry Pi Kernel and Firmware, maybe this helps. A new development build should be ready tomorrow. I have a build with that kernel/firmware now running on my Raspberry Pi 4 (4GB) using a Samsung T1 250GB connected via USB 3.0 running for more than 8 hours stable so far. I'll let it run for a while, to see if I can reproduce something.
> @jherby2k yeah that is a common issue with UAS. But usually that should lead to `uas_eh_abort_handler` and similar errors on the console.

I'm running headless - can I find those errors in a log somewhere? So far it's really running much smoother. Strange, because from what I can gather the StarTech adapter should be solid, and I don't believe the SSD is a factor in UAS at all, right?
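On a headless system the same messages end up in the kernel log, which can be read with `dmesg` or `journalctl -k`. A minimal sketch for counting them, assuming shell access to the host; the function name `count_uas_errors` is made up, and the only pattern searched for is the `uas_eh_abort_handler` string mentioned above:

```shell
#!/bin/sh
# Hypothetical helper: count kernel log lines that mention the UAS abort
# handler. Reads log text on stdin, so it works with dmesg, journalctl -k,
# or a saved log file.
count_uas_errors() {
    # grep -c prints 0 but exits non-zero when nothing matches;
    # "|| true" keeps the function from reporting that as a failure.
    grep -c 'uas_eh_abort_handler' || true
}
```

Typical use would be `dmesg | count_uas_errors`, or `journalctl -k -b -1 | count_uas_errors` to look at the boot before a hang (the latter only works if the journal is kept across reboots). A count of zero does not rule UAS out, but a non-zero count is a strong hint.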
> @jherby2k yeah that is a common issue with UAS. But usually that should lead to `uas_eh_abort_handler` and similar errors on the console.
>
> @bschatzow the continuous logs might help sometimes to see if the system actually froze or if it's still halfway alive, so selectively enabling them might make sense. But often it's just confusing.

Again, I only wanted this to try and help. I do not need them for anything else.

> In your tests, do you restore your setup (e.g. addons etc) or did you also try running a fresh & empty installation?

On all my tests I did a snapshot restore.

> There are definitely installations out there which run stable with latest HAOS and RPi4 on a USB SSD, so it must be some software and/or hardware configuration. On the development builds for HA 6.0 I updated to the latest Raspberry Pi Kernel and Firmware, maybe this helps. A new development build should be ready tomorrow. I have a build with that kernel/firmware now running on my Raspberry Pi 4 (4GB) using a Samsung T1 250GB connected via USB 3.0 running for more than 8 hours stable so far. I'll let it run for a while, to see if I can reproduce something.

Thanks. Currently running the March firmware.
Let me know what you want me to try and I will install it and let you know.
OK, found this very interesting thread: https://bugzilla.redhat.com/show_bug.cgi?id=1230336
Basically the Startech adapters, which use ASMedia chips, may or may not work as ASMedia re-used device IDs with various chips. The kernel tries its best to figure out which one you have, but some work with UAS and some don't.
edit: I tried the following, but at least for my chip it crashed again - almost immediately. Need to disable UAS entirely.
* (*) ASM1051 chips do work with UAS with some disks (with the * US_FL_NO_REPORT_OPCODES quirk), but are broken with other disks
So anyway, my particular version of this bug is not a hassos issue - its an ASMedia issue with a workaround. Hopefully this helps others.
@jherby2k What version of the StarTech controller are you using? I have tried both the 3.0 and 3.1 (with latest firmware) both work fine with 5.4 and not on anything after. Never saw any of the errors that @agners mentioned. My crashes always happen several hours to several days later. I also read that people are having success with StarTech and the RPI 64 OS (which shows as Debian to the Supervisor).
Its the 3.0, but as per that link it seems like there were several different ASMedia chips that all show the same. I imagine Startech used several over the years.
Anyway, it hung again, so maybe it wasn't UAS after all. But it does seem to happen on Debian as well. @bschatzow can you try it also? https://community.home-assistant.io/t/installing-home-assistant-supervised-on-a-raspberry-pi-with-debian-10/247116. I had to run "echo reset-raspberrypi >/usr/share/initramfs-tools/modules.d/raspi-firmware.conf" before updating the system as per comments below. Otherwise it was very easy. Just didn't fix the problem!
@agners since Debian is a supported install path, is SSD boot on raspberry pi supported on Debian? Because i can repro on Debian.
@jherby2k I guess with Debian you mean Raspberry Pi OS? While RPi OS is based on Debian, it's definitely not vanilla Debian. Especially the components relevant for this case, namely boot flow and kernel/device tree, are different.
Home Assistant OS is using the kernel from the RPi OS project, but our boot flow is a bit different: we use U-Boot to implement A/B partitioning, which basically allows falling back to the older version in case an update fails. However, U-Boot should not really matter at normal runtime.
If you are using Raspberry Pi OS and you experience hangs despite the latest firmware/kernel and the original power supply, you should report the issue to the Raspberry Pi kernel team at https://github.com/raspberrypi/linux/issues .
No, Debian 10 64-bit as per the link.
Since my wifi also cuts out when the issue occurs, i'm investigating USB3 / 2.4GHz interference. Gonna try USB2 and also shielding my adapter cable / SSD. (If you're curious, I have the Pi's wifi connected directly to my solar inverter so i can access its modbus interface. HA is accessible over ethernet)
@jherby2k I see, sorry for confusion. Do you happen to know what bootloader/kernel version those Debian images are using? I guess some upstream kernels?
I haven't heard of a case where WiFi interferes with USB 3.0 (the other way around yes, but that USB 3.0 is disrupted I am not aware of).
@bschatzow can you try and see if the latest nightly dev builds make a difference for you? Those come with the latest Raspberry Pi Firmware and Kernel.
https://os-builds.home-assistant.io/6.0.dev20210419/
If you use the built-in upgrade (via dev channel) note that from HAOS 6.0 you can only downgrade to 5.13. From 5.13 you'll be able to downgrade further, but you'll need to do the intermediate step. A fresh installation and restoring via Snapshot is probably easier.
@agners Just doing the update. Came back up at 6:38 PM EST. I'll let you know what I see.
@jherby2k

> @bschatzow can you try it also? https://community.home-assistant.io/t/installing-home-assistant-supervised-on-a-raspberry-pi-with-debian-10/247116. I had to run "echo reset-raspberrypi >/usr/share/initramfs-tools/modules.d/raspi-firmware.conf" before updating the system as per comments below. Otherwise it was very easy. Just didn't fix the problem!

Going to try the dev version that @agners just posted first.
@agners Dev is still up and running. Have an issue where CLI is not working correctly. Not sure if it is related to this or something else. Tried to see su logs and can't get there.
> @jherby2k I see, sorry for confusion. Do you happen to know what bootloader/kernel version those Debian images are using? I guess some upstream kernels?
>
> I haven't heard of a case where WiFi interferes with USB 3.0 (the other way around yes, but that USB 3.0 is disrupted I am not aware of).

@agners Kernel 5.5.10. Not sure about the bootloader.
Working very well for 24 hours now with the SSD attached via USB 2.0. That could be a bunch of things i suppose, but it supports my Wifi EM interference theory. I'll give it a few more days to be certain this is a definitive workaround, and then i'll test USB 3.0 again with some additional shielding around the Startech USB adapter.
@agners Crashed at 11:31. Did not make 24 hours. No logs on screen. Just says login. Not a good system for troubleshooting. Tried to go to 5.13 got Error raise form OTA Webserver: 404
@jherby2k the dev version froze after 17 hours. I am going to try the RPI OS 64 with HA supervised on my spare SSD. By the way have you tried 5.4? Never had an issue with it.
@jherby2k

> Working very well for 24 hours now with the SSD attached via USB 2.0. That could be a bunch of things i suppose, but it supports my Wifi EM interference theory. I'll give it a few more days to be certain this is a definitive workaround, and then i'll test USB 3.0 again with some additional shielding around the Startech USB adapter.

By the way, another user tried USB 2 (Bundai) and he got two days before he had the same freeze issues.
What I have read is the EMI issue is at the board (connector) that is why the recommendation on USB 2 is to get an extension cable to move it away from the USB3. Not sure how you can block the EMI at the board without doing a board mod?
@bschatzow ok thanks for testing. Yeah I still can't reproduce, my test installation is up 3 days almost on USB 3.0 Samsung USB 3.0 T1 SSD, rock solid.
When you are doing tests, is that with a productive set, configs, add-ons etc? Did you try an empty/vanilla installation?
@agners Try installing the Google Backup plugin. As I documented in other threads, I had no end of trouble with my PI4 and eventually went down the Debian 10 supervised install route which is working fine 4 weeks so far with no crash. I recently built a PI3 to do some power monitoring and used HASSIO just for fun. Built using an SD card it had MQTT, Grafana, Node Red, Samba plugins and about 100 MQTT sensors sending data and working perfectly and rock solid for 2 weeks. I then thought it was safe to try something else. Installed the Google Backup plugin and it died in less than 12 hours. I don't believe it is the Google Backup plugin that's the problem however because it runs fine on my PI4. I think the plugin must use something in the OS that causes the crash.
@agners

> @bschatzow ok thanks for testing. Yeah I still can't reproduce, my test installation is up 3 days almost on USB 3.0 Samsung USB 3.0 T1 SSD, rock solid.

I have tried the Samsung as well as the Kingston. No difference.

> When you are doing tests, is that with a productive set, configs, add-ons etc? Did you try an empty/vanilla installation?

I am using my productive set with everything. As everything works with 5.4, I don't think it would be a fair test to do it differently. When I get a chance I am going to try the supervised install that others have changed to. Trying to decide whether to use a straight Debian install or the RPI OS? So far, no one has reported an issue with either the Debian or the RPI OS install. Interesting that @muzzak123 pointed to the Google Backup plugin. Others have also pointed to this. Not sure how this could cause the issue? I do have this, as do many others in the 1119 issue. What is your opinion on this being an issue?
I also looked at the sensors after it came back up and noticed the processor temp was above normal (55 vs 46 C) and CPU use was much higher about 30 minutes prior to the freeze. This is different for me. Never saw this before. My snapshot setting is 3AM so this should not have been the issue.
Testing debian with same configuration. Started at 1:30 pm EST
I've been following these threads, I have had a problem with my PI3B+ crashing regularly since one of the last updates. Not lasting a day. I updated to the dev build 6.0 as per a few posts above and the Pi has been up for 2days 2hrs so far, so fingers crossed. Running on an SSD and only really a Conbee2 as integrations.
@agners I had issues with the Debian install. Not sure if it was operator error or something else. I was losing network connection. Changed to the RPI OS Debian version and no issues. I'll update with status in a few days. Tried to understand the difference between the official Debian version and the RPI OS, and the best I could find was a difference in the kernel.
I still have my other SSD with HAOS and can test stuff with it if you need anything tested.
@agners, @pvizeli As a test I installed the RPI OS 64 beta and set up Home Assistant as Supervised. I did a snapshot restore. This system has been stable and has not crashed since the install on Sunday. The best I can get with HA OS above 5.4 is 19 hours. Again, this is the identical hardware and I used the identical snapshot that crashed HA OS Dev 6.x. I'm willing to try and help to see what the issue is if I can get some directions on what to try. Thanks for reading.
@bschatzow
> As everything works with 5.4 don't think it would be a fair test to do it differently.

Sure that is not fair, but it allows to isolate what could be the root cause. If an empty/minimal setup works fine on your end as well, then it's pointless when I test that on my end. The question is what workload exactly triggers the issue.
@agners Not sure how this isolates anything. My system with everything installed worked for months with no issues. Also my test system with home assistant supervisor and everything installed on the RPI OS is up since Sunday with no issues.
@agners Some more data. Using the RPI OS and HA has worked for a week with no issues. I took it down and tried the Debian "March" release and I lost my network connection after several hours. This was similar to what others had seen with the HA OS. Not sure if this helps, but something to think about.
@agners,
I see that you have released a new beta for testing. I will try it shortly.
Maybe you can help. I now have Debian and HA supervisor running successfully for almost a week. No issues once I removed the HDMI monitor.
When the monitor was attached I had similar issues to upgrading from HAOS5.4. Lock up after so many hours, 2 to 20.
Is it possible that something changed in the code that is using more power from the USB port? I am monitoring input power and I never see it above 1.2A (usually around .82). P/S is rated for 5V 3.5A.
I just purchased a USB powered hub. My testing plans are as follows:
@agners @pvizeli I have completed my testing and the following are my results.
- Run Debian for 24 hours to make sure there are no issues with the hub: passed.
- Upgrade 5.4 to 5.13 and run for 24 hours: failed in 3 hours.

Not a PI4 issue. The PI works with the identical setup on 5.4, or on Debian with HA Supervised and the identical snapshot restored. My PI fails on anything above 5.4. I did try a 6.x dev build and it had the same freeze issue. If there is something you want me to test further, I can with a spare SSD. For now I have switched to Debian as it is stable and HAOS is not (for me at least).
> Working very well for 24 hours now with the SSD attached via USB 2.0. That could be a bunch of things i suppose, but it supports my Wifi EM interference theory. I'll give it a few more days to be certain this is a definitive workaround, and then i'll test USB 3.0 again with some additional shielding around the Startech USB adapter.

Just an update on my install - USB 2.0 on HASSOS 5.13 seems to be working pretty well for a couple of weeks. I've had the system hang once, but who knows if it's related. I'm trying to ditch my wifi dependency and will see if disabling the radio helps with USB 3.0 in the near future.
@jherby2k My issue with system hangs is definitely not WiFi as my Pi4 is connected via Ethernet with no wifi setup. Like @bschatzow, my Pi4 hangs on the latest HAOS, but is running fine on 5.4.
Hardware Environment
Home Assistant OS release:
Journal logs:
Kernel logs:
Description of problem: Did a system restore on to an SD drive. Followed all the instructions and did a split of the SD for booting and SSD for everything else. System locked up in < 24 hours.