home-assistant / operating-system

:beginner: Home Assistant Operating System
Apache License 2.0

HASS unstable #1119

Closed: sidwin9 closed this issue 2 years ago

sidwin9 commented 3 years ago

The problem

Since the 2020.12.0 update and the 2020.12.7 Supervisor update, my Home Assistant has been crashing once or twice a day, and the interface is no longer accessible either locally or remotely. I have to cut the power and plug it back in to restore it. Version 2020.12.1 does not solve my problem.

Environment

Problem-relevant configuration.yaml

Traceback/Error logs

Additional information

Update 2021-11-05

Google Docs Spreadsheet tracking various configurations which exhibit similar behavior:

mmatloka commented 3 years ago

I am seeing the same problems. Additionally I've found this thread: https://www.reddit.com/r/homeassistant/comments/kf41se/ha_server_goes_awol_witout_reason/

SBRK commented 3 years ago

OP from the Reddit thread listed above here. I've started having this issue in the last few days, although I did the update a while ago (2 or 3 weeks) and had not had any problems so far. So I'm not sure if it's related to the latest version

Edit: I was not aware that the supervisor had its own version and had updates of its own, so yeah. I'm on 2020.12.7 too

phrozen77 commented 3 years ago

Can’t offer much more than a „me too“ - after upgrading to 2020.12.0 and .1 subsequently, my Pi also was unresponsive and had to be unplugged and plugged in again...

mmatloka commented 3 years ago

Can’t offer much more than a „me too“ - after upgrading to 2020.12.0 and .1 subsequently, my Pi also was unresponsive and had to be unplugged and plugged in again...

Have you upgraded also the hass os?

sidwin9 commented 3 years ago

Hey mmatloka,

This is exactly the problem described in https://www.reddit.com/r/homeassistant/comments/kf41se/ha_server_goes_awol_witout_reason/ I wonder if the problem is not due to the Hass os 5.8 update...

mmatloka commented 3 years ago

I'm wondering the same thing https://github.com/home-assistant/operating-system/issues/1096 . I did the OS downgrade ha os update --version 4.20 to check that, waiting to see the effects.
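For anyone who wants to try the same rollback, the downgrade mentioned above could be wrapped roughly as follows. This is only a sketch: `hassos_set_version` and the `DRY_RUN` variable are hypothetical names made up for illustration, and the only real command is the `ha os update --version ...` call quoted in this comment. It assumes it runs somewhere the `ha` CLI is available.

```shell
#!/bin/sh
# Sketch: pin HassOS to a given version using the command from this thread.
# hassos_set_version and DRY_RUN are illustrative names, not part of the ha CLI.
hassos_set_version() {
    version="${1:-}"
    # Accept only digits and dots (e.g. 4.20, 5.2); reject anything else.
    case "$version" in
        ''|*[!0-9.]*)
            echo "invalid version: $version" >&2
            return 1
            ;;
    esac
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Default: only print the command so nothing changes by accident.
        echo "ha os update --version $version"
    else
        ha os update --version "$version"
    fi
}

# Prints the command; run with DRY_RUN=0 to actually perform the downgrade.
hassos_set_version 4.20
```

Defaulting to a dry run is a deliberate choice here, since an OS change on a remote box is hard to undo if it goes wrong.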

liudger commented 3 years ago

Hey mmatloka,

This is exactly the problem described in https://www.reddit.com/r/homeassistant/comments/kf41se/ha_server_goes_awol_witout_reason/ I wonder if the problem is not due to the Hass os 5.8 update...

My problems also started with Hass OS 5.8 update.

phrozen77 commented 3 years ago

Can’t offer much more than a „me too“ - after upgrading to 2020.12.0 and .1 subsequently, my Pi also was unresponsive and had to be unplugged and plugged in again...

Have you upgraded also the hass os?

Yes, i did - also at basically the same time i did the 2020.12.0 update, so i can't quite rule out either :(

liudger commented 3 years ago

5.9 is about to release. I'll wait for the image to be ready and will test this version.

(edit) 5.9 is online and I am running it now. Let's see if it fails: https://github.com/home-assistant/operating-system/releases/tag/5.9

SBRK commented 3 years ago

Happened to me again today. I plugged a monitor before hard rebooting, here's what was displayed:

[image: PXL_20201218_201652799]

Gawronnek commented 3 years ago

I am seeing the same problems. I did the OS downgrade ha os update --version 4.20 to check that, waiting to see the effects.

liudger commented 3 years ago

I have updated to 5.9 and the first impression is good (none of the usual hangs at startup). I still need an hour to see if it will be fine.

edit: or less than an hour :(

[image: screenshot, 2020-12-18 22:00:21]

core ram usage spikes when it happens

SBRK commented 3 years ago

Just to be clear, I'm not running the latest versions of Core or OS. The only thing that seemingly recently changed is the Supervisor

Maybe it's due to the docker version change ? https://github.com/home-assistant/supervisor/compare/2020.12.6...2020.12.7

TuomasPakkanen commented 3 years ago

I'm also having similar issues after upgrading from 5.2 beta to 5.8 and 2020.12. I'm running a Raspberry Pi 4 (4 GB) with an external SSD. HA seems to work normally for a day or two, then it crashes and I have to power-cycle. The Pi also gets quite hot once it has been crashed for a few hours. According to the system sensors, nothing out of the ordinary, such as higher CPU or RAM usage, happens before the crash. It last happened during the night, so not much was going on otherwise either. I also have a Conbee II stick plugged in; I think in some other thread someone was wondering if USB devices might be causing issues.

Maybe I'll wait and see if the 5.9 will fix the issues, but it's not showing yet in the GUI to upgrade.

Edit: not sure if this is also relevant: https://github.com/home-assistant/operating-system/issues/1045

Edit2: happened again. Pi didn't respond to ping or anything forcing a powercycle. Plugged a monitor in and it didn't show anything during boot, not sure if it should.

liudger commented 3 years ago

Happened to me again today. I plugged a monitor before hard rebooting, here's what was displayed:

[image: PXL_20201218_201652799]

You are on raspberry pi? with usb drive attached? Or is it an odroid n2? with mmc

I am running 4.20 now. It looks promising

edit: Still running fine for a few hours already.

SBRK commented 3 years ago

You are on raspberry pi? with usb drive attached? Or is it an odroid n2? with mmc

I am running 4.20 now. It looks promising

Yes, Pi 3, and I'm only using an SD card. Some people on Reddit suggested that my SD card might be worn out.

oyta commented 3 years ago

The problem

The same problem as the original reporter. My RPi 4 ran for a few days after the 2020.12.0 update, then I could no longer access the frontend. The RPi gets very hot; powering it off and on with the power cord gets it up and running again.

Environment

System Health

version 2020.12.1
installation_type Home Assistant OS
dev false
hassio true
docker true
virtualenv false
python_version 3.8.6
os_name Linux
os_version 5.4.79-v7l
arch armv7l
timezone Europe/Oslo
Home Assistant Cloud

logged_in true
subscription_expiration 21 December 2020, 1:00
relayer_connected true
remote_enabled true
remote_connected true
alexa_enabled true
google_enabled true
can_reach_cert_server ok
can_reach_cloud_auth ok
can_reach_cloud ok

Home Assistant Supervisor

host_os Home Assistant OS 5.8
update_channel stable
supervisor_version 2020.12.7
docker_version 19.03.13
disk_total 28.6 GB
disk_used 5.5 GB
healthy true
supported true
board rpi4
supervisor_api ok
version_api ok
installed_addons Samba share (9.3.0), File editor (5.1.0), Mosquitto broker (5.1), Terminal & SSH (8.9.1)

Lovelace

dashboards 1
mode storage
views 9
resources 0

Problem-relevant configuration.yaml

Traceback/Error logs

Logs were deleted on power cycle, so no logs from the crash were found on the HA UI.

Additional information

Gawronnek commented 3 years ago

I returned to HassOS 4.20 and have had over 24 hours without problems; I continue to monitor the situation. Unfortunately, I can't go back to an earlier version of Supervisor, but maybe it will be fine.

Frontend version: 20201212.0 - latest

System Health

Version 2020.12.1
Installation type Home Assistant OS
Development version false
Supervisor true
Docker true
Virtual environment false
Python version 3.8.6
Operating system family Linux
Operating system version 4.19.127-v7
CPU architecture armv7l
Timezone Europe/Warsaw

Home Assistant Community Store

GitHub API ok
Github API Calls Remaining 4964
Installed Version 1.9.0
Stage running
Available Repositories 702
Installed Repositories 9

AccuWeather

Reach AccuWeather server ok
Remaining allowed requests 48

Airly

Server access ok

Home Assistant Cloud

Logged in true
Subscription expiration 28 December 2020, 1:00
Relayer connected true
Remote access enabled false
Remote access connected false
Alexa enabled false
Google Assistant enabled true
Certificate server access ok
Authentication server access ok
Home Assistant Cloud access

Hass.io

Host operating system HassOS 4.20
Update channel stable
Supervisor version 2020.12.7
Docker version 19.03.12
Disk total 109.3 GB
Disk used 11.1 GB
Healthy true
Supported true
Board rpi3
Supervisor API ok
Version API ok
Installed add-ons Samba share (9.3.0), Home Assistant Google Drive Backup (0.102.0), File editor (5.2.0), Duck DNS (1.12.4), Glances (0.9.1), Check Home Assistant configuration (3.6.0), deCONZ (6.6.1), Network UPS Tools (0.3.1), WireGuard (0.4.0)

Edit: Next day no problem, all ok

bschatzow commented 3 years ago

Mine is unstable with an SSD but no issues with the SD memory card. Get 5 to 10 hours using SSD before it locks up.

TuomasPakkanen commented 3 years ago

Downgrading back to OS version 5.2 beta seems to have solved the issue for me. At least for now it'll do.

HumanSkunk commented 3 years ago

Same issue here. I thought it was just me. I have now downgraded to 5.6 beta, as I am running an SSD. It worked perfectly until I updated to the latest version. Hopefully this gets resolved. For good measure, I have also rolled my supervisor back to 0.118.5.

HumanSkunk commented 3 years ago

Happened to me again today. I plugged a monitor before hard rebooting, here's what was displayed: [image: PXL_20201218_201652799]

You are on raspberry pi? with usb drive attached? Or is it an odroid n2? with mmc

I am running 4.20 now. It looks promising

edit: Still running fine for a few hours already.

Are you still running fine? I am hoping its HassOS, but i see others are reporting that it could be the Supervisor version.

liudger commented 3 years ago

Are you still running fine? I am hoping its HassOS, but i see others are reporting that it could be the Supervisor version.

Yes it still runs smooth. In another thread they identified something with the firmware that shipped with the OS 5.8 and 5.9.

HumanSkunk commented 3 years ago

That's great news for you. I've been scratching my head as to what the cause was. Annoyingly, you can't see any historical logs; I don't know why Home Assistant destroys the log file on reboot. I am now rebuilding part of my system, as I made a number of changes to automations this week. The current issue is getting Home Assistant Cloud working again, as I can't get a certificate since restoring from backup… but that's a minor issue.

cogneato commented 3 years ago

@HumanSkunk journalctl -b -1 -u hassos-supervisor for supervisor logs of the previous boot

journalctl -b -1 for the host logs of the previous boot

[edit]: fixed typos in the commands

These need to be run on the host, and not from an add-on.

https://wiki.archlinux.org/index.php/Systemd/Journal#Filtering_output
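The two commands above can be wrapped in a small helper, sketched below. The function name `previous_boot_logs`, the `JOURNALCTL` override and the output paths in the comments are hypothetical, introduced only for illustration; the `journalctl` flags used (`-b -1`, `-u`, `--no-pager`, `--list-boots`) are standard ones. It assumes the journal on the host is kept across reboots; if it is not, there will be no previous boot to show.

```shell
#!/bin/sh
# Sketch: show logs from the previous boot, optionally limited to one unit.
# JOURNALCTL can be overridden (e.g. JOURNALCTL=echo) to preview the command.
previous_boot_logs() {
    unit="${1:-}"
    if [ -n "$unit" ]; then
        # -b -1 selects the boot before the current one; -u filters by unit.
        ${JOURNALCTL:-journalctl} -b -1 -u "$unit" --no-pager
    else
        ${JOURNALCTL:-journalctl} -b -1 --no-pager
    fi
}

# Example usage on the host (commented out so nothing runs by accident):
#   journalctl --list-boots                                  # which boots are recorded?
#   previous_boot_logs hassos-supervisor > supervisor-prev.log
#   previous_boot_logs > host-prev.log
```

Checking `journalctl --list-boots` first is a quick way to see whether a previous boot was recorded at all before digging further.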

Gawronnek commented 3 years ago

I installed version 5.9 in the evening; unfortunately, by morning I already had the first system hang, and I had to disconnect and reconnect the power supply. So I am going back to version 4.2 again.

bschatzow commented 3 years ago

Mine froze after 10 hours. TuomasPakkanen states that 5.2 beta works for him and everything after it fails. If I can figure out how to install this version, I will, and I'll report back the status.

SBRK commented 3 years ago

Weirdly, I haven't had any crash since I updated to OS 5.8 and Core 2020.12.1 (4 days ago)

HumanSkunk commented 3 years ago

I have currently moved back to HassOS 5.6 (from 5.8) and core 0.118.5 (from 2020.12.1), and everything is working stably at the moment. Fingers crossed it all stays up!

danielsjf commented 3 years ago

Same issue here after upgrading to 5.8. Subsequently upgrading to 5.9 doesn't fix the issue.

To add something to the discussion, the problem seems LAN-related. For some reason, after upgrading I also had a couple of other devices complaining about DHCP errors. When I unplug the Pi, the problems disappear.

Also, while HassOS is nowhere to be found on the network (pinging doesn't work), Z-Wave and Zigbee continue to work. Sensors that only use those still respond as they should.

Power cycling the pi solves all issues for a day or so.

Minius90 commented 3 years ago

Same issue here: Have to unplug the pi every day. 5.9 did not fix the issue

HumanSkunk commented 3 years ago

Same issue here after upgrading to 5.8. Subsequently upgrading to 5.9 doesn't fix the issue.

To add something to the discussion, the problem seems LAN-related. For some reason, after upgrading I also had a couple of other devices complaining about DHCP errors. When I unplug the Pi, the problems disappear.

Also, while HassOS is nowhere to be found on the network (pinging doesn't work), Z-Wave and Zigbee continue to work. Sensors that only use those still respond as they should.

Power cycling the pi solves all issues for a day or so.

Yes when the issue happened for me all my devices were unable to connect to the internet. Sonos/Alexa would all think they had connectivity but were unable to resolve connectivity. When I logged into my router it would also tell me that I had no WAN connectivity. As soon as I unplugged or hard reset the Pi, everything came back instantly.

I am currently away from home, and I am now unable to VPN into my router or Home Assistant remotely. My hope is that it is just an internet issue, but my suspicion is that it is Home Assistant related. I hoped that rolling back would have fixed it, but it looks like it hasn't.

HumanSkunk commented 3 years ago

So another update from me: as I have recently installed some DIY blind motors, I was concerned that something had gone wrong, so I went home to investigate. Lo and behold, Home Assistant had crashed exactly as before and taken my internet down with it again. I could still connect to my router locally, and it was reporting no internet connectivity as before; the second I unplugged the Pi, the internet was restored. I had moved back to OS 5.6 and Home Assistant 0.118.5 with supervisor 2020.12.7, but this combo is still causing crashes. Could it be a supervisor issue? I am unsure how to roll that back to a previous version.

@cogneato I am unable to review the previous boot logs as the journald command doesnt work on my Pi, via ssh on the host. I get the error:

-bash: journald: command not found

I am going to try rolling back the OS to 5.3, which I think is what it was for ages and was stable on my Pi 4 when I switched from an SD card to an SSD. Fingers crossed. Otherwise I guess a complete rebuild may be needed, which I really don't want to do.

Minius90 commented 3 years ago

Stepped back to 5.7, hoping daily need for unplugging my pi is "solved" this way

danielsjf commented 3 years ago

Rolled back to 4.20 three days ago; all my LAN devices are working properly and HA hasn't gone offline since.

matejzero commented 3 years ago

Another case here. After upgrading to 5.8 or 5.9 (I don't remember 100%), my RPi froze every night. I downgraded to 4.10 and it has been smooth sailing ever since.

bschatzow commented 3 years ago

I went to 5.2 and it has been up for 8 days. 5.8 or 5.9 froze daily.

kpcz commented 3 years ago

Same problem. After the HassOS upgrade from 4.20 to 5.9, my HA has already frozen twice in 3 hours. Even ping is not going through. There are no traces of any issues in the logs. Before that, everything was working like a Swiss watch. :) RPi4 + SD card. Rolling back to 4.20, hoping that it will resolve the issue for now. Thanks.

agners commented 3 years ago

This is all likely connected with #1129.

BarBaar44 commented 3 years ago

Same here on an Odroid C2 with both 5.8 and 5.9

The machine randomly crashes within 24h after boot. Filesystem goes to read-only.

Now back on 4.2 and everything seems OK again..

HumanSkunk commented 3 years ago

Upon moving back to 5.3 almost a week ago, my instance has remained up. This was the version I installed fresh when I updated to a Pi 4 with an SSD. Unsure why all later versions cause the Pi to lock up and get super hot, but i'll take this as a solution for now.

cogneato commented 3 years ago

For some, I believe it is this issue: https://github.com/home-assistant/operating-system/pull/1121

bcutter commented 3 years ago

I am seeing this as well, like many other users (exactly this: https://community.home-assistant.io/t/hass-os-becomes-unresponsive-then-almost-unusable-and-is-finally-dead-starting-every-10-hours-after-last-ha-start/261800).

My system is heavily unstable and almost unusable. It started a few days ago, I think when I integrated InfluxDB and downsized the recorder database (SQLite). HA only ran for about 10 hours; today I saw it run for only 3 hours.

What a nightmare! Any advice?

Did you regain a stable system with running an earlier HASS OS version? Feedback please @HumanSkunk @BarBaar44 @kpcz @matejzero @Minius90 @bschatzow @danielsjf.

I will give fresh 5.10 a try (https://github.com/home-assistant/operating-system/releases/tag/5.10), it fixes other issues like https://github.com/home-assistant/operating-system/issues/1129.

bschatzow commented 3 years ago

Mine is very stable on 5.2. 5.8 and 5.9 froze after less than 10 hours. I downgraded to 5.2 from cli and all is well again.

HumanSkunk commented 3 years ago

For me 5.3 works perfectly. It crashed on Christmas Eve running 5.6, but since the rollback to 5.3 it has remained operational for a week. I have my fingers crossed for longer.

Minius90 commented 3 years ago

Went back to 5.7 some days ago, and it seems pretty stable. I only had issues today after upgrading core to 2020-12-2. It probably needs some further testing.

hughhallhh56 commented 3 years ago

Running OS version 4.16 and Core version 0.117.6. I have been having this issue for more than a month. Sometimes it goes a few days, other times less than 12 hours, until the system becomes unresponsive. I was able to catch it a few times before it completely hung and found the load average (using the top command) to be 18 to 25 instead of the normal 0.5. Within another hour it was completely hung.

I unloaded a few add ons that were large memory users such as the unifi controller since I was using a lot of the available memory. I was going to upgrade OS and Core, but in searching for what was causing my issues I came across all these posts describing the same issue so have not updated anything further. System has been great for 24 hours now so will continue to watch it.
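The load-average symptom described above can be checked without an interactive `top` session. The sketch below assumes a Linux host where `/proc/loadavg` is readable; the helper name `load_exceeds` and the threshold of 4 are arbitrary illustrative choices, not values taken from this thread.

```shell
#!/bin/sh
# Sketch: flag an unusually high 1-minute load average.
# load_exceeds LINE THRESHOLD returns 0 (true) when the first field of
# LINE (the 1-minute load average) is greater than THRESHOLD.
load_exceeds() {
    loadavg_line="$1"
    threshold="$2"
    printf '%s\n' "$loadavg_line" |
        awk -v t="$threshold" '{ exit !(($1 + 0) > (t + 0)) }'
}

THRESHOLD=4   # assumption: roughly one runnable task per core on a 4-core board

if [ -r /proc/loadavg ] && load_exceeds "$(cat /proc/loadavg)" "$THRESHOLD"; then
    echo "$(date): load average high: $(cat /proc/loadavg)"
fi
```

Run periodically, with the output appended to a file, something like this might leave a trace of the load climbing before the system hangs completely.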

bcutter commented 3 years ago

18 to 25, hah, see one of the last snapshots before my system died:

[image]

So many affected users; where are the developers or debugging experts, by the way? I thought that's one big advantage of using a supported version. Otherwise I could have gone with HA Supervised on Raspberry Pi OS, which would give me much more insight into what's going on on the system than I can get now on HASS OS. Looking forward... and happy new year, by the way, to everybody! :-)

bcutter commented 3 years ago

Still on HASS OS 5.8. I have no idea where that massive peak came from and WHY the system survived it (~12 hours after last boot): [three images]

Actually the whole system ran for almost 24 hours, for the first time in days. The only planned HA restart was ~16 hours in: [image]

The only thing I changed yesterday: I stopped all HA supervisor add-ons that were not urgently needed, to free some RAM. That way system RAM usage stayed below 80% most of the time.

These are all only assumptions; I don't have more to offer, unfortunately.

fernplant commented 3 years ago

Same issue here after upgrading to 5.8. Subsequently upgrading to 5.9 doesn't fix the issue.

To add something to the discussion, the problem seems LAN-related. For some reason, after upgrading I also had a couple of other devices complaining about DHCP errors. When I unplug the Pi, the problems disappear.

Also, while HassOS is nowhere to be found on the network (pinging doesn't work), Z-Wave and Zigbee continue to work. Sensors that only use those still respond as they should.

Power cycling the pi solves all issues for a day or so.

Am also having this issue on RPi 4 w/ SSD and on mine, when it locks up, the network activity light goes bananas blinking like crazy. So I would say yes, LAN related seems likely