home-assistant / operating-system

Home Assistant Operating System
Apache License 2.0

homeassistant crashing every day since an update in december #1232

Closed TrommlerHH closed 2 years ago

TrommlerHH commented 3 years ago

Hi,

My Raspberry Pi with hass.io has also been crashing every day since December.

I already read a lot and posted in #1119 but I was advised to open a new post to get help.

I already reinstalled hass.io from scratch and for almost two days it looked fine, but then it crashed again.

Any idea what I could do?

This is my system:

System Health

version core-2021.2.3
installation_type Home Assistant OS
dev false
hassio true
docker true
virtualenv false
python_version 3.8.7
os_name Linux
os_version 5.4.83-v8
arch aarch64
timezone Europe/Berlin

Home Assistant Community Store

GitHub API ok
Github API Calls Remaining 4964
Installed Version 1.10.1
Stage running
Available Repositories 747
Installed Repositories 2

Home Assistant Cloud

logged_in true
subscription_expiration 21 February 2021, 1:00
relayer_connected true
remote_enabled false
remote_connected false
alexa_enabled false
google_enabled true
can_reach_cert_server ok
can_reach_cloud_auth ok
can_reach_cloud ok

Hass.io

host_os Home Assistant OS 5.11
update_channel stable
supervisor_version supervisor-2021.02.9
docker_version 19.03.13
disk_total 28.6 GB
disk_used 4.5 GB
healthy true
supported true
board rpi4-64
supervisor_api ok
version_api ok
installed_addons Samba share (9.3.0), File editor (5.2.0), Example (4.0.4), Google Assistant SDK (2.5.0)

Lovelace

dashboards 1
resources 0
views 4
mode storage

Wesley-Vos commented 3 years ago

Hi all,

I'm experiencing the same kind of problems on a Raspberry Pi 3 USB booting setup. Running latest updates of core, supervisor and OS.

The system becomes unresponsive within 2 to 24 hours after boot. Strangely, the HomeKit integration is still able to execute actions, but I can't access the UI. When it is frozen, I'm still able to ping the Pi, but SSH connects really slowly. I don't get an error that the connection is refused; it just takes a long time to connect.

There does not seem to be a clear reason for the freeze: no extremely high RAM usage or anything similar. It even happens when I'm not actively 'working' on the system, when it's just performing some background tasks.

I hope that the reason can be revealed. I'm more than happy to share any diagnostics and/or logs if that can help!

szurr commented 3 years ago

Hi, I had a similar problem: the system seemed to be working but was unavailable. I found that the problem was on my router's side. If I rebooted the router, everything was back to normal. I have since changed the router and it still occurs, but only about twice a year. Check your router first 👍🏼

agners commented 3 years ago

Hard to tell where exactly this originates. There are definitely installations which run stably on Raspberry Pi with Release 5, so I don't think there is a general problem.

One of the more likely causes is out-of-memory situations. Make sure to monitor memory usage e.g. using the system monitor integration.

denperss commented 3 years ago

I have the same issue. It overheats; it was working fine before... Now I have to turn it off and reboot.

olirav commented 3 years ago

I am having a similar issue: once or twice a day the RPI3 stops responding on both the standard and observer ports, and SSH hangs at the connecting stage. The issue usually resolves itself if left for half an hour or so.

When it recovers home-assistant.log only shows data from after the recovery - so no help there.

Plugging a monitor into the Pi when it's frozen shows the kernel messages below, which look to happen around the same time:

INFO: task kworker/0:0:30340 blocked for more than 122 seconds
Tainted: G C 5.4.83-v7 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
INFO: task kworker/0:0:30340 blocked for more than 245 seconds
Tainted: G C 5.4.83-v7 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
.............. Repeated until ....................
Out of memory: Killed process 3762 (python3) total-vm:1020288kB, anon-rss:629156kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:1016kB oom_score_adj:0
systemd[1]: systemd-resolved.service: Watchdog timeout (limit 3min)!
systemd-coredump[32727]: Failed to get COMM: No such process
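
The "Out of memory: Killed process" line in the console output above already names the killed process and how much memory it held. As a rough sketch (not an official tool), the Python below pulls those numbers out of such a line. It assumes the usual `key:valuekB` layout of the kernel message; the function name `parse_oom_line` is made up for this example.

```python
import re

# Matches the kernel's "Out of memory: Killed process <pid> (<name>)" prefix.
OOM_RE = re.compile(r"Out of memory: Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)")
# Matches size fields such as "anon-rss:629156kB".
FIELD_RE = re.compile(r"(?P<key>[\w-]+):(?P<kb>\d+)kB")


def parse_oom_line(line):
    """Return pid, process name and kB-sized fields, or None if not an OOM line."""
    match = OOM_RE.search(line)
    if match is None:
        return None
    fields = {f.group("key"): int(f.group("kb")) for f in FIELD_RE.finditer(line)}
    return {"pid": int(match.group("pid")), "name": match.group("name"), "kb": fields}


# Sample line in the assumed standard layout, using the values reported above.
sample = (
    "Out of memory: Killed process 3762 (python3) total-vm:1020288kB, "
    "anon-rss:629156kB, file-rss:0kB, shmem-rss:0kB, UID:0 "
    "pgtables:1016kB oom_score_adj:0"
)
info = parse_oom_line(sample)
print(info["name"], round(info["kb"]["anon-rss"] / 1024), "MiB resident")
```

An anon-rss of roughly 614 MiB for a single python3 process is a large share of the 1 GB of RAM on a Pi 3, which is why a line like this points towards memory pressure.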

I am having a look at setting up the system monitor integration to check if this is a memory issue, but I suspect not

Edit: Before anyone comments about the power supply - this is running on an official Pi3 USB PSU, and the Pi Power integration doesn't show any issues in the logs.

TrommlerHH commented 3 years ago

> Hard to tell where exactly this originates. There are definitely installations which run stably on Raspberry Pi with Release 5, so I don't think there is a general problem.
>
> One of the more likely causes is out-of-memory situations. Make sure to monitor memory usage e.g. using the system monitor integration.

After two days without crashes (no idea why) it crashed again last night.

I added the recommended system monitor before, but I have no idea what to do with that, to be honest.

Example configuration.yaml entry

sensor:

bschatzow commented 3 years ago

@olirav
> When it recovers home-assistant.log only shows data from after the recovery - so no help there.

You need to set up SSH on port 22222 or use a monitor and keyboard to see previous logs. There are lots of directions on how to do this. If you get stuck, send me a message and I will try to help you.

olirav commented 3 years ago

@bschatzow Thanks for the tip, I have set up the developer SSH access (Hadn't come across that before, much more useful level of access for fault finding)

Checking the docker logs ($ docker logs homeassistant) shows one of my automations containing a loop doing far more iterations than I would expect (lots of log entries). I think it was somehow missing its until clause. I have reworked it for the time being to have a fixed number of iterations, which should eliminate that.

If that wasn't the issue, then it will likely crop up again in the next couple of days - I will keep an eye on it.

I have set up the system monitor with the setup below, which should let me see from the logs if it is running out of memory/swap or overheating the CPU. Maybe this will be of use for someone else.

--- configuration.yaml ---

sensor:
  - platform: systemmonitor
    resources:
      - type: memory_free
      - type: memory_use
      - type: swap_use
      - type: swap_free
      - type: load_1m
      - type: processor_use
      - type: processor_temperature

--- Sample Lovelace page to view logs ---

  - title: System Monitor
    path: system-monitor
    badges: []
    cards:
      - type: sensor
        entity: sensor.memory_use
        graph: line
        hours_to_show: 12
        detail: 2
      - type: entities
        entities:
          - entity: sensor.memory_use
          - entity: sensor.memory_free
          - entity: sensor.load_1m
          - entity: sensor.swap_free
          - entity: sensor.swap_use
          - entity: sensor.processor_temperature
      - type: sensor
        entity: sensor.load_1m
        graph: line
        hours_to_show: 12
        detail: 2
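
For anyone wondering what sensors like memory_use and swap_use boil down to, here is a minimal Python sketch that derives similar percentages from /proc/meminfo-style text. It is not how the System Monitor integration is implemented; the function names and the sample values are invented purely for illustration.

```python
# Hypothetical sample in /proc/meminfo format (values are made up).
SAMPLE = """\
MemTotal:        1000000 kB
MemFree:          100000 kB
MemAvailable:     250000 kB
SwapTotal:        200000 kB
SwapFree:              0 kB
"""


def parse_meminfo(text):
    """Return {field: kB} parsed from /proc/meminfo-style text."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def usage_percent(total_kb, free_kb):
    """Percentage of total_kb that is in use; 0.0 when there is no total."""
    if total_kb == 0:
        return 0.0
    return round(100.0 * (total_kb - free_kb) / total_kb, 1)


def summarize(text):
    """Compute memory and swap usage percentages from meminfo text."""
    info = parse_meminfo(text)
    return {
        "memory_use_percent": usage_percent(info["MemTotal"], info["MemAvailable"]),
        "swap_use_percent": usage_percent(info.get("SwapTotal", 0), info.get("SwapFree", 0)),
    }


print(summarize(SAMPLE))
```

On a real Linux host the same function could be fed the contents of /proc/meminfo instead of the sample string.
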
TrommlerHH commented 3 years ago

Last night - or rather early in the morning - it crashed again.

But there is nothing special to be seen in memory or disk usage: image

bschatzow commented 3 years ago

Have you tried OS 5.4 or lower? Mine does the same on any version above that, but there are no freezes on 5.4 and below.

Wesley-Vos commented 3 years ago

@bschatzow I've downgraded last week to 5.4. It only froze once while I was working on Grafana. Will see how it behaves over time, seems to be better for now.

TrommlerHH commented 3 years ago

@bschatzow Sorry, where do I find previous versions?

Wesley-Vos commented 3 years ago

@TrommlerHH ha os update --version=5.4 will do the job

TrommlerHH commented 3 years ago

Sorry for asking stupid questions. I am a GUI-User - I use lovelace. ;-)

Where do I have to enter that? In that terminal window?

image

Wesley-Vos commented 3 years ago

Yes! The terminal window is the right place. Please make sure to create a snapshot before you do this. I've read some comments in the other thread about this problem (#1119) that the downgrade broke the installation and a new one was needed. For me, there were no problems but since the kernel of your OS will be downgraded during this update, it can break and if it does you're screwed.

TrommlerHH commented 3 years ago

@Wesley-Vos Yes. Now it does not start anymore. Do I now have to etch an SD card with hass.io 5.4? Where can I get this?

Wesley-Vos commented 3 years ago

Before you etch a new one, try to connect a monitor to it and check and google for the error it shows during startup, maybe there is a quick fix for it. If not, you can find the earlier versions over here https://github.com/home-assistant/operating-system/releases

TrommlerHH commented 3 years ago

I currently do not have a monitor/cable with a micro-HDMI plug.

So I etched the SD card. But now even an installation from scratch does not work anymore. After inserting the SD card, I cannot reach homeassistant:8123 (which was no problem with earlier installations).

Did I use the wrong file to etch? .../releases/download/5.4/hassos_rpi4-64-5.4.img.gz

For 5.11 the file was this (ended with .xz instead of .gz): .../releases/download/5.11/hassos_rpi4-64-5.11.img.xz

bschatzow commented 3 years ago

I would just install the latest 64-bit image. Once it is up and running and your snapshot is restored, downgrade to 5.3 or 5.4.

jressel01 commented 3 years ago

hassos_rpi4-64-5.11 without problems on PI4 4GB

Home Assistant OS 5.11 supervisor-2021.02.11 core-2021.2.3

Raspberry Pi 4 4GB with 500 GB Samsung SSD T5. Moved from SD card to SSD.
Uptime since change: 1 week, 4 hrs, 10 min
Memory use: between 26% and 34%

Wesley-Vos commented 3 years ago

It seems that there is a pattern. Every time I try to perform a memory-heavy task, such as compiling a binary file in ESPHome or running a large delete statement on the MariaDB database, the system freezes. When I restart the host and perform the same tasks, it works because the memory is not filled yet. This makes it much more likely to be an out-of-memory problem in my situation.

I'm running Home Assistant on a Raspberry pi 3 with only 1 GB Ram. Currently I'm running the 32-bit version, would upgrading (new install) to the 64-bit version make a difference? In addition to this, I'm running MariaDB and InfluxDB addons, is that maybe too much for this device? I also noticed that the swap memory usage is 100% all the time...

TrommlerHH commented 3 years ago

This does not fit my situation, I guess.

But I am far away from deep technological knowledge.

In most of the cases the system crashes at night or very early in the morning.

And as I do not have anything happening with the homeassistant at night and no automations scheduled for the night, this idea might not fit for my pi-crash-situation.

Wesley-Vos commented 3 years ago

@TrommlerHH do you have the default recorder setup or did you minimize it? Every night at 4:12 it will run the purge job, which deletes all data older than x days. This can be a heavy task if a lot of entities are recorded. It could also explain why you have to restart it every morning.

TrommlerHH commented 3 years ago

I did not set up anything special, as far as I remember. ;-) So I guess I work with the standard.

But looking at this, there are no heavy tasks during the night:

image

bschatzow commented 3 years ago

@Wesley-Vos
> I'm running Home Assistant on a Raspberry pi 3 with only 1 GB Ram. Currently I'm running the 32-bit version, would upgrading (new install) to the 64-bit version make a difference? In addition to this, I'm running MariaDB and InfluxDB addons, is that maybe too much for this device? I also noticed that the swap memory usage is 100% all the time...

You may be pushing your Pi 3 past its limit. The 3 is no longer recommended for Home Assistant. Your freeze is completely different from mine. I have a Pi 4 with 4 GB of memory and a 120 GB SSD. I have never seen a memory issue, a processor issue, or a freeze at a certain time. Mine freezes after X hours on any update above 5.4.

agners commented 3 years ago

@TrommlerHH do you see something on the local console (via HDMI)?

denperss commented 3 years ago

I'm struggling a lot with the same issue, but for me the following helped...

Version core-2021.2.3 Newest Version core-2021.2.3

Version supervisor-2021.02.11 Newest Version supervisor-2021.02.11 Channel stable

Operating System Home Assistant OS 5.12

TrommlerHH commented 3 years ago

> @TrommlerHH do you see something on the local console (via HDMI)?

As mentioned before, I currently do not have the possibility to attach a monitor to the Pi.

I have now set up SSH access.

But I installed 5.5 a few days before and since then it's running without issues.

soundslikeluke commented 3 years ago

I am having the exact same issue. I guess I just need to wait and see if updates will help. I thought it was an issue with my SSD, but it turns out I am not alone. I am running a Pi 3B+ on the latest version of everything, 64-bit, on an SSD. It is happening multiple times per day, and it takes a long time to reboot.

TrommlerHH commented 3 years ago

I am really frustrated!

For a few days my pi was running without problems on 5.5.

But then the every-day-crash-problem reoccurred.

I tried to downgrade to 5.4 but again the system did not restart after downgrading.

No access via samba or putty possible.

For a short moment this error message was shown in the browser: "unable to load the panel source: /api/hassio/app/entrypoint.js"

No idea what to do.

hass.io seems to reject me ...

Wesley-Vos commented 3 years ago

Since I've downgraded to 5.4 and disabled some memory intensive plugins (like Influx) it is working really really stable.

About the entry point error: that indicates that the observer/supervisor (not sure which of the two) is not responding. What do you see if you visit homeassistant_url:4357? This page reports any problems with the observer, if present.
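
If a browser is not handy, the ports mentioned in this thread (8123 for the UI, 4357 for the observer, 22222 for developer SSH) can also be probed from another machine. The Python below is only a sketch of such a check; the helper names are invented, and the hostname in the comment is an assumption that has to be replaced with your own address.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_ports(host, ports):
    """Map each port to whether it accepted a TCP connection."""
    return {port: port_is_open(host, port) for port in ports}


# Example call (hostname is an assumption, adjust to your setup):
# print(check_ports("homeassistant.local", [8123, 4357, 22222]))
```

A port that accepts the connection only tells you that something is listening; it does not prove the service behind it is healthy.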

TrommlerHH commented 3 years ago

Even at :4357 no access ...

I do not have any special plugins.

soundslikeluke commented 3 years ago

My issue currently is that, no matter what I try, the Raspberry Pi will now not even boot off an SSD. I was running the beta version and restarted Home Assistant via the app, and it never came back up. After that I wiped everything and started again, but for some reason it won't boot anything from the SSD. I have tried multiple versions, including the custom build suggested, but it is not working on the SSD. I have tested that the Pi is still set up to boot from USB, and that worked. When it is loading, the Ethernet LED flashes but nothing appears on the screen and you can't ping the device, so I don't think it is doing the initial install properly.

hmax-72 commented 3 years ago

Seems to be similar on Tinker Board S. I just installed the latest HASS.IO on my board and it crashed unpredictably (all LEDs off). My first thought was that there was an issue with the hardware, because the CPU was getting VERY hot. But there is not! I installed the default Tinker OS and started a stress test. It has been running for 1 day at 100% CPU without any problem (the CPU even feels cooler than before).

hmax-72 commented 3 years ago

Did a plain installation of hass.io 5.10 on my Tinkerboard S 12 hours ago - still running...

bschatzow commented 3 years ago

@soundslikeluke
> My issue currently is no matter what I try and do the raspberry pi will now not even boot of a ssd. I was running the

As you have confirmed that your Pi still boots correctly from USB, it must be something else.

Do you have a monitor attached? Is the pi4 version 64 bit? I was told both work, but for me the 32 bit does not boot.

Can you try to install a different image than 5.12?

Maybe your config.txt file got changed. Try updating the Pi 4 to the latest firmware. There is a forum thread on booting the Pi 4 from USB that goes through all the settings.

bschatzow commented 3 years ago

> I am really frustrated!
>
> For a few days my pi was running without problems on 5.5.
>
> But then the every-day-crash-problem reoccurred.
>
> I tried to downgrade to 5.4 but again the system did not restart after downgrading

I had similar issues with 5.5. I had to do a wipe, reimage and a snapshot restore. Some of the updates change the config.txt file and some change the firmware. Wish I could help you further. I'm still stuck on 5.4. Nothing after this works more than a few hours.

an5t commented 3 years ago

I'm facing the same issue 2-3 times per month. I am currently running Home Assistant OS 5.12 (but a few previous versions also had the problem) on a Raspberry Pi 4. Occasionally my Pi hangs for no apparent reason. When that happens, it stops responding over the network and gets very hot. If I connect a monitor via HDMI, there is no video signal.

As far as I remember, my installation is 32-bit. I have 4 GB RAM, but real consumption never exceeds 500 MB. It often freezes at night, when no heavy tasks are running.

Screenshot from 2021-03-13 14-48-18

bschatzow commented 3 years ago

@mefuckin Did you ever try capturing the logs on the next boot via SSH on port 22222? They may be of some use to the developers. So far there has been no log info from anyone that has helped.
The temperature that you show in your chart looks like it is due to the reboot, not due to an issue. Your Pi is about 15 degrees warmer than mine, but this should not be an issue as it is much lower than spec. I keep going back to 5.4 as this works for me with zero issues. When an update comes out, I try it, and when (if) it fails, I capture the logs and go back to 5.4.

an5t commented 3 years ago

@bschatzow I just activated SSH on 22222, had a look at the system journal and saw nothing unusual in the moments before the failure occurred. Also, I'm sure the increased temperature is due to an issue, because just before I reboot the crashed Pi it is hotter to the touch than usual.

bschatzow commented 3 years ago

I assume the point you marked as hangup is where your Pi froze? From the graph it looks like it flatlined for 11 hours at 56C. To me, it looks like there is no activity after around 3? You should be able to see this in the logs: no entries for a time until you reboot. I always try to capture with the following:

journalctl --since "2021-03-13 03:00:00" --until "2021-03-13 04:00:00" >/mnt/data/supervisor/homeassistant/jour1.txt 2>&1

I usually have to tweak the time to capture just before the freeze to see if anything is happening (usually it is not). Time is UTC. The file is saved in your config directory. Where you show reboot is where I see the temp spike.
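
Since the journalctl times above are in UTC, working out the window by hand is easy to get wrong. The following Python sketch builds the same kind of command from a local freeze time. The helper name, the 30-minute margins and the UTC+1 offset in the example are assumptions for illustration; only the `--since`/`--until` options and the output path come from the command above.

```python
from datetime import datetime, timedelta, timezone
import shlex

FMT = "%Y-%m-%d %H:%M:%S"


def journal_window_command(freeze_local, utc_offset_hours, before_min=30, after_min=30,
                           out_path="/mnt/data/supervisor/homeassistant/jour1.txt"):
    """Build a journalctl command covering a UTC window around a local freeze time."""
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    # Attach the local offset, then convert to UTC for journalctl.
    freeze_utc = freeze_local.replace(tzinfo=local_tz).astimezone(timezone.utc)
    since = (freeze_utc - timedelta(minutes=before_min)).strftime(FMT)
    until = (freeze_utc + timedelta(minutes=after_min)).strftime(FMT)
    return "journalctl --since {} --until {} >{} 2>&1".format(
        shlex.quote(since), shlex.quote(until), shlex.quote(out_path)
    )


# Hypothetical example: the graph flatlines at 04:36 local time in a UTC+1 zone.
cmd = journal_window_command(datetime(2021, 3, 13, 4, 36), utc_offset_hours=1)
print(cmd)
```

The printed command can then be pasted into the SSH session on port 22222.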

an5t commented 3 years ago

Yes, the Pi froze at 03:36. The log also ends at this time; there are no entries until I rebooted at 12:30. I'm running the zigbee2mqtt addon, which logs a few messages every minute, but at 03:36 logging stopped completely. The graph of every sensor in HA flatlined after this moment, not only the temp sensor. Just after the reboot HA read the actual temperature, but it returned to normal soon.

hmax-72 commented 3 years ago

Still running on 5.10 on my Tinkerboard S. Still everything fine. So it seems to be NO hardware issue. There is some bug in 5.12.

Delta1977 commented 3 years ago

I had the same with my Raspberry Pi 3. The default-sized swap filled up over a few hours, CPU usage went up, and then HA hung (a few times per day). After I extended the swap (https://community.home-assistant.io/t/how-to-increase-the-swap-file-size-on-home-assistant-os/272226) it runs very stably without crashes and the CPU is OK. I think the default swap size is too small.

image

According to buildroot-external/rootfs-overlay/usr/libexec/hassos-zram, the swap is 25% of RAM. It seems that HAOS wants to use more swap than is available by default, and the CPU has problems with swapping/compressing. Then HA becomes unresponsive.
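
To put the 25% figure in numbers, here is a small worked example in Python. The 25% fraction is taken from the comment above, not verified against the script, and nominal RAM sizes are used even though the usable RAM reported by the kernel is somewhat lower; the function name is made up.

```python
def default_swap_mib(ram_mib, fraction=0.25):
    """Swap size in MiB if swap is sized as a fixed fraction of RAM."""
    return int(ram_mib * fraction)


# Nominal RAM sizes of the boards discussed in this thread.
for board, ram_mib in (("Raspberry Pi 3, 1 GB", 1024), ("Raspberry Pi 4, 4 GB", 4096)):
    print(board, "->", default_swap_mib(ram_mib), "MiB swap")
```

Under these assumptions a 1 GB Pi 3 gets only about 256 MiB of swap, which is consistent with the swap filling up quickly on that board.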

byte4geek commented 3 years ago

My issue occurs after 4-7 days. I can't access the Supervisor, but I can connect via SSH to the host on port 22222 and reboot. After this, everything is fine for another 4-7 days, but sometimes I completely lose the system: no ping, no connection, and only a power cycle restores it. I have moved back from SSD to microSD on my RPi4 4 GB and the issue disappeared. I am staying on the microSD for now; my HA manages everything in my house (lights, heating, alarm and so on), so I can't afford to lose it.

agners commented 3 years ago

@hmax-72 did you try OS 5.11? In general there is not that much change between 5.10 and 5.12, mostly newer stable Linux kernels, new systemd stable release and such...

@Delta1977 Out-of-memory situations can cause all kinds of freezes and problems. Definitely keep an eye on memory usage. Adding swap might help, but usually makes the system slower. By default HAOS does not use swap on regular block devices, since that can wear out SD cards and other flash-based storage. If you are fine with that and it works for your case, that is fine. But in general I'd recommend going for a board with more memory if memory is the issue...

chenchen119 commented 3 years ago

I upgraded two Tinker Boards (non S) to 5.12 and both just started to crash without warning. One of them did show high temperature 90ºC warning on HDMI. Out of ideas, I installed a fresh 5.12 on one of them and whenever I access http://homeassistant.local:8123/hassio, web UI stops working and ping is unstable (with a lot of dropped packets) eventually crashing completely.

Edit: upgrade was from a really old OS version, I recall it was 2.XX. Just tried OS 5.10 and 5.11, same issue. Tried a different SD card with same issue.

Tried OS 4.11, now I am able to access supervisor. However, now I am stuck on supervisor 228 and not able to update...

Edit2: OS 5.8 same issue. Something is definitely wrong with all OS 5.X for Tinker Board. OS 4.19 has booting issues. OS 4.16 seems to be stable and supervisor is now up-to-date.

hmax-72 commented 3 years ago

@agners I'm new to hass.io: how do I upgrade to a version that is not the most recent one (5.11)? Do an installation from scratch and restore everything?

AngelusGi commented 3 years ago

same here. I'm using a RPI 3 rev B. After about 2 hours it crashes in this state as shown in the screenshot attached.


this is from version 5.12 (64bit)

image


Update 1

same issue on version 6.0.dev20210322 (64bit)

image

agners commented 3 years ago

@chenchen119

> I upgraded two Tinker Boards (non S) to 5.12 and both just started to crash without warning. One of them did show high temperature 90ºC warning on HDMI.

This issue is about Raspberry Pi. Can you open a new issue?