hacs / integration

HACS gives you a powerful UI to handle downloads of all your custom needs.
https://hacs.xyz
MIT License
5.37k stars 1.28k forks source link

hacs caused hass to go 100% unresponsive : hacs 0.13.2 + hass 0.97.2 #472

Closed bbrendon closed 5 years ago

bbrendon commented 5 years ago

Around 1pm today I clicked on something in hacs and I started to get bad gateway from nginx. I think I clicked on "Settings -> Reload Data" that caused it but I'm not really sure. I was on mobile looking to see if there were any updates available, and there weren't.

There was nothing related in home-assistant.log

When I got home, the system was down. I had to break in. I restarted the hass process around 17:00. You can see in the log there nothing between 13:00 and when it was restarted.

At 13:01 502's started appearing in the nginx logs. If someone wants them, I can send them privately.

2019-08-22 12:17:36 ERROR (MainThread) [custom_components.weatheralerts.sensor] list index out of range
2019-08-22 12:27:36 ERROR (MainThread) [custom_components.weatheralerts.sensor] list index out of range
2019-08-22 12:37:37 ERROR (MainThread) [custom_components.weatheralerts.sensor] list index out of range
2019-08-22 12:41:00 DEBUG (SyncWorker_2) [homeassistant.components.proximity] proximity.homePERSON update entity: distance=3: direction=away_from: device=PERSON’s iPhone
2019-08-22 12:41:00 INFO (SyncWorker_2) [homeassistant.components.proximity] PERSON’s iPhone: proximity calculation complete
2019-08-22 12:41:00 DEBUG (SyncWorker_6) [homeassistant.components.proximity] proximity.workPERSON update entity: distance=2: direction=away_from: device=PERSON’s iPhone
2019-08-22 12:41:00 INFO (SyncWorker_6) [homeassistant.components.proximity] PERSON’s iPhone: proximity calculation complete
2019-08-22 12:41:00 ERROR (MainThread) [homeassistant.components.websocket_api.http.connection.140631696760616] Error handling message: Unauthorized
Traceback (most recent call last):
  File "/opt/python-venvs/hass/lib/python3.7/site-packages/homeassistant/components/websocket_api/connection.py", line 91, in async_handle
    handler(self.hass, self, schema(msg))
  File "/opt/python-venvs/hass/lib/python3.7/site-packages/homeassistant/components/websocket_api/commands.py", line 48, in handle_subscribe_events
    raise Unauthorized
homeassistant.exceptions.Unauthorized: Unauthorized
2019-08-22 12:41:01 ERROR (MainThread) [homeassistant.components.websocket_api.http.connection.140631729542368] Error handling message: Unauthorized
Traceback (most recent call last):
  File "/opt/python-venvs/hass/lib/python3.7/site-packages/homeassistant/components/websocket_api/connection.py", line 91, in async_handle
    handler(self.hass, self, schema(msg))
  File "/opt/python-venvs/hass/lib/python3.7/site-packages/homeassistant/components/websocket_api/commands.py", line 48, in handle_subscribe_events
    raise Unauthorized
homeassistant.exceptions.Unauthorized: Unauthorized
2019-08-22 12:42:17 ERROR (MainThread) [homeassistant.core] Error doing job: SSL error in data received
Traceback (most recent call last):
  File "/usr/lib/python3.7/asyncio/sslproto.py", line 526, in data_received
    ssldata, appdata = self._sslpipe.feed_ssldata(data)
  File "/usr/lib/python3.7/asyncio/sslproto.py", line 207, in feed_ssldata
    self._sslobj.unwrap()
  File "/usr/lib/python3.7/ssl.py", line 767, in unwrap
    return self._sslobj.shutdown()
ssl.SSLError: [SSL: KRB5_S_INIT] application data after close notify (_ssl.c:2609)
2019-08-22 12:42:18 ERROR (MainThread) [homeassistant.components.iqvia] Unable to get allergy_outlook data: Error requesting data from https://www.pollen.com/api/forecast/outlook: 500, message='Internal Server Error'
2019-08-22 12:46:00 DEBUG (SyncWorker_15) [homeassistant.components.proximity] proximity.workPERSON update entity: distance=4: direction=away_from: device=PERSON’s iPhone
2019-08-22 12:46:00 DEBUG (SyncWorker_8) [homeassistant.components.proximity] proximity.homePERSON update entity: distance=2: direction=towards: device=PERSON’s iPhone
2019-08-22 12:46:00 INFO (SyncWorker_15) [homeassistant.components.proximity] PERSON’s iPhone: proximity calculation complete
2019-08-22 12:46:00 INFO (SyncWorker_8) [homeassistant.components.proximity] PERSON’s iPhone: proximity calculation complete
2019-08-22 12:47:38 ERROR (MainThread) [custom_components.weatheralerts.sensor] list index out of range
2019-08-22 12:51:11 DEBUG (SyncWorker_3) [homeassistant.components.proximity] proximity.homePERSON update entity: distance=2: direction=away_from: device=PERSON’s iPhone
2019-08-22 12:51:11 DEBUG (SyncWorker_4) [homeassistant.components.proximity] proximity.workPERSON update entity: distance=5: direction=away_from: device=PERSON’s iPhone
2019-08-22 12:51:11 INFO (SyncWorker_3) [homeassistant.components.proximity] PERSON’s iPhone: proximity calculation complete
2019-08-22 12:51:11 INFO (SyncWorker_4) [homeassistant.components.proximity] PERSON’s iPhone: proximity calculation complete
2019-08-22 12:57:39 ERROR (MainThread) [custom_components.weatheralerts.sensor] list index out of range
bbrendon commented 5 years ago

I remember now... I checked hacs because I was going through my email and saw something about alexa_media_player being updated so I think that's why I went into hacs on mobile. I almost never click on it when I'm not home but today I did.

After getting my system stable again, hacs now says there is an update to alexa_media_player for 2.0.0.

At 13:00 the hass process went from using ~2.5 CPU on average to 0. hass process was running for 3 days and 20 hours until I restarted it at 17:00. Also hass memory usage dropped about 100mb at 13:00.

ludeeus commented 5 years ago

This is weird...

NGINX Logs will not tell anything. From what I can tell it looks like it shut down a bit earlier than 13:01

2019-08-22 12:42:17 ERROR (MainThread) [homeassistant.core] Error doing job: SSL error in data received
Traceback (most recent call last):
  File "/usr/lib/python3.7/asyncio/sslproto.py", line 526, in data_received
    ssldata, appdata = self._sslpipe.feed_ssldata(data)
  File "/usr/lib/python3.7/asyncio/sslproto.py", line 207, in feed_ssldata
    self._sslobj.unwrap()
  File "/usr/lib/python3.7/ssl.py", line 767, in unwrap
    return self._sslobj.shutdown()
ssl.SSLError: [SSL: KRB5_S_INIT] application data after close notify (_ssl.c:2609)

This can take down HA.

Before that, I also see some issues with authentication and the WebSocket connection.

I can not see a single indication that HACS have done anything on your system here. When the "Reload data" button is clicked, it will generate an INFO message for all elements it tries to update before it starts to do so.

bbrendon commented 5 years ago

I assumed that error was one of the many errors that get ignored. I didn't realize it was important.

Strange that hass failed. First time I've experienced it.

Thanks for having a look. Fingers crossed it doesn't happen again.

bbrendon commented 5 years ago

Also, I just looked at the logs again. That errors appears many many times (141 times as of the current restart ). I'm not exactly sure what is causing it.

bbrendon commented 5 years ago

Regarding the ssl.SSLError: [SSL: KRB5_S_INIT] application data after close notify (_ssl.c:2609). This seems to be right. https://github.com/home-assistant/home-assistant/issues/21270#issuecomment-519331437

ludeeus commented 5 years ago

Judging from that issue, this has been going on since before I even started on HACS. I'm closing this now since I believe that this is out of my hands.

If this happens for you again, and you have logs that indicate that HACS is at fault please open a new issue.