Closed chneau closed 3 years ago
@mdeweerd just to be clear, you write that "I have an unreachable HA system". Did it not start working again after a manual restart?
It's remote from my current location. I do not often go there, the idea is to control some stuff remotely, like heating the place up before I go there.
I can't connect (log in) to the HA setup: the Web UI is not working and I have not set up SSH connectivity yet.
(regarding the watchdog) I got it. But at the same time other needs should be fulfilled, in order to trace the issue back. Currently even HA logs are reset by each restart. Keep in mind that HA devs are not inclined to develop anything without a detailed issue description. I could imagine the DST issue would never get traction if all impacted instances had silently restarted themselves.
HA devs added the possibility to correct historical data in Developer Tools > Statistics. As far as I understood from the release video for 2021.10, they saw a lot of discussions providing SQL queries to users on how to do that, and that is not what they want. I can't speak for them, but I think the goal is that users of HA do not need to be specialists. Having access to the logs already requires quite some configuration in itself (file editor, terminal or SSH access).
I think it is better to keep a system going rather than having it out of service until the user notices it and can intervene.
The members of my family will quickly request to remove all that domotic s**t and come back to plain old household equipment management. It's already difficult to introduce it.
Home Assistant log files are rotated, so in principle you can find the one before the restart.
A notification could inform about the restart and suggest to make some report about it by providing the relevant log file.
The supervisor could also log information somehow about the restart reason.
I'm sure that supervision will be based on options, and not everybody has a supervisor (I would be setting up monit if I weren't using HA OS).
For now, I'll be going to a cold place that requires about 1 hour to heat up 😞 .
Has the data gap between 1 AM and 2 AM been fixed?
This is most likely a side effect of the recorder being overrun; the hourly statistics for 01:00~02:00 are compiled after 02:00, when things had already gone south.
Improving the supervisor to detect this kind of problem makes sense. Please open an issue here: https://github.com/home-assistant/supervisor/issues
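To make "the recorder being overrun" concrete, here is a toy model of a capped backlog. It is only an illustration of the log message quoted in this thread ("The recorder queue reached the maximum size of 30000; Events are no longer being recorded"), not Home Assistant's actual implementation; the class name `BoundedRecorderQueue` and its fields are hypothetical.

```python
from collections import deque


class BoundedRecorderQueue:
    """Toy model (NOT Home Assistant code): stop recording once the backlog hits a cap."""

    def __init__(self, max_backlog=30000):
        self.max_backlog = max_backlog
        self._queue = deque()
        self.recording = True   # flips to False once the cap is reached
        self.dropped = 0        # events that were never queued

    def put(self, event):
        """Queue an event; return False if it was dropped."""
        if self.recording and len(self._queue) >= self.max_backlog:
            # The consumer (database writer) did not keep up: give up recording.
            self.recording = False
        if not self.recording:
            self.dropped += 1
            return False
        self._queue.append(event)
        return True

    def __len__(self):
        return len(self._queue)


# A stalled consumer with a tiny cap: the first 3 events fit, the rest are lost.
q = BoundedRecorderQueue(max_backlog=3)
accepted = [q.put(i) for i in range(5)]
print(accepted, len(q), q.dropped)
```

In this model everything that arrives after the cap is hit is lost, which would be consistent with statistics for a period not being compiled if they are due after the overrun.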
Well, the first time it's 2 AM, the clock has not been shifted back yet; the problem presumably occurs when the clock shifts back from 3 AM to 2 AM. Or does the fix explain that the issue already happens the first time it's 2 AM (when the statistics for 1 AM to 2 AM are collected, not those for the hour from 2 AM to 3 AM that becomes 2 AM again)?
I think, the issue can be described as the following:
the system is generating statistics (short term and long term) during 02:00 and 03:00 - and when the clock jumps back to 02:00 there are already statistics available for that period... (and probably stats going into the future for the system)...
That's not the case; statistics timestamps are in UTC, not local time. There might be something else going on here, possibly a frontend bug. Could some of you with a hole in the statistics as a result of this bug please share a dump of the statistics tables for, let's say, October 31st 00:00 ~ October 31st 04:00 local time?
FWIW, this will allow the supervisor to check the health of the recorder: https://github.com/home-assistant/core/pull/58989
I could imagine the DST issue would never get traction if all impacted instances had silently restarted themselves.
A silent restart once a year would be unwanted but not blocking IMHO
As I am setting up SSH on the system that I could not access remotely, I discovered that the add-on allows installation of 'apks'.
I added monit and it was installed. Then I added nmap to scan addresses, and it was installed as well ☺️.
So that should allow me to:
curl https://public.ha.dnsnam/ works,
curl http://192.168.5.66, which is my local IP, is refused,
curl http[s]://172.30.33.2, which is the nginx IP, is not giving anything useful.
So I'll be doing some 'poor man's monitoring' on my systems 😄. Once done, I'll share my configuration on the forums.
Home Assistant log files are rotated, so in principle you can find the one before the restart.
Well they never appear in the UI to look at apart from the .1 log.
It needs both Core and Supervisor logs to be available as well and not just the last one, several. All my other systems have at least a week's worth of logs available.
As I am setting up SSH on the system that I could not access remotely, I discover that the add-on allows installation of 'apks'.
Interesting, can you expand - possibly on the forum? I'd love to have monit installed :)
@mdeweerd did you diagnose why HA is unreachable for you? The supervisor watchdog is contacting HA over https, and will force it to restart if it doesn't reply. Was something else, nginx for example, killed or starved by HA going crazy?
@emontnemery
In the meantime I am on location; I power-cycled the system and it's up.
nginx was still up but visibly could not contact Home Assistant itself. The last line in the log just indicates what other users reported:
2021-10-31 02:02:33 ERROR (MainThread) [homeassistant.components.recorder] The recorder queue reached the maximum size of 30000; Events are no longer being recorded
I provided more information about my tests in this comment.
Maybe the supervisor was happy with the reply from nginx which did indicate an error, but returned a reply.
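To illustrate that last point, here is a minimal sketch of a probe that only counts a 2xx reply as healthy, so an error page served by a reverse proxy is not mistaken for a working backend. This is not the Supervisor's actual watchdog logic; the function name `probe` and its `fetch` parameter (injectable for testing) are hypothetical.

```python
import urllib.error
import urllib.request


def probe(url, timeout=5.0, fetch=urllib.request.urlopen):
    """Return (healthy, detail). Any reply outside 2xx counts as unhealthy."""
    try:
        with fetch(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        # urlopen raises HTTPError for 4xx/5xx, e.g. a proxy answering 502.
        return False, f"HTTP {exc.code}"
    except OSError as exc:
        # URLError, connection refused, timeouts, DNS failures...
        return False, f"unreachable: {exc}"
    return 200 <= status < 300, f"HTTP {status}"
```

A call such as `probe("http://192.168.5.66")` would then report `False` both when the connection is refused and when something answers with an error status.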
@borpin The following adds monit and nmap. I also tried sqlite3, but then the "Terminal + SSH" add-on did not start correctly. So some packages will work, others not.
authorized_keys: []
apks:
  - monit
  - nmap
password: ''
server:
  tcp_forwarding: false
@emontnemery Here is the statistics data from two HA systems for times close to the time change. datemissingBeforeTimeChange.zip
The hour missing in statistics is UTC 2021-10-30 23:00 to UTC 2021-10-30 23:59, or local time (Paris) 2021-10-31 01:00 to 2021-10-31 01:59, which is the 2nd hour before the time change at 3:00 local time.
As said, I would understand that the time change at 3 AM would create an issue with the data from UTC 2021-10-31 00:00 to UTC 2021-10-31 00:59, which is the hour preceding the time change, but I find it strange that it impacts the hour before that.
1 hour missing in statistics:
INSERT INTO statistics VALUES(44003,'2021-10-30 23:00:10.471482',56,'2021-10-30 22:00:00.000000',NULL,NULL,NULL,'2021-10-27 20:45:39.640551',4.0999999999996896704,32.448960000001861204);
INSERT INTO statistics VALUES(44004,'2021-10-31 01:00:12.683680',1,'2021-10-31 00:00:00.000000',55.999999999999999999,55.999999999999999999,55.999999999999999999,NULL,NULL,NULL);
1 hour missing in statistics_short_term:
INSERT INTO statistics_short_term VALUES(225453,'2021-10-30 23:55:10.351732','2021-10-30 23:50:00.000000',NULL,NULL,NULL,NULL,6786.2500000000000001,207.34799999999995634,54);
INSERT INTO statistics_short_term VALUES(225454,'2021-10-31 01:00:12.341185','2021-10-31 00:55:00.000000',3373.9999999999999999,3373.9999999999999999,3373.9999999999999999,NULL,NULL,NULL,21);
1 hour missing in statistics:
INSERT INTO statistics VALUES(46709,'2021-10-30 23:00:10.753588',78,'2021-10-30 22:00:00.000000',NULL,NULL,NULL,NULL,0.0,0.0);
INSERT INTO statistics VALUES(46710,'2021-10-31 01:00:32.360636',7,'2021-10-31 00:00:00.000000',81.999999999999999998,81.999999999999999998,81.999999999999999998,NULL,NULL,NULL);
One hour gap in statistics_short_term:
INSERT INTO statistics_short_term VALUES(277072,'2021-10-30 23:55:10.651738','2021-10-30 23:50:00.000000',NULL,NULL,NULL,NULL,437.896000000000015,20.564000000000021372,41);
INSERT INTO statistics_short_term VALUES(277073,'2021-10-31 01:00:19.469394','2021-10-31 00:55:00.000000',0.0,0.0,0.0,NULL,NULL,NULL,42);
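The gaps in the rows above can be found mechanically. The sketch below is only an illustration: it assumes the timestamp that falls on a period boundary in each row is the period start (an assumption about the column layout), and it treats the consecutive rows of the dump as representative of all entities. The helper name `find_gaps` is hypothetical.

```python
from datetime import datetime, timedelta


def find_gaps(starts, step):
    """Return (first_missing, last_missing) period starts for each hole."""
    parsed = sorted(datetime.strptime(s, "%Y-%m-%d %H:%M:%S.%f") for s in starts)
    gaps = []
    for prev, cur in zip(parsed, parsed[1:]):
        if cur - prev > step:
            gaps.append((prev + step, cur - step))
    return gaps


# Period starts copied from the first system's rows above (UTC).
hourly = ["2021-10-30 22:00:00.000000", "2021-10-31 00:00:00.000000"]
five_min = ["2021-10-30 23:50:00.000000", "2021-10-31 00:55:00.000000"]

print(find_gaps(hourly, timedelta(hours=1)))      # missing hour starting 23:00 UTC
print(find_gaps(five_min, timedelta(minutes=5)))  # missing 23:55 through 00:50 UTC
```

Under these assumptions, the hourly table lacks the period starting at 23:00 UTC, and the short-term table lacks the twelve 5-minute periods from 23:55 to 00:50 UTC.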
Time zone information:
Epoch timestamp: 1635638400; Timestamp in milliseconds: 1635638400000; Date and time (GMT): Sunday 31 October 2021 00:00:00; Date and time (your time zone): Sunday 31 October 2021 02:00:00 GMT+02:00
Epoch timestamp: 1635641999; Timestamp in milliseconds: 1635641999000; Date and time (GMT): Sunday 31 October 2021 00:59:59; Date and time (your time zone): Sunday 31 October 2021 02:59:59 GMT+02:00
Epoch timestamp: 1635642000; Timestamp in milliseconds: 1635642000000; Date and time (GMT): Sunday 31 October 2021 01:00:00; Date and time (your time zone): Sunday 31 October 2021 02:00:00 GMT+01:00
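The three conversions above can be reproduced with the Python standard library. This is a small sketch that assumes Python 3.9+ and an available `Europe/Paris` time zone database entry; the helper name `describe` is hypothetical. It shows that the UTC instants are distinct while the local wall-clock time 02:00:00 occurs twice.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

PARIS = ZoneInfo("Europe/Paris")


def describe(epoch):
    """Return (UTC ISO string, Paris ISO string, fold) for an epoch in seconds."""
    utc = datetime.fromtimestamp(epoch, tz=timezone.utc)
    local = utc.astimezone(PARIS)
    # fold == 1 marks the second occurrence of an ambiguous wall-clock time.
    return utc.isoformat(), local.isoformat(), local.fold


for epoch in (1635638400, 1635641999, 1635642000):
    print(epoch, *describe(epoch))
```

The first and last epochs both map to 02:00:00 local time, distinguished only by the UTC offset (+02:00 versus +01:00) and the `fold` attribute.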
The following adds monit and nmap
@mdeweerd - how do you configure monit and access it then? Is this documented anywhere? Cheers for this :)
@borpin I created a topic on the forum - it's better to continue that discussion there.
The problem
In the UK on 2021/10/31, at 01:59:59, the clock went back to 01:00:00 (summer to winter time, daylight saving). Since then (it's now 01:08) Home Assistant has had high CPU usage, using one core at 100%.
Edit: memory usage seems to increase quickly (at 01:14:00).
Edit2: Switching lights works fine, but it does not appear in the state history of the light.
What version of Home Assistant Core has the issue?
core-2021.10.6
I could not find the exact image id on Docker Hub, but here is the label section of docker inspect
What was the last working version of Home Assistant Core?
No response
What type of installation are you running?
Home Assistant Container
Integration causing the issue
No response
Link to integration documentation on our website
No response
Example YAML snippet
No response
Anything in the logs that might be useful for us?
Interesting: "The recorder queue reached the maximum size of 30000" at 2021-10-31T01:03:30.660640416Z
I restarted the container to see if it could fix the issue, it did not.