Closed mattclar closed 2 years ago
Is the above journal from a journalctl command? It seems to contain multiple boots, but they seem incomplete.
Yes, the output is from journalctl, and this is my underlying issue: it seems as though the log is being cleared somehow.
Home Assistant OS uses systemd-journald for logging, along with its built-in log rotation and size-limit capability (see also https://www.freedesktop.org/software/systemd/man/journald.conf.html#SystemMaxUse=). Even with that, however, you should have complete logs for at least a couple of days.
Maybe the files get corrupted or similar? You can check the log files and print a list of them using journalctl:
journalctl --verify
journalctl --header
Not sure if this is normal?
Looks ok to me. I realized that using journalctl -b -1 to get the last boot seems not to work reliably at times. Not sure if that is a journalctl bug. But what seems to work for me is selecting a boot explicitly. First list the boots:
journalctl --list-boots
Then select the hash of a boot and use
journalctl -b <hash>
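The two-step workaround above can be wrapped in a small helper. This is only a sketch: `boot_id_for_offset` is a made-up name, and it assumes the `--list-boots` layout shown in this thread (offset in the first column, boot hash in the second).

```shell
#!/bin/sh
# boot_id_for_offset OFFSET
# Reads `journalctl --list-boots` output on stdin and prints the boot hash
# (second column) of the row whose first column equals OFFSET.
# The helper name and the assumed column layout are illustrative only.
boot_id_for_offset() {
    awk -v want="$1" '$1 == want { print $2; exit }'
}

# Typical use on the affected system (not executed here):
#   id=$(journalctl --list-boots | boot_id_for_offset -1)
#   journalctl -b "$id"
```

If no row matches the requested offset, the helper prints nothing, so it is worth checking that the result is non-empty before passing it to journalctl -b.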
It really seems that the journalctl -b -x syntax is broken in the current OS release:
# journalctl --list-boots
-3 ae30103851c44560bb3f35b89a106825 Tue 2021-02-02 15:29:48 UTC—Wed 2021-10-06 12:46:05 UTC
-2 4448e116d4804dd387db653e2b94a1f2 Wed 2021-10-06 12:46:05 UTC—Wed 2021-10-06 13:06:43 UTC
-1 66f291a50d584c29a2847906d7a8905b Wed 2021-10-06 13:06:44 UTC—Wed 2021-10-06 13:14:40 UTC
0 249e1bd80a5b45de86aadbdc405f536c Wed 2021-10-06 13:14:40 UTC—Thu 2021-10-07 08:43:44 UTC
# journalctl -b -1 | wc -l
3802
# journalctl -b 66f291a50d584c29a2847906d7a8905b | wc -l
19318
Currently HAOS is using v247.3 but even the latest stable v247.9 shows the problem. On my desktop with v249 I can't reproduce so it seems something has been fixed upstream :man_shrugging:
I'd prefer not to spend the time tracking down the bug and backporting a fix, since you can get the logs by using journalctl --list-boots and then the hash.
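To check whether a given system is affected, the comparison above can be repeated for every boot. The following is a rough sketch, not a tested diagnostic: `compare_boot_counts` and the `JOURNALCTL` override variable are names invented here, and the loop assumes the `--list-boots` column layout shown in this thread.

```shell
#!/bin/sh
# JOURNALCTL is a hypothetical override hook so the command can be swapped out;
# it defaults to the real journalctl binary.
JOURNALCTL=${JOURNALCTL:-journalctl}

# For every boot, print the offset, the boot hash, and the number of lines
# returned when the boot is selected by offset versus by hash.
compare_boot_counts() {
    "$JOURNALCTL" --list-boots | while read -r offset id _rest; do
        # Skip anything that is not a lowercase-hex boot hash (e.g. a header row)
        case $id in ''|*[!0-9a-f]*) continue ;; esac
        by_offset=$(( $("$JOURNALCTL" -b "$offset" </dev/null | wc -l) ))
        by_id=$(( $("$JOURNALCTL" -b "$id" </dev/null | wc -l) ))
        printf '%s %s offset=%d id=%d\n' "$offset" "$id" "$by_offset" "$by_id"
    done
}

# Typical use on the affected system (not executed here):
#   compare_boot_counts
```

Rows where the two counts differ would point to the same behaviour as the 3802 versus 19318 lines seen above.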
Looks like you're exactly correct! Hopefully the fix flows downstream soon :)
Also, thanks for looking into this :)
I don't think that https://github.com/systemd/systemd/pull/20496 fixes it. From what I can see this is just replacing numbers with preprocessor defines/consts, but the match character array will still be 42 bytes long.
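One plausible reading of the 42-byte figure, assuming the match string has the form _BOOT_ID= followed by the 32-character hex boot hash and is stored as a NUL-terminated C string, is 9 + 32 + 1. The sketch below only checks that arithmetic; `boot_match_size` is a made-up helper and is not taken from the systemd source.

```shell
#!/bin/sh
# boot_match_size HASH
# Prints the number of bytes needed to hold "_BOOT_ID=<HASH>" as a C string.
# Assumption: the match has exactly this form; the helper name is illustrative.
boot_match_size() {
    match="_BOOT_ID=$1"
    # ${#match} is the string length; add 1 for the terminating NUL byte
    echo $(( ${#match} + 1 ))
}

boot_match_size 66f291a50d584c29a2847906d7a8905b   # prints 42
```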
Hardware:
Home Assistant OS release: Hassos 6.4
HA Info
System Health
Home Assistant Community Store
GitHub API | ok
-- | --
Github API Calls Remaining | 5000
Installed Version | 1.15.2
Stage | running
Available Repositories | 884
Installed Repositories | 12
Home Assistant Cloud
logged_in | true
-- | --
subscription_expiration | 1 November 2021, 11:00
relayer_connected | true
remote_enabled | false
remote_connected | false
alexa_enabled | false
google_enabled | true
can_reach_cert_server | ok
can_reach_cloud_auth | ok
can_reach_cloud | ok
Home Assistant Supervisor
host_os | Home Assistant OS 6.4
-- | --
update_channel | stable
supervisor_version | supervisor-2021.09.6
docker_version | 20.10.7
disk_total | 109.3 GB
disk_used | 71.8 GB
healthy | true
supported | true
board | rpi4-64
supervisor_api | ok
version_api | ok
installed_addons | Samba share (9.5.1), File editor (5.3.3), Home Assistant Google Drive Backup (0.105.2), Mosquitto broker (6.0.1), Node-RED (10.0.1), SSH & Web Terminal (9.0.1), ESPHome (2021.9.2), Duck DNS (1.14.0), motionEye (0.15.1), MariaDB (2.4.0), Z-Wave JS (0.1.44), Dnsmasq (1.4.4), Glances (0.13.0), Zigbee2mqtt (1.21.2-1)
Lovelace
dashboards | 2
-- | --
resources | 5
views | 5
mode | storage
Supervisor logs: NA
Journal logs: See below
Kernel logs: NA
Description of problem: I have had an intermittent hard freeze about once every 24-48 hours for the last 2 weeks. Typically, all the entities associated with add-ons (Z-Wave/Zigbee) first go unavailable, and then within a few minutes the whole system freezes and my only option is to reboot. However, after the reboot the logs are gone. Even if I use journalctl -b -1, the logs only contain entries from network manager. I'm not sure whether this is related to whatever my underlying error is, or whether it's an RPi thing: the internal clock isn't maintained across reboots, there seem to be a few messages about setting the clock, and that might be causing some logging to be lost. It's frustrating not to be able to find relevant logs after a hard freeze like this, and it makes the instability hard to diagnose.