Closed: HertogArjan closed this issue 2 months ago.
Hey there @home-assistant/core, mind taking a look at this issue as it has been labeled with an integration (`recorder`) you are listed as a code owner for? Thanks! (message by CodeOwnersMention)
recorder documentation, recorder source (message by IssueLinks)
It sounds like you have an integration that is filling the database with state changes so quickly that the system cannot keep up. Try enabling debug logging for `homeassistant.core` and check which states are being changed very frequently.
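To find the offenders, the states written per entity can also be counted directly in the database. A sketch, assuming the modern recorder schema in which `states` joins to `states_meta` via `metadata_id` (older databases store `entity_id` directly on `states`); run it against a copy of `home-assistant_v2.db` with Home Assistant stopped:

```python
import sqlite3

def noisiest_entities(db_path: str, limit: int = 10):
    """Return (entity_id, state_count) pairs for the entities with the
    most recorded states, busiest first."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            """
            SELECT sm.entity_id, COUNT(*) AS n
            FROM states AS s
            JOIN states_meta AS sm ON s.metadata_id = sm.metadata_id
            GROUP BY sm.entity_id
            ORDER BY n DESC
            LIMIT ?
            """,
            (limit,),
        ).fetchall()
    finally:
        conn.close()
```

Entities near the top of that list are the ones most likely to be flooding the recorder.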
Hi, I enabled debug logging for `homeassistant.core`, and while there are certainly a lot of state changes being reported, they do not increase notably after starting a purge, when the CPU and I/O start spiking.
I did just now install Home Assistant Container on a separate machine using the same configuration and noticed I could not reproduce the issue by running a manual purge. This could mean the issue is somehow caused by the installation environment.
After doing a manual purge on the Container version the database size has been reduced to 620 MB, nice.
When doing a purge on my original system with debug logging enabled for `homeassistant.components.recorder`, all messages from the recorder stop abruptly after:
2024-05-11 18:29:27.435 DEBUG (Recorder) [homeassistant.components.recorder.purge] Selected 4000 state ids and 1059 attributes_ids to remove
2024-05-11 18:29:27.485 DEBUG (Recorder) [homeassistant.components.recorder.purge] Updated <sqlalchemy.engine.cursor.CursorResult object at 0x7f41add689f0> states to remove old_state_id
2024-05-11 18:29:27.521 DEBUG (Recorder) [homeassistant.components.recorder.purge] Deleted <sqlalchemy.engine.cursor.CursorResult object at 0x7f41b4052270> states
2024-05-11 18:29:27.538 DEBUG (Recorder) [homeassistant.components.recorder.purge] Selected 4000 state ids and 1021 attributes_ids to remove
2024-05-11 18:29:27.601 DEBUG (Recorder) [homeassistant.components.recorder.purge] Updated <sqlalchemy.engine.cursor.CursorResult object at 0x7f41b4052270> states to remove old_state_id
2024-05-11 18:29:27.628 DEBUG (Recorder) [homeassistant.components.recorder.purge] Deleted <sqlalchemy.engine.cursor.CursorResult object at 0x7f41add689f0> states
2024-05-11 18:29:27.640 DEBUG (Recorder) [homeassistant.components.recorder.purge] Selected 4000 state ids and 1027 attributes_ids to remove
2024-05-11 18:29:27.689 DEBUG (Recorder) [homeassistant.components.recorder.purge] Updated <sqlalchemy.engine.cursor.CursorResult object at 0x7f41b4775550> states to remove old_state_id
2024-05-11 18:29:27.716 DEBUG (Recorder) [homeassistant.components.recorder.purge] Deleted <sqlalchemy.engine.cursor.CursorResult object at 0x7f41b4052270> states
2024-05-11 18:29:28.065 DEBUG (Recorder) [homeassistant.components.recorder.purge] Selected 25155 shared attributes to remove
2024-05-11 18:29:28.208 DEBUG (Recorder) [homeassistant.components.recorder.purge] Deleted <sqlalchemy.engine.cursor.CursorResult object at 0x7f41b1745a20> attribute states
2024-05-11 18:29:28.301 DEBUG (Recorder) [homeassistant.components.recorder.purge] Deleted <sqlalchemy.engine.cursor.CursorResult object at 0x7f41b4f81fd0> attribute states
2024-05-11 18:29:28.359 DEBUG (Recorder) [homeassistant.components.recorder.purge] Deleted <sqlalchemy.engine.cursor.CursorResult object at 0x7f41b2c79da0> attribute states
2024-05-11 18:29:28.414 DEBUG (Recorder) [homeassistant.components.recorder.purge] Deleted <sqlalchemy.engine.cursor.CursorResult object at 0x7f41b1747e70> attribute states
2024-05-11 18:29:28.467 DEBUG (Recorder) [homeassistant.components.recorder.purge] Deleted <sqlalchemy.engine.cursor.CursorResult object at 0x7f41b2c79da0> attribute states
2024-05-11 18:29:28.520 DEBUG (Recorder) [homeassistant.components.recorder.purge] Deleted <sqlalchemy.engine.cursor.CursorResult object at 0x7f41b4f81fd0> attribute states
2024-05-11 18:29:28.537 DEBUG (Recorder) [homeassistant.components.recorder.purge] Deleted <sqlalchemy.engine.cursor.CursorResult object at 0x7f41b2c79da0> attribute states
2024-05-11 18:29:28.538 DEBUG (Recorder) [homeassistant.components.recorder.purge] After purging states and attributes_ids remaining=True
2024-05-11 18:29:28.736 DEBUG (Recorder) [homeassistant.components.recorder.purge] Selected 4000 event ids and 817 data_ids to remove
That sounds like a corrupt index. You might try running an integrity check (https://www.sqlite.org/pragma.html#pragma_integrity_check) on the database and reindex anything that comes up: https://www.sqlite.org/lang_reindex.html
It could also indicate a problem with your storage medium.
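If the `sqlite3` CLI is not available, the suggested checks can also be run through Python's bundled driver. A minimal sketch, assuming the default Core database filename; run it on a copy with Home Assistant stopped:

```python
import sqlite3

def check_and_reindex(db_path: str) -> list[str]:
    """Run PRAGMA integrity_check and, if it reports problems,
    rebuild all indexes with REINDEX. Returns the pragma's rows."""
    conn = sqlite3.connect(db_path)
    try:
        results = [row[0] for row in conn.execute("PRAGMA integrity_check;")]
        if results != ["ok"]:
            # Rebuild every index in the database
            # (see https://www.sqlite.org/lang_reindex.html)
            conn.execute("REINDEX;")
        return results
    finally:
        conn.close()
```

On a healthy file the pragma returns a single row containing `ok`.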
It is almost certainly an issue with my installation of HA Core and not with the database or storage medium. I installed HA Container on the same machine and now the issue is gone.
Any suggestions why the issue only occurs with HA Core?
Do you have a different version of SQLite on the core install?
No, because SQLite (the CLI tool) is not part of the Core dependencies. The Recorder page mentions Home Assistant uses SQLAlchemy for database access, but that module is not installed manually when installing Core; I expect HA installs it itself and also keeps it up to date. I don't know how I would compare the version numbers for this module.
You can find the version under Settings, System, Repairs, three-dot menu, System Information.
Thanks @bdraco. According to the system information my Core install does indeed use a newer SQLite version than the Container install (3.45.3 > 3.44.2). It also uses a slightly newer Python version (3.12.3 > 3.12.2) for a slightly lower HA Core version (core-2024.5.2 < core-2024.5.3).
I can try and see what happens if I equalize the Core install to the Container install. Do you know how I would downgrade the SQLite version?
If you're interested I attach the system information outputs for the Core and Container installs: sys_info_core.txt sys_info_container.txt
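For what it's worth, on a Core install the SQLite version the recorder uses is the library linked into the Python interpreter's `sqlite3` module, so it can be read directly without any CLI tool:

```python
import sqlite3

# The SQLite library version linked into this interpreter; this is the
# version the recorder actually uses on a Core install.
print(sqlite3.sqlite_version)       # e.g. "3.45.3"
print(sqlite3.sqlite_version_info)  # e.g. (3, 45, 3)
```

Comparing this value between two installs is equivalent to comparing the System Information entries.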
This is also happening for me now, on a Core install.
I recently upgraded Ubuntu Server to 24.04. Also on Python 3.12.3. The SQLite3 library version installed is 3.45.1-1ubuntu2.
It gets locked up every morning at the scheduled time (4:12?) and also if I manually call the `recorder.purge` service. One CPU core gets pegged to 100% by hass, and stopping it yields these relevant log lines:
2024-06-16 01:09:57.503 WARNING (MainThread) [homeassistant.core] Timed out waiting for final writes to complete, the shutdown will continue
2024-06-16 01:09:57.503 WARNING (MainThread) [homeassistant.core] Shutdown stage 'final write': still running: <Task pending name='Task-126987' coro=<Recorder._async_shutdown() running at /srv/homeassistant/lib/python3.12/site-packages/homeassistant/components/recorder/core.py:476> wait_for=<Future pending cb=[_chain_future.<locals>._call_check_cancel() at /usr/lib/python3.12/asyncio/futures.py:387, <1 more>, Task.task_wakeup()]> cb=[set.remove()]>
I installed the command-line `sqlite3` tool and ran `VACUUM` and `PRAGMA integrity_check;` on the database file with no issues (all said "ok").
I'm assuming this line: https://github.com/home-assistant/core/blob/c0a680a80a2d0f43d9a3151bdc75cd3b48c0b332/homeassistant/components/recorder/purge.py#L577 is the one causing the hang, as we never see the following log entry at https://github.com/home-assistant/core/blob/c0a680a80a2d0f43d9a3151bdc75cd3b48c0b332/homeassistant/components/recorder/purge.py#L578 .
I also got logs that look the same once debug logging is enabled:
...
2024-06-16 01:23:16.430 DEBUG (Recorder) [homeassistant.components.recorder.purge] Selected 4000 state ids and 326 attributes_ids to remove
2024-06-16 01:23:16.605 DEBUG (Recorder) [homeassistant.components.recorder.purge] Updated <sqlalchemy.engine.cursor.CursorResult object at 0x7dc9b4dd2740> states to remove old_state_id
2024-06-16 01:23:16.723 DEBUG (Recorder) [homeassistant.components.recorder.purge] Deleted <sqlalchemy.engine.cursor.CursorResult object at 0x7dc9b4dd2e40> states
2024-06-16 01:23:16.783 DEBUG (Recorder) [homeassistant.components.recorder.purge] Selected 4000 state ids and 380 attributes_ids to remove
2024-06-16 01:23:16.958 DEBUG (Recorder) [homeassistant.components.recorder.purge] Updated <sqlalchemy.engine.cursor.CursorResult object at 0x7dc9b4dd8670> states to remove old_state_id
2024-06-16 01:23:17.070 DEBUG (Recorder) [homeassistant.components.recorder.purge] Deleted <sqlalchemy.engine.cursor.CursorResult object at 0x7dc9b4dd2740> states
2024-06-16 01:23:17.127 DEBUG (Recorder) [homeassistant.components.recorder.purge] Selected 4000 state ids and 346 attributes_ids to remove
2024-06-16 01:23:17.297 DEBUG (Recorder) [homeassistant.components.recorder.purge] Updated <sqlalchemy.engine.cursor.CursorResult object at 0x7dc9b4dd8670> states to remove old_state_id
2024-06-16 01:23:17.412 DEBUG (Recorder) [homeassistant.components.recorder.purge] Deleted <sqlalchemy.engine.cursor.CursorResult object at 0x7dc9b4dd29e0> states
2024-06-16 01:23:17.444 DEBUG (Recorder) [homeassistant.components.recorder.purge] Selected 1915 shared attributes to remove
2024-06-16 01:23:17.482 DEBUG (Recorder) [homeassistant.components.recorder.purge] Deleted <sqlalchemy.engine.cursor.CursorResult object at 0x7dc9b4ddacf0> attribute states
2024-06-16 01:23:17.484 DEBUG (Recorder) [homeassistant.components.recorder.purge] After purging states and attributes_ids remaining=True
2024-06-16 01:23:17.550 DEBUG (Recorder) [homeassistant.components.recorder.purge] Selected 4000 event ids and 32 data_ids to remove
Partial thread dump via `pyrasite` (see https://stackoverflow.com/questions/6849138/check-what-a-running-process-is-doing-print-stack-trace-of-an-uninstrumented-py):
jun 16 01:59:04 piernas hass[116758]: File "/usr/lib/python3.12/threading.py", line 1030, in _bootstrap
jun 16 01:59:04 piernas hass[116758]: self._bootstrap_inner()
jun 16 01:59:04 piernas hass[116758]: File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
jun 16 01:59:04 piernas hass[116758]: self.run()
jun 16 01:59:04 piernas hass[116758]: File "/srv/homeassistant/lib/python3.12/site-packages/pyudev/monitor.py", line 533, in run
jun 16 01:59:04 piernas hass[116758]: for file_descriptor, event in eintr_retry_call(notifier.poll):
jun 16 01:59:04 piernas hass[116758]: File "/srv/homeassistant/lib/python3.12/site-packages/pyudev/_util.py", line 152, in eintr_retry_call
jun 16 01:59:04 piernas hass[116758]: return func(*args, **kwargs)
jun 16 01:59:04 piernas hass[116758]: File "/srv/homeassistant/lib/python3.12/site-packages/pyudev/_os/poll.py", line 91, in poll
jun 16 01:59:04 piernas hass[116758]: return list(self._parse_events(eintr_retry_call(self._notifier.poll, timeout)))
jun 16 01:59:04 piernas hass[116758]: File "/srv/homeassistant/lib/python3.12/site-packages/pyudev/_util.py", line 152, in eintr_retry_call
jun 16 01:59:04 piernas hass[116758]: return func(*args, **kwargs)
jun 16 01:59:04 piernas hass[116758]: File "/usr/lib/python3.12/threading.py", line 1030, in _bootstrap
jun 16 01:59:04 piernas hass[116758]: self._bootstrap_inner()
jun 16 01:59:04 piernas hass[116758]: File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
jun 16 01:59:04 piernas hass[116758]: self.run()
jun 16 01:59:04 piernas hass[116758]: File "/srv/homeassistant/lib/python3.12/site-packages/homeassistant/components/recorder/core.py", line 703, in run
jun 16 01:59:04 piernas hass[116758]: self._run()
jun 16 01:59:04 piernas hass[116758]: File "/srv/homeassistant/lib/python3.12/site-packages/homeassistant/components/recorder/core.py", line 787, in _run
jun 16 01:59:04 piernas hass[116758]: self._run_event_loop()
jun 16 01:59:04 piernas hass[116758]: File "/srv/homeassistant/lib/python3.12/site-packages/homeassistant/components/recorder/core.py", line 872, in _run_event_loop
jun 16 01:59:04 piernas hass[116758]: self._guarded_process_one_task_or_event_or_recover(queue_.get())
jun 16 01:59:04 piernas hass[116758]: File "/srv/homeassistant/lib/python3.12/site-packages/homeassistant/components/recorder/core.py", line 906, in _guarded_process_one_task_or_event_or_recover
jun 16 01:59:04 piernas hass[116758]: self._process_one_task_or_event_or_recover(task)
jun 16 01:59:04 piernas hass[116758]: File "/srv/homeassistant/lib/python3.12/site-packages/homeassistant/components/recorder/core.py", line 926, in _process_one_task_or_event_or_recover
jun 16 01:59:04 piernas hass[116758]: task.run(self)
jun 16 01:59:04 piernas hass[116758]: File "/srv/homeassistant/lib/python3.12/site-packages/homeassistant/components/recorder/tasks.py", line 114, in run
jun 16 01:59:04 piernas hass[116758]: if purge.purge_old_data(
jun 16 01:59:04 piernas hass[116758]: File "/srv/homeassistant/lib/python3.12/site-packages/homeassistant/components/recorder/util.py", line 643, in wrapper
jun 16 01:59:04 piernas hass[116758]: return job(instance, *args, **kwargs)
jun 16 01:59:04 piernas hass[116758]: File "/srv/homeassistant/lib/python3.12/site-packages/homeassistant/components/recorder/purge.py", line 92, in purge_old_data
jun 16 01:59:04 piernas hass[116758]: has_more_to_purge |= _purge_events_and_data_ids(
jun 16 01:59:04 piernas hass[116758]: File "/srv/homeassistant/lib/python3.12/site-packages/homeassistant/components/recorder/purge.py", line 237, in _purge_events_and_data_ids
jun 16 01:59:04 piernas hass[116758]: _purge_event_ids(session, event_ids)
jun 16 01:59:04 piernas hass[116758]: File "/srv/homeassistant/lib/python3.12/site-packages/homeassistant/components/recorder/purge.py", line 577, in _purge_event_ids
jun 16 01:59:04 piernas hass[116758]: deleted_rows = session.execute(delete_event_rows(event_ids))
jun 16 01:59:04 piernas hass[116758]: File "/srv/homeassistant/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 2351, in execute
jun 16 01:59:04 piernas hass[116758]: return self._execute_internal(
jun 16 01:59:04 piernas hass[116758]: File "/srv/homeassistant/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 2236, in _execute_internal
jun 16 01:59:04 piernas hass[116758]: result: Result[Any] = compile_state_cls.orm_execute_statement(
jun 16 01:59:04 piernas hass[116758]: File "/srv/homeassistant/lib/python3.12/site-packages/sqlalchemy/orm/bulk_persistence.py", line 1953, in orm_execute_statement
jun 16 01:59:04 piernas hass[116758]: return super().orm_execute_statement(
jun 16 01:59:04 piernas hass[116758]: File "/srv/homeassistant/lib/python3.12/site-packages/sqlalchemy/orm/context.py", line 293, in orm_execute_statement
jun 16 01:59:04 piernas hass[116758]: result = conn.execute(
jun 16 01:59:04 piernas hass[116758]: File "/srv/homeassistant/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1418, in execute
jun 16 01:59:04 piernas hass[116758]: return meth(
jun 16 01:59:04 piernas hass[116758]: File "/srv/homeassistant/lib/python3.12/site-packages/sqlalchemy/sql/lambdas.py", line 603, in _execute_on_connection
jun 16 01:59:04 piernas hass[116758]: return connection._execute_clauseelement(
jun 16 01:59:04 piernas hass[116758]: File "/srv/homeassistant/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1640, in _execute_clauseelement
jun 16 01:59:04 piernas hass[116758]: ret = self._execute_context(
jun 16 01:59:04 piernas hass[116758]: File "/srv/homeassistant/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1846, in _execute_context
jun 16 01:59:04 piernas hass[116758]: return self._exec_single_context(
jun 16 01:59:04 piernas hass[116758]: File "/srv/homeassistant/lib/python3.12/site-packages/sqlalchemy/engine/base.py", line 1967, in _exec_single_context
jun 16 01:59:04 piernas hass[116758]: self.dialect.do_execute(
jun 16 01:59:04 piernas hass[116758]: File "/srv/homeassistant/lib/python3.12/site-packages/sqlalchemy/engine/default.py", line 924, in do_execute
jun 16 01:59:04 piernas hass[116758]: cursor.execute(statement, parameters)
jun 16 01:59:04 piernas hass[116758]: File "<string>", line 1, in <module>
jun 16 01:59:04 piernas hass[116758]: File "<string>", line 5, in <module>
jun 16 01:59:04 piernas hass[116758]: File "/usr/lib/python3.12/threading.py", line 1030, in _bootstrap
jun 16 01:59:04 piernas hass[116758]: self._bootstrap_inner()
jun 16 01:59:04 piernas hass[116758]: File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
jun 16 01:59:04 piernas hass[116758]: self.run()
jun 16 01:59:04 piernas hass[116758]: File "/usr/lib/python3.12/threading.py", line 1010, in run
jun 16 01:59:04 piernas hass[116758]: self._target(*self._args, **self._kwargs)
jun 16 01:59:04 piernas hass[116758]: File "/usr/lib/python3.12/concurrent/futures/thread.py", line 89, in _worker
jun 16 01:59:04 piernas hass[116758]: work_item = work_queue.get(block=True)
jun 16 01:59:04 piernas hass[116758]: File "/usr/lib/python3.12/threading.py", line 1030, in _bootstrap
jun 16 01:59:04 piernas hass[116758]: self._bootstrap_inner()
jun 16 01:59:04 piernas hass[116758]: File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
Not sure what the two `<string>` pieces of the stack mean... Dynamically generated Python code being executed?
I stopped HA and opened the database with the command-line `sqlite3` tool.
So, I think I had some entries in the `events` table with an empty-string `data_id` (timestamp obtained via `date +%s --date="7 days ago"`):
sqlite> select count(*), data_id from events where time_fired_ts < 1717965456 group by data_id;
2118|
3|22046
3|22047
3|22048
...
I just deleted the relevant events:
sqlite> delete from events where time_fired_ts < 1717965456;
And verified that no unreferenced `event_data` entries remained:
sqlite> select distinct(data_id) from event_data where data_id not in (select distinct(data_id) from events);
Calling `recorder.purge` after that completed successfully.
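The manual cleanup above can be sketched end to end with Python's bundled sqlite3 driver. This is only an illustration of the same SQL, not official tooling; the `manual_purge` name is mine, and it should only ever be run against a copy of the database with Home Assistant stopped:

```python
import sqlite3
import time

def manual_purge(db_path: str, keep_days: int = 7) -> int:
    """Delete events older than keep_days (the cutoff mirrors
    `date +%s --date="7 days ago"`) and return how many event_data
    rows are left unreferenced by any event."""
    cutoff = time.time() - keep_days * 86400
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commit the delete as one transaction
            conn.execute(
                "DELETE FROM events WHERE time_fired_ts < ?", (cutoff,)
            )
        # Same orphan check as the interactive session above.
        # Caveat: NOT IN yields no rows at all if any remaining
        # events.data_id is NULL.
        return conn.execute(
            "SELECT COUNT(*) FROM event_data "
            "WHERE data_id NOT IN (SELECT DISTINCT data_id FROM events)"
        ).fetchone()[0]
    finally:
        conn.close()
```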
Unfortunately my instance went back to getting stuck at purge two days later. At this point my next step is probably to add debug logging at various points of the above stack trace the next time this reproduces (I cleaned the database manually again this time).
We found the problem.
Some old databases have `FOREIGN KEY(event_id) REFERENCES events (event_id) ON DELETE CASCADE` in the `states` table. The index used for the foreign key is now empty on some systems, which may result in a full table scan with the version of SQLite included in Alpine 3.20.x when deleting rows from the `states` table. The problem became more widespread when the Docker images were updated from Alpine 3.19 to 3.20. Other operating systems may have updated SQLite sooner, so this problem also appeared sooner there. Downgrading SQLite or disabling foreign keys would likely also work around the issue, but this is not recommended.
Unfortunately SQLite does not support dropping a foreign key constraint (see "Why ALTER TABLE is such a problem for SQLite" at https://www.sqlite.org/lang_altertable.html), which makes the solution much more complex.
To fix this, the whole `states` table needs to be recreated using the steps in "Making Other Kinds Of Table Schema Changes" at https://www.sqlite.org/lang_altertable.html, so we need to write a `rebuild_table` function for SQLite to fix this.
Sadly, writing that code is very risky if we get it wrong. Frankly, it's scary.
Going to go with the 12-step rebuild process. It's slower but less risky.
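For illustration only, the core of that rebuild (create a constraint-free copy, move the rows, drop the old table, rename) looks like this on a toy schema; the real migration must also recreate indexes and triggers and follow the full 12-step sequence from the SQLite docs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Toy schema carrying the problematic constraint.
    CREATE TABLE events (event_id INTEGER PRIMARY KEY);
    CREATE TABLE states (
        state_id INTEGER PRIMARY KEY,
        event_id INTEGER,
        FOREIGN KEY(event_id) REFERENCES events (event_id) ON DELETE CASCADE
    );
    INSERT INTO states (event_id) VALUES (NULL), (NULL);

    -- Core of the rebuild: new table without the FK, copy, drop, rename.
    PRAGMA foreign_keys=OFF;
    BEGIN;
    CREATE TABLE states_new (
        state_id INTEGER PRIMARY KEY,
        event_id INTEGER
    );
    INSERT INTO states_new (state_id, event_id)
        SELECT state_id, event_id FROM states;
    DROP TABLE states;
    ALTER TABLE states_new RENAME TO states;
    COMMIT;
    PRAGMA foreign_keys=ON;
""")
# The constraint is gone and the data survived.
print(conn.execute("PRAGMA foreign_key_list(states)").fetchall())  # []
print(conn.execute("SELECT COUNT(*) FROM states").fetchone()[0])   # 2
```

The risk bdraco mentions comes from everything this sketch omits: indexes, triggers, views, and doing it all safely on a multi-gigabyte live file.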
Will be fixed in 2024.8.0
Cool. Just noticed the same behavior on my system for the last two days during tests of 2024.7.0bx. Not sure what exactly triggered this issue to happen now, while testing the beta; until then the recorder was working without issues.
I'm not sure this is the actual problem in my case.
sqlite> .schema event_data
CREATE TABLE event_data (
data_id INTEGER NOT NULL,
hash BIGINT,
shared_data TEXT,
PRIMARY KEY (data_id)
);
CREATE INDEX ix_event_data_hash ON event_data (hash);
sqlite> .schema events
CREATE TABLE events (
event_id INTEGER NOT NULL,
event_type VARCHAR(32),
event_data TEXT,
origin VARCHAR(32),
time_fired DATETIME,
created DATETIME,
context_id VARCHAR(36),
context_user_id VARCHAR(36),
context_parent_id VARCHAR(36), data_id INTEGER, origin_idx INTEGER, time_fired_ts FLOAT, context_id_bin BLOB, context_user_id_bin BLOB, context_parent_id_bin BLOB, event_type_id INTEGER,
PRIMARY KEY (event_id)
);
CREATE INDEX ix_events_data_id ON events (data_id);
CREATE INDEX ix_events_time_fired_ts ON events (time_fired_ts);
CREATE INDEX ix_events_context_id_bin ON events (context_id_bin);
CREATE INDEX ix_events_event_type_id_time_fired_ts ON events (event_type_id, time_fired_ts);
sqlite>
I can't see any foreign key there...
Edit: removed spurious extra copied text.
Please check again, as it looks like that is not the right database file unless you are running a 2-3 year old version of HA.
This instance was originally installed more than two years ago and has been migrated to the latest version each time via `pip3`.
`home-assistant_v2.db` is the database file I've been opening to operate on and mitigate this issue with the `sqlite3` command-line utility above (and my SQL statements on it "did things"):
homeassistant@piernas ~/.homeassistant (master)> ls -la *.db
-rw-r--r-- 1 homeassistant homeassistant 833134592 jun 30 15:22 home-assistant_v2.db
-rw-r--r-- 1 homeassistant homeassistant 1384448 jun 30 15:18 zigbee.db
homeassistant@piernas ~/.homeassistant (master)>
Isn't this the right database file? What other name should I look for?
Sure looks like it's been changed recently. That's very strange that it is missing all the foreign keys.
I'm not sure what's going on with your system.
`events` should look like this:
sqlite> .schema events
CREATE TABLE events (
event_id INTEGER NOT NULL,
event_type VARCHAR(64),
event_data TEXT,
origin VARCHAR(32),
time_fired DATETIME,
context_id VARCHAR(36),
context_user_id VARCHAR(36),
context_parent_id VARCHAR(36),
data_id INTEGER, origin_idx INTEGER, time_fired_ts FLOAT, context_id_bin BLOB, context_parent_id_bin BLOB, context_user_id_bin BLOB, event_type_id INTEGER,
PRIMARY KEY (event_id),
FOREIGN KEY(data_id) REFERENCES event_data (data_id)
);
CREATE INDEX ix_events_data_id ON events (data_id);
CREATE INDEX ix_events_time_fired_ts ON events (time_fired_ts);
CREATE INDEX ix_events_context_id_bin ON events (context_id_bin);
CREATE INDEX ix_events_event_type_id_time_fired_ts ON events (event_type_id, time_fired_ts);
What does your full schema look like using `.schema`?
Find schema dump attached.
The problem foreign key does exist in the states table in your schema so it is the same issue
FOREIGN KEY(event_id) REFERENCES events (event_id) ON DELETE CASCADE
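To check whether your own file still carries that legacy constraint, SQLite can list the foreign keys it sees on the table. A sketch using Python's bundled driver (the function name is mine; run against a copy, with the default Core path assumed):

```python
import sqlite3

def states_foreign_keys(db_path: str):
    """List the foreign keys declared on the states table.
    Row columns: id, seq, table, from, to, on_update, on_delete, match."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("PRAGMA foreign_key_list(states)").fetchall()
    finally:
        conn.close()
```

An affected database shows a row referencing `events` via `event_id` with `CASCADE` as the delete action; a clean one returns an empty list.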
There's an index on the `event_id` field of the `states` table, though:
CREATE INDEX ix_states_event_id ON states (event_id);
and it's the primary key in `events`.
Is that still a problem? I understood the problem was related to the lack of an index...
I wouldn't be surprised if this was related to the updated Python interpreter and sqlite3 library brought by the upgrade to Ubuntu Server 24.04 having issues with the existing data for some reason.
The index is likely empty because there is no longer any data in it, and it seems like the newer SQLite has an issue with that, which is why you get the full table scan. The index will be dropped by the migration as soon as the foreign key is removed by the rebuild in the linked PR.
Not sure what you mean by the index not having any data. Isn't the point that the RDBMS (sqlite3 here) should keep the index up to date with changes in the table it indexes? If the index doesn't have data, wouldn't that be because the table doesn't have data, in which case the full table scan doesn't matter?
Sorry if I'm missing some obvious piece of the puzzle here.
Is there a bug report or link to the generic sqlite3 issue you're referring to?
Edit: typo
As a workaround, I am restarting HA before the purge happens (around 04:00 am). I noticed that the data always stopped recording after 04:15 am, so this is when I reboot the host.
Until now it seems to work, with the side effect that the DB will not be purged...
Same problems here. Should we just wait until 2024.8.0, or can we fix the DB ourselves?
Same problem
I have the same problem since the 2024.7 install.
Updated my Docker install to 2024.8.0.dev202407040219. So far so good; lots of database rebuilds and activity.
Same problem after 2024.7 install on HAOS/x86, what can I do to fix this?
@makerwolf1 can you please check whether one of the following matches?
- Custom integrations mentioned here: https://community.home-assistant.io/t/psa-2024-7-recorder-problems/746428/8
- whether the recorder stops after running the service `recorder.purge`
Same issue here after the upgrade to 2024.7; I've been rebooting every morning as I noticed a spike in CPU usage and logging stopped.
I don't see any reference to the custom integrations I'm using in the link above, and yes, the recorder stops after manually running the purge.
EDIT: is there a way to determine whether it's a custom component causing this or a database issue?
You can temporarily disable auto_purge in your configuration.yaml:
recorder:
  auto_purge: false
Keep in mind to remove this once the issue has been fixed; otherwise your DB will grow until you run into other issues due to the DB file size.
2024.8 dev releases are available already; I don't know which system you are on, but for me it helped.
> 2024.8 dev releases are available already; I don't know which system you are on, but for me it helped.
HA OS running on a virtual machine. I did try joining the beta channel but no updates were shown.
I'm happy to just reboot every morning until a fix rolls out; I'm reluctant to manually disable the recorder as I don't want to forget and run into bigger issues down the line.
The problem is that even with restarting / rebooting HA, purge will probably not remove old data from the database. The issue here is that, due to certain circumstances, the purge causes a full table scan over the database, and this is what causes the deadlock.
You don't disable the recorder with this configuration; you just disable the automatic purge job, which is triggered each night (and therefore causes the deadlock).
> 2024.8 dev releases are available already; I don't know which system you are on, but for me it helped.
Yeah, but dev is nothing we should recommend to everyone... :) The official beta for 2024.8 will start on Wednesday, July 31st.
Had the same issue and cherry-picked the upcoming (2024.8) migration task into my prod env. I can also confirm that the migration works OK.
- Custom Integrations mentioned here: https://community.home-assistant.io/t/psa-2024-7-recorder-problems/746428/8
none of those integrations are installed on my system.
- whether the recorder stops after running the service `recorder.purge`
To be honest I'd prefer not to give this a try, since I'm on a production system and this would involve an extremely large risk of missing datapoints. For now I'm going to set `auto_purge: false`. If it's really helpful I can try setting up a duplicate test system from a backup and see what that does, but in all honesty it seems like this is an already-fixed issue, no?
Any chance we can get this change into a 2024.7.x update? This really is quite a serious issue; it silently broke a large portion of my system.
Same issue here since 2024.7. Every night at exactly 04:11 data stops being recorded. A reboot then fixes the issue until 04:11 the next day.
It seems that since 2024.7 a lot more users are affected by this purge issue. I don't know what triggered that, but if https://github.com/home-assistant/core/pull/120779 is only scheduled for 2024.8, something might need to be rolled back in the meantime.
> It seems that since 2024.7 a lot more users are affected by this purge issue. I don't know what triggered that, but if #120779 is only scheduled for 2024.8, something might need to be rolled back in the meantime.
I absolutely agree. Setting `auto_purge: false` fixed it for me. This is a big issue that should be addressed ASAP!
Same issue running on a Pi 4 with the default recorder setup. I will try turning off auto purge.
Having to wait until .8 will have an impact on lots of users. I don't have any of the "suspicious" add-ons, but my data collection also stopped at 04:05 AM. I put `auto_purge: false` into my config, but this is untenable...
Is there a way of determining which custom components are causing this?
This problem is solved in 2024.7.2. If the system ran out of disk space and the table rebuild failed, it will try again in 2024.8.1+; see issue https://github.com/home-assistant/core/issues/123348 and solution https://github.com/home-assistant/core/pull/123388
Workaround: disabling nightly auto purge will prevent the issue from occurring (this is not a long-term solution).
Be sure to re-enable auto purge after installing 2024.7.2, or your database will grow without bounds and your system will eventually run out of disk space or become sluggish.
Cause: https://github.com/home-assistant/core/issues/117263#issuecomment-2197311144 Solution: https://github.com/home-assistant/core/pull/120779
The problem
Every night at around ~4:10 the histories for all entities stop. This has been happening since at least April 9th. I updated Home Assistant to 2024.4.1 on April 5th, but I can't say for sure if this issue started directly afterwards. A restart of Home Assistant allows recording again but does not restore the history missed since ~4:10. I suspect it has something to do with the Recorder auto purge at 4:12 because the same symptoms happen when the purge is run manually.
I don't think the manual or automatic purge is currently able to finish, because the (SQLite) database seems way too large (>6 GB) for my configured `purge_keep_days` of 7.
If I run `recorder.purge` from the web UI, the same symptoms happen as during the night. By looking at the mtime it is clear `home-assistant_v2.db` does not get written to anymore. `htop` shows HA using 100% of one CPU core continuously, and `iotop` shows HA reading from disk at ~400 MB/s continuously. This went on for at least 25 minutes before I stopped the process.
The logs show nothing unusual happening around 4:12. When I run `recorder.purge` from the web UI with verbose logging enabled, the logs just show:
When HA is stopped using SIGTERM, the shutdown takes a long time, and it is clear from the logs it is waiting for a Recorder task:
See the rest of the relevant messages during shutdown below.
What version of Home Assistant Core has the issue?
core-2024.5.2
What was the last working version of Home Assistant Core?
No response
What type of installation are you running?
Home Assistant Core
Integration causing the issue
Recorder
Link to integration documentation on our website
https://www.home-assistant.io/integrations/recorder/#service-purge
Diagnostics information
No response
Example YAML snippet
Anything in the logs that might be useful for us?
Additional information
I thought maybe my database could be corrupted, so with HA shut down I ran
mv home-assistant_v2.db home-assistant_v2_old.db; sqlite3 home-assistant_v2_old.db ".recover" | sqlite3 home-assistant_v2.db
and then tried to run a purge again. Unfortunately the problem was not resolved. My database did shrink by about 1.5 GB.