Closed julianpas closed 1 year ago
To add to that the problem seems to have been made more visible by the fact that the old_state_id FK was preventing purging of the DB so I was actually left with way more than the 90 days I wanted. Either way I think the optimization above is a valid one.
Hey there @home-assistant/core, mind taking a look at this issue as it has been labeled with an integration (recorder
) you are listed as a code owner for? Thanks!
(message by CodeOwnersMention)
recorder documentation recorder source (message by IssueLinks)
This is probably the change you want, but most of the queries are the other direction so I'm not sure we should do it
diff --git a/homeassistant/components/recorder/db_schema.py b/homeassistant/components/recorder/db_schema.py
index cf4c0543c1..021e2d7504 100644
--- a/homeassistant/components/recorder/db_schema.py
+++ b/homeassistant/components/recorder/db_schema.py
@@ -370,18 +370,6 @@ class EventTypes(Base):
class States(Base):
"""State change history."""
- __table_args__ = (
- # Used for fetching the state of entities at a specific time
- # (get_states in history.py)
- Index(METADATA_ID_LAST_UPDATED_INDEX_TS, "metadata_id", "last_updated_ts"),
- Index(
- STATES_CONTEXT_ID_BIN_INDEX,
- "context_id_bin",
- mysql_length=CONTEXT_ID_BIN_MAX_LENGTH,
- mariadb_length=CONTEXT_ID_BIN_MAX_LENGTH,
- ),
- _DEFAULT_TABLE_ARGS,
- )
__tablename__ = TABLE_STATES
state_id: Mapped[int] = mapped_column(Integer, Identity(), primary_key=True)
entity_id: Mapped[str | None] = mapped_column(
@@ -436,6 +424,19 @@ class States(Base):
)
states_meta_rel: Mapped[StatesMeta | None] = relationship("StatesMeta")
+ __table_args__ = (
+ # Used for fetching the state of entities at a specific time
+ # (get_states in history.py)
+ Index(METADATA_ID_LAST_UPDATED_INDEX_TS, "metadata_id", last_updated_ts.desc()),
+ Index(
+ STATES_CONTEXT_ID_BIN_INDEX,
+ "context_id_bin",
+ mysql_length=CONTEXT_ID_BIN_MAX_LENGTH,
+ mariadb_length=CONTEXT_ID_BIN_MAX_LENGTH,
+ ),
+ _DEFAULT_TABLE_ARGS,
+ )
+
def __repr__(self) -> str:
"""Return string representation of instance for debugging."""
return (
Still testing this
It makes the max queries faster and the ranged query slower because it has to resort
reverse sort order
QUERY PLAN
|--SEARCH states USING INDEX ix_states_metadata_id_last_updated_ts (metadata_id=? AND last_updated_ts>? AND last_updated_ts<?)
|--LIST SUBQUERY 2
| |--CO-ROUTINE SUBQUERY 1
| | `--SCAN CONSTANT ROW
| `--SCAN SUBQUERY 1
`--USE TEMP B-TREE FOR RIGHT PART OF ORDER BY
sqlite> explain query plan SELECT states.metadata_id, states.state, NULL AS last_changed_ts, states.last_updated_ts
FROM states
WHERE (states.metadata_id IN (SELECT 1 FROM (SELECT 1) WHERE 1!=1) OR states.last_changed_ts = states.last_updated_ts OR states.last_changed_ts IS NULL) AND states.metadata_id IN (1787, 1789, 1790, 1788) AND states.last_updated_ts > 1679617621.539766 AND states.last_updated_ts < 1679617668.940382 ORDER BY states.metadata_id, states.last_updated_ts;
QUERY PLAN
|--SEARCH states USING INDEX ix_states_metadata_id_last_updated_ts (metadata_id=? AND last_updated_ts>? AND last_updated_ts<?)
|--LIST SUBQUERY 2
| |--CO-ROUTINE SUBQUERY 1
| | `--SCAN CONSTANT ROW
| `--SCAN SUBQUERY 1
`--USE TEMP B-TREE FOR RIGHT PART OF ORDER BY
existing order
sqlite> drop index ix_states_metadata_id_last_updated_ts;
sqlite> create index ix_states_metadata_id_last_updated_ts on states (metadata_id, last_updated_ts);
sqlite> explain query plan SELECT states.metadata_id, states.state, NULL AS last_changed_ts, states.last_updated_ts
FROM states
WHERE (states.metadata_id IN (SELECT 1 FROM (SELECT 1) WHERE 1!=1) OR states.last_changed_ts = states.last_updated_ts OR states.last_changed_ts IS NULL) AND states.metadata_id IN (1787, 1789, 1790, 1788) AND states.last_updated_ts > 1679617621.539766 AND states.last_updated_ts < 1679617668.940382 ORDER BY states.metadata_id, states.last_updated_ts;
QUERY PLAN
|--SEARCH states USING INDEX ix_states_metadata_id_last_updated_ts (metadata_id=? AND last_updated_ts>? AND last_updated_ts<?)
`--LIST SUBQUERY 2
|--CO-ROUTINE SUBQUERY 1
| `--SCAN CONSTANT ROW
`--SCAN SUBQUERY 1
sqlite> explain query plan SELECT states.metadata_id, states.state, NULL AS last_changed_ts, states.last_updated_ts
FROM states
WHERE (states.metadata_id IN (SELECT 1 FROM (SELECT 1) WHERE 1!=1) OR states.last_changed_ts = states.last_updated_ts OR states.last_changed_ts IS NULL) AND states.metadata_id IN (1787, 1789, 1790, 1788) AND states.last_updated_ts > 1677664800.0 AND states.last_updated_ts < 1679617668.940382 ORDER BY states.metadata_id, states.last_updated_ts;
QUERY PLAN
|--SEARCH states USING INDEX ix_states_metadata_id_last_updated_ts (metadata_id=? AND last_updated_ts>? AND last_updated_ts<?)
`--LIST SUBQUERY 2
|--CO-ROUTINE SUBQUERY 1
| `--SCAN CONSTANT ROW
`--SCAN SUBQUERY 1
sqlite>
With a long history the distance to the last updated for a specific entity could be quite long without a descending index
But if we switch it its going to make all the forward queries slower
We really can't have it either way. Both need to be fast in both directions so I think we are stuck with another index
@julianpas AFAICT this query is only used by the filter
integration
homeassistant/components/filter/sensor.py: history.get_last_state_changes,
I'm assuming you are using that?
Hey there @dgomes, mind taking a look at this issue as it has been labeled with an integration (filter
) you are listed as a code owner for? Thanks!
(message by CodeOwnersMention)
filter documentation filter source (message by IssueLinks)
https://github.com/home-assistant/core/pull/90211 should improve the query for the single state case.
Currently the only integration using this query is filter
. I'm inclined to deprecate calling this api with a request for more than one in a future PR (which happens to be the default for the filter
integration) since we cannot deliver it performantly without https://github.com/home-assistant/core/pull/90209 which would force everyone to have to carry another large index which isn't something we should do for a single integration.
This would be some bad news for some filter
configurations since they could no longer be restored at startup unless there was some refactoring to the filter
integration:
Ideally filter
would switch to using something like https://github.com/home-assistant/core/blob/f1ec77b8e07eb9c8fd140a32327f6085040ad355/homeassistant/components/utility_meter/sensor.py#L616 and storing the previous X states via extra_restore_state_data
so when it restarted it wouldn't have to ask the database unless the window_size
is 1 (or limit the restore query size to 1).
If one of the filter code owners wants help with the above, I'm happy to participate.
Since I don't use the filter integration myself I'm not a good candidate to make the actual change or handle any followup as a result of the change as I would not be able to adequately qa a change from a user perspective
@julianpas AFAICT this query is only used by the
filter
integration
homeassistant/components/filter/sensor.py: history.get_last_state_changes,
I'm assuming you are using that?
I think in my case it was the sensor card configured to show 48h of recent graph that caused this. Eventually it was the perfect storm of situations that made me aware of the issue:
Now I used the advice from another bug to change the old_state_id foreign key contraint to set to NULL instead of prevent the delete op. My DB has been purged and compacted.
I tend to agree that maybe instead of changing the index if it is used elsewhere as is or create both which is expensive (my indeces already take more space than the data on that table) it is worth fixing the places where the DESC query is used and to replace with post retrieval order reversal if possible.
The fact that I also suffered from the issue where the foreign key contraint on old_state_id <-> state_id prevented the proper purging of the DB or else those old states should have been gone since it is more than 90 days that I disabled the history logging for those .
Is there an issue open on this or was it bad manual schema ?
The fact that I also suffered from the issue where the foreign key contraint on old_state_id <-> state_id prevented the proper purging of the DB or else those old states should have been gone since it is more than 90 days that I disabled the history logging for those .
Is there an issue open on this or was it bad manual schema ?
Yes, I found that others had problems with this too when I was searching for the foreign index name and stumbled upon this - https://github.com/home-assistant/core/issues/42402#issuecomment-729327643
I haven't messed up with the db schema until yesterday when I finally decided to tackle the problem that when I visited the panel with the aforementioned slow graphs my HA would "hang" showing history for any other graph for up to a minute because all it's recorder threads were busy waiting for the response on the slow queries. The db has been created about 1 year ago.
Can you post the schema and the log messages you were seeing when trying to purge?
There is logic to disconnect the rows from the old state id before purging added from the the original issue you referenced. That's from two years ago though so I'm surprised to hear it's still an issue today.
Also if you a backup from before you altered the schema, it would be great to get a copy as I can load it in my lab and try to purge. I can than adjust the migration and purge code to handle that case.
My Dropbox is nick@koston.org and drive is bdraco@gmail.com
I am sorry I don't have a backup prior to altering the schema or doing a successful purge. I am attaching the current schema (renamed from sql to txt to make github happy) of the DM to this message and I put two comments on the two lines that were altered by me from the original schema.
Given the sensitivity of the data in the db and it's sheer volume 15GB data+index I would be hesitant to share it but if you wish me to run certain queries to check the integrity of the still present data I would be more than happy to do that.
I checked the logs but there isn't any mentioning there of purging attempts since 3.15. which is the oldest date in the logs. Around the time when I was makng changes to the schema I have a lot of those errors though. And I saw some of those "too many locks" errors cropping up for me too when I was trying to clear myself some old records from the states table.
2023-03-23 14:22:38.301 ERROR (Recorder) [homeassistant.components.recorder.core] Error in database connectivity during commit: Error executing query: (MySQLdb.OperationalError) (1206, 'The total number of locks exceeds the lock table size')
[SQL: INSERT INTO states (entity_id, state, attributes, event_id, last_changed, last_changed_ts, last_updated, last_updated_ts, old_state_id, attributes_id, context_id, context_user_id, context_parent_id, origin_idx) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)]
[parameters: ('sensor.pws_wind_gust', '0.0', None, None, None, None, None, 1679577752.305077, 134360597, 417632, 'redacted', None, None, 0)]
(Background on this error at: https://sqlalche.me/e/20/e3q8). (retrying in 3 seconds)
2023-03-23 14:22:41.334 ERROR (Recorder) [homeassistant.components.recorder.core] SQLAlchemyError error processing task CommitTask(): This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (MySQLdb.OperationalError) (1206, 'The total number of locks exceeds the lock table size')
[SQL: INSERT INTO states (entity_id, state, attributes, event_id, last_changed, last_changed_ts, last_updated, last_updated_ts, old_state_id, attributes_id, context_id, context_user_id, context_parent_id, origin_idx) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)]
[parameters: ('sensor.pws_wind_gust', '0.0', None, None, None, None, None, 1679577752.305077, 134360597, 417632, 'redacted', None, None, 0)]
(Background on this error at: https://sqlalche.me/e/20/e3q8) (Background on this error at: https://sqlalche.me/e/20/7s2a)
Traceback (most recent call last):
File "/usr/src/homeassistant/homeassistant/components/recorder/core.py", line 786, in _process_one_task_or_recover
return task.run(self)
File "/usr/src/homeassistant/homeassistant/components/recorder/tasks.py", line 271, in run
instance._commit_event_session_or_retry()
File "/usr/src/homeassistant/homeassistant/components/recorder/core.py", line 1057, in _commit_event_session_or_retry
self._commit_event_session()
File "/usr/src/homeassistant/homeassistant/components/recorder/core.py", line 1076, in _commit_event_session
self.event_session.commit()
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 1903, in commit
trans.commit(_to_root=True)
File "
I apologize for not keeping enough logs of the problem before forcing some fix. I totally understand how this is not helpful when trying to fix a problem being often on the other side of this communication myself :) In the future I will try ot gather intel before attempting a fix if something like this shows up.
2023-03-23 14:22:38.301 ERROR (Recorder) [homeassistant.components.recorder.core] Error in database connectivity during commit: Error executing query: (MySQLdb.OperationalError) (1206, 'The total number of locks exceeds the lock table size')
That generally means its been searching too many rows. If it happens again you can do SHOW ENGINE INNODB STATUS
to see which query is causing the issue
I apologize for not keeping enough logs of the problem before forcing some fix. I totally understand how this is not helpful when trying to fix a problem being often on the other side of this communication myself :) In the future I will try ot gather intel before attempting a fix if something like this shows up.
No worries. As we have fixed the LIMIT 1
case I'm going to close this issue out as the data we need to do any more improvements is no longer available.
If you do find another query that has an issue after 2023.4.x please don't hesitate to start a new issue.
The problem
Hi,
HA version: 2023.3.4
I have observed that the states lookup for enttities that I added to the ecluded list for the recorder are very slow on MySQL 8.0.29 running in a docker container. I traced the issue down to the issue to queries like this one:
SELECT states.entity_id, states.state, states.last_changed_ts, states.last_updated_ts, NULL AS attributes, NULL AS shared_attrs FROM states WHERE states.entity_id = 'sensor.some_sensor' AND states.last_updated_ts < 1679395218.224e0 ORDER BY states.last_updated_ts DESC LIMIT 1
which don't use the index ix_states_entity_id_last_updated_ts due to fact that the index is created with last_update_ts being in default ASC order and the query having an ORDER BY clause specifying DESC order. After changing the index to also be in DESC order the query goes down from ~4m to 0.13ms (states table has 42mil rows for me as I like to keep longer history).
I tried to submit a pull request to fix that but I can't find where this index is defined in the source code.
What version of Home Assistant Core has the issue?
2023.3.4
What was the last working version of Home Assistant Core?
No response
What type of installation are you running?
Home Assistant Supervised
Integration causing the issue
No response
Link to integration documentation on our website
No response
Diagnostics information
No response
Example YAML snippet
No response
Anything in the logs that might be useful for us?
No response
Additional information
No response