home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0
71.06k stars 29.72k forks source link

`history.get_last_state_changes` is poorly optimized which results in overly long queries with the filter integration #81989

Closed mmiller7 closed 1 year ago

mmiller7 commented 1 year ago

The problem

Upgrading to 2022.11 (or newer) the filter entities are extremely slow to start up (nearly 10 minutes). Reverting to 2022.10.5 startup is fast again (15 seconds). No errors appear in the regular logfile. Also sensor data does not update properly now (flat-line) after update.

Starting sensor.filter, not everything will be available until it is finished.

What version of Home Assistant Core has the issue?

core-2022.11.2 and newer

What was the last working version of Home Assistant Core?

core-2022.10.5

What type of installation are you running?

Home Assistant OS

Integration causing the issue

Filters

Link to integration documentation on our website

https://www.home-assistant.io/integrations/filter/

Diagnostics information

No response

Example YAML snippet

No response

Anything in the logs that might be useful for us?

No response

Additional information

Thread on HA forums: https://community.home-assistant.io/t/after-2022-11-startup-really-slow/487368

Workaround: Downgrade to 2022.10.5

mmiller7 commented 1 year ago

Log file with debug on

home-assistant.log_filter_startup.txt

mmiller7 commented 1 year ago

This is still an issue in 2022.11.3, filter takes exceptionally long time to start.

gigatexel commented 1 year ago

I'm experiencing the exact same issue.

mmiller7 commented 1 year ago

Still an issue in 2022.12.1.

mmiller7 commented 1 year ago

Also if you ignore it, when the filters eventually start it looks like everything is flat-lined.

Example - going from 2022.10.5 -> 2022.12.1 for a few hours -> rollback 2022.10.5 image

gigatexel commented 1 year ago

Still an issue here. 500s suspiciously looks like some kind of wait-for-condition not ending

image

gigatexel commented 1 year ago

Still an issue in 2022.12.1.

Could you fill in all fields in the opening post (such as correct integration name and link to the docs)? Otherwise, the integration developer will not be tagged and this issue will likely not get resolved.

mmiller7 commented 1 year ago

Still an issue in 2022.12.1.

Could you fill in all fields in the opening post (such as correct integration name and link to the docs)? Otherwise, the integration developer will not be tagged and this issue will likely not get resolved.

I don't understand?

I already had the integration name and the logs and linked thread reflect that too, I now added the link to the docs. I don't know what more I am expected to provide?

If you see where I've put a wrong thing please do help me figure out what it ought to be because I'm not understanding.

gigatexel commented 1 year ago

@dgomes can you help us here?

filter has also stopped working on my system too

dgomes commented 1 year ago

Nothing has changed in the filter integration in several releases... I would guess the issue is something related to the recorder (from which the filter ask information on load)

I see the graph but the log does not relate to the graph.... could I get a new log matching a graph ?

gigatexel commented 1 year ago

Nothing has changed in the filter integration in several releases... I would guess the issue is something related to the recorder (from which the filter ask information on load)

I see the graph but the log does not relate to the graph.... could I get a new log matching a graph ?

Do you mean the HA bootup log?

2022-12-13 10:47:03.751 WARNING (MainThread) [homeassistant.bootstrap] Waiting on integrations to complete setup: sensor.filter
2022-12-13 10:48:03.899 WARNING (MainThread) [homeassistant.bootstrap] Waiting on integrations to complete setup: sensor.filter
2022-12-13 10:49:03.992 WARNING (MainThread) [homeassistant.bootstrap] Waiting on integrations to complete setup: sensor.filter
2022-12-13 10:50:04.096 WARNING (MainThread) [homeassistant.bootstrap] Waiting on integrations to complete setup: sensor.filter
2022-12-13 10:51:04.186 WARNING (MainThread) [homeassistant.bootstrap] Waiting on integrations to complete setup: sensor.filter
2022-12-13 10:52:04.275 WARNING (MainThread) [homeassistant.bootstrap] Waiting on integrations to complete setup: sensor.filter
2022-12-13 10:53:04.359 WARNING (MainThread) [homeassistant.bootstrap] Waiting on integrations to complete setup: sensor.filter
2022-12-13 10:54:04.440 WARNING (MainThread) [homeassistant.bootstrap] Waiting on integrations to complete setup: sensor.filter
2022-12-13 10:54:21.025 WARNING (MainThread) [homeassistant.components.sensor] Timed out adding entities for domain sensor with platform filter after 500s
dgomes commented 1 year ago

I would like to gather log messages all the way from working -> not working -> working again (thanks to version changes)

gigatexel commented 1 year ago

HA has logs everywhere :-) Which ones do you need?

dgomes commented 1 year ago

home-assistant.log

gigatexel commented 1 year ago

2022.10.5 image

2022.12.4 image homeassistant_20221204.log homeassistant_20221005.log

dgomes commented 1 year ago

please share the yaml configuration of all filters

mmiller7 commented 1 year ago

please share the yaml configuration of all filters

Is there some easy way to get all of them? I use packages and I probably have a couple dozen package-files with filters spread between them, it's going to take some time if I have to hunt them all down by hand and copy them into a single file and sanitize any sensitive personal information from each package file.

dgomes commented 1 year ago

that's actually an important debugging task:

gigatexel commented 1 year ago

I only have one filter sensor (never had more than one)

  - platform: filter
    name: "Regenwater ADC LPF"
    entity_id: sensor.shelly_uni_regenwater_adc
    filters:
      - filter: lowpass   
        precision: 1
        time_constant: 5
mmiller7 commented 1 year ago

So I think I've narrowed it down to the filters on all my (15-20) temperature+humidity sensors. If I cut all of them out startup is like 10-ish seconds.

It looks like for each temp+humidity sensor (which are all duplicates of this) the startup time goes up by about 30 seconds per. Just "outside_front" is like 33 seconds start, "outside_front" and "outside_rear" is like 65 seconds. I've got one of these pairs for each room.

The source sensors are MQTT sensors which update about every 15 seconds.

At the moment I don't have time to spend chasing this more tonight so I'll have to leave it at this for a bit, but maybe someone can figure out then how to reproduce or what to try next.


# Sensor Filter Package File

sensor:

  - platform: filter
    name: "Outside Front Temperature Filtered"
    entity_id: sensor.outside_front_temperature
    filters:
      - filter: outlier
        window_size: 3
        radius: 10.0
      - filter: time_simple_moving_average
        window_size: 00:01
        precision: 1
  - platform: filter
    name: "Outside Front Humidity Filtered"
    entity_id: sensor.outside_front_humidity
    filters:
      - filter: outlier
        window_size: 3
        radius: 10.0
      - filter: time_simple_moving_average
        window_size: 00:01
        precision: 1

  - platform: filter
    name: "Outside Rear Temperature Filtered"
    entity_id: sensor.outside_rear_temperature
    filters:
      - filter: outlier
        window_size: 3
        radius: 10.0
      - filter: time_simple_moving_average
        window_size: 00:01
        precision: 1
  - platform: filter
    name: "Outside Rear Humidity Filtered"
    entity_id: sensor.outside_rear_humidity
    filters:
      - filter: outlier
        window_size: 3
        radius: 10.0
      - filter: time_simple_moving_average
        window_size: 00:01
        precision: 1
dgomes commented 1 year ago

Another question: do you use the default database (SQLite) or something different (MySQL ?)

gigatexel commented 1 year ago

I'm using MySQL using the official integration.

mmiller7 commented 1 year ago

MariaDB (official addon), 14 days history limit on recorder

mmiller7 commented 1 year ago

I would like to gather log messages all the way from working -> not working -> working again (thanks to version changes)

Please see attached.

All good and working: working pre-upgrade 2022-10-5 log.txt

Upgraded, filters take forever to start and don't properly start: broken post-upgrade 2022-12-7 log.txt

Rolled back to old version, all working again: rollback working 2022-10-5 log.txt

gigatexel commented 1 year ago

Some more input. After a few days, the filter sensor kicks in.

image

dgomes commented 1 year ago

Thanks, this last comment gives me a clue that the issue might be related to the history functionality (filter is initialised with historic information)

I will try to address this during the holiday season...

mmiller7 commented 1 year ago

Thanks, this last comment gives me a clue that the issue might be related to the history functionality (filter is initialised with historic information)

I will try to address this during the holiday season...

Happy to run more tests or capture more logs - I've just exhausted my ideas of what else to try.

home-assistant[bot] commented 1 year ago

filter documentation filter source

mmiller7 commented 1 year ago

Still an issue in 2023.1

After a bit more trial and error, it looks like maybe its related specifically to the "outlier" filter. I commented that out in all my filters and it seems now to be working better.

With "outlier" removed: image

I hope this can ultimately be fixed, the reason I was using the outlier and average was to avoid the intermittent bad-data I get from some of my thermostats and flatten out some of the jitter as they report in. I also tried just "outlier" with "average" removed and it still exhibited the problems so I think it's a bug in the "outlier" code.

gigatexel commented 1 year ago

This config caused 500s timeout for initialisation of the filter component, so "outliers" does not seem to be the issue.

  - platform: filter
    name: "Regenwater ADC LPF"
    entity_id: sensor.shelly_uni_regenwater_adc
    filters:
      - filter: lowpass   
        precision: 1
        time_constant: 5

However, since 2023.1 it works and no longer timeouts but still takes a suspiciously long time to initialize.

image

mmiller7 commented 1 year ago

Interesting. @gigatexel - do you have any outlier filters by chance? I wonder if it's a common code that the x-pass and outlier filters use which is the issue?

I have no high/low pass filters in my config, but I bet (guessing, as a software engineer who hasn't seen the code) outlier uses a combination of them "under the hood"

gigatexel commented 1 year ago

Updated to 2023.2.0 but still no fix. Annoying thing is that the long startup-time for filter also prevents other integrations to start in the meanwhile.

image

mmiller7 commented 1 year ago

Yeah for now I've "fixed" mine by changing to a crude template that I can manually throw away values too far from what I expect...but it's a poor substitute for the filters.

dgomes commented 1 year ago

still can't reproduce the issue :/

gigatexel commented 1 year ago

@dgomes I downloaded the 4 files for filter component, created a folder filter in custom_components. image

I then randomly filled the sensor.py file with log entries like this, to try to identify what call is causing the timeout. image

But unfortunately, my log messages don't appear in the core log. Any idea what I'm doing wrong?

dgomes commented 1 year ago

Have you enabled debug in the logger section in configuration.yaml ?

If you are OK diving into python code, please reach for me on HA discord (I use the same handle: dgomes) and I can help you

gigatexel commented 1 year ago

Have you enabled debug in the logger section in configuration.yaml ?

If you are OK diving into python code, please reach for me on HA discord (I use the same handle: dgomes) and I can help you

I think I have:


logger:
  default: warning
  logs:
    custom_components.filter: debug
dgomes commented 1 year ago

@bdraco: me and @gigatexel have traced the issue to the recorder component (history.get_last_state_changes and history.state_changes_during_period). The database of @gigatexel is quite large (5Gb), would you help us out here ?

gigatexel commented 1 year ago

I did a reboot of my MariaDB docker image (10.6.11 - running on NAS)

Significant drop in CPU and Disk I/O and History is somewhat reponsive. Just to make sure, I upgraded MariaDB from 10.6.11 to 10.10 and performand an upgrade of the DB scheme.

I still see quite high load (just below 100%!) on the disks and filter integration is still not loading. Shutting down MariaDB container reduces all load to normal values (MariaDB is serving HA exclusively)

gigatexel commented 1 year ago

CPU Utilisation

Start of the graph is 10.6.11 Then MariaDB was shut down, second plateau is upgrade of the DB to 10.10 sceme Last plateau is currently running 10.10 and HA

image

Disk utilisation (2disks, raid 1, Synology DS220+). Plateau's are very visible.

image
bdraco commented 1 year ago

Can you post a process list from MariaDB?

gigatexel commented 1 year ago
image
gigatexel commented 1 year ago
+----+---------------+------------------+---------------+---------+------+---------------------+------------------------------------------------------------------------------------------------------+----------+
| Id | User          | Host             | db            | Command | Time | State               | Info                                                                                                 | Progress |
+----+---------------+------------------+---------------+---------+------+---------------------+------------------------------------------------------------------------------------------------------+----------+
| 84 | homeassistant | 172.17.0.1:42582 | homeassistant | Query   | 167  | Creating sort index | SELECT states.entity_id, states.state, NULL AS last_changed_ts, states.last_updated_ts, states.attri | 0.000    |
| 85 | homeassistant | 172.17.0.1:42622 | homeassistant | Sleep   | 4    |                     |                                                                                                      | 0.000    |
| 89 | root          | localhost        |               | Query   | 0    | starting            | show processlist                                                                                     | 0.000    |
+----+---------------+------------------+---------------+---------+------+---------------------+------------------------------------------------------------------------------------------------------+----------+

full query:


FROM states LEFT OUTER JOIN state_attributes ON states.attributes_id = state_attributes.attributes_id 
WHERE (states.last_changed_ts = states.last_updated_ts OR states.last_changed_ts IS NULL) AND states.entity_id = 'sensor.shelly_uni_regenwater_adc' ORDER BY states.entity_id, states.last_updated_ts DESC 
 LIMIT 1 
bdraco commented 1 year ago

Something is cut off there can you post a

Show full ?

gigatexel commented 1 year ago

Something is cut off there can you post a

Show full ?

I edited the post to add the full query.

home-assistant[bot] commented 1 year ago

Hey there @home-assistant/core, mind taking a look at this issue as it has been labeled with an integration (recorder) you are listed as a code owner for? Thanks!

Code owner commands Code owners of `recorder` can trigger bot actions by commenting: - `@home-assistant close` Closes the issue. - `@home-assistant rename Awesome new title` Change the title of the issue. - `@home-assistant reopen` Reopen the issue. - `@home-assistant unassign recorder` Removes the current integration label and assignees on the issue, add the integration domain after the command.

(message by CodeOwnersMention)


recorder documentation recorder source (message by IssueLinks)

bdraco commented 1 year ago

Part of the select is cut off

SELECT states.entity_id, states.state, NULL AS last_changed_ts, states.last_updated_ts, states.attri

... missing...

FROM states LEFT OUTER JOIN state_attributes ON states.attributes_id = state_attributes.attributes_id 
WHERE (states.last_changed_ts = states.last_updated_ts OR states.last_changed_ts IS NULL) AND states.entity_id = 'sensor.shelly_uni_regenwater_adc' ORDER BY states.entity_id, states.last_updated_ts DESC 
 LIMIT 1 
gigatexel commented 1 year ago

MariaDB [(none)]> show full processlist;
+-----+---------------+------------------+---------------+---------+------+---------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| Id  | User          | Host             | db            | Command | Time | State               | Info                                                                                                                                                                                                                                                                                                                                                                                                                                                                  | Progress |
+-----+---------------+------------------+---------------+---------+------+---------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
|  84 | homeassistant | 172.17.0.1:42582 | homeassistant | Query   | 1293 | Creating sort index | SELECT states.entity_id, states.state, NULL AS last_changed_ts, states.last_updated_ts, states.attributes, state_attributes.shared_attrs 
FROM states LEFT OUTER JOIN state_attributes ON states.attributes_id = state_attributes.attributes_id 
WHERE (states.last_changed_ts = states.last_updated_ts OR states.last_changed_ts IS NULL) AND states.entity_id = 'sensor.shelly_uni_regenwater_adc' ORDER BY states.entity_id, states.last_updated_ts DESC 
 LIMIT 1 |    0.000 |
| 108 | root          | localhost        | NULL          | Sleep   |  941 |                     | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |    0.000 |
| 112 | root          | localhost        | NULL          | Query   |    0 | starting            | show full processlist                                                                                                                                                                                                                                                                                                                                                                                                                                                 |    0.000 |
| 113 | homeassistant | 172.17.0.1:44374 | homeassistant | Query   |   62 | Creating sort index | SELECT states.entity_id, states.state, NULL AS last_changed_ts, states.last_updated_ts, states.attributes, state_attributes.shared_attrs 
FROM states LEFT OUTER JOIN state_attributes ON states.attributes_id = state_attributes.attributes_id 
WHERE (states.last_changed_ts = states.last_updated_ts OR states.last_changed_ts IS NULL) AND states.entity_id = 'sensor.shelly_uni_regenwater_adc' ORDER BY states.entity_id, states.last_updated_ts DESC 
 LIMIT 1 |    0.000 |
| 114 | homeassistant | 172.17.0.1:44450 | homeassistant | Sleep   |   10 |                     | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |    0.000 |
+-----+---------------+------------------+---------------+---------+------+---------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
5 rows in set (0.000 sec)

MariaDB [(none)]> 
bdraco commented 1 year ago

Can you run an explain on that?

Explain SELECT states.entity_id, states.state, NULL AS last_changed_ts, states.last_updated_ts, states.attributes, state_attributes.shared_attrs FROM states LEFT OUTER JOIN state_attributes ON states.attributes_id = state_attributes.attributes_id WHERE (states.last_changed_ts = states.last_updated_ts OR states.last_changed_ts IS NULL) AND states.entity_id = 'sensor.shelly_uni_regenwater_adc' ORDER BY states.entity_id, states.last_updated_ts DESC LIMIT 1

gigatexel commented 1 year ago
Explain SELECT states.entity_id, states.state, NULL AS last_changed_ts, states.last_updated_ts, states.attributes, state_attributes.shared_attrs
FROM states LEFT OUTER JOIN state_attributes ON states.attributes_id = state_attributes.attributes_id
WHERE (states.last_changed_ts = states.last_updated_ts OR states.last_changed_ts IS NULL) AND states.entity_id = 'sensor.shelly_uni_regenwater_adc' ORDER BY states.entity_id, states.last_updated_ts DESC
LIMIT 1;

1   SIMPLE  states  range   Entity_ID,ix_states_entity_id_last_updated_ts   ix_states_entity_id_last_updated_ts 1023    NULL    3305024 Using where 
1   SIMPLE  state_attributes    eq_ref  PRIMARY PRIMARY 4   homeassistant.states.attributes_id  1   Using where