home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0

Statistics sensor logs bad data right after HA restart #98262

Open gregtakacs opened 11 months ago

gregtakacs commented 11 months ago

The problem

I have two template sensors that log pH and ORP data from a REST call. I then added four statistics sensors that track the 24-hour minimum and maximum values of these two template sensors.

The issue I'm seeing is that every time I restart Home Assistant, the min and max values stored in the recorder are wrong: they hold the current value rather than the actual min/max value as they should. The sensors recover quickly afterward, but the spikes concern me because I make automated decisions based on these values, and if a decision happens to be made right after a restart the results can be bad (adding too much bleach or acid to my pool).

ph_orp_statistics_problem

states-export.csv

What version of Home Assistant Core has the issue?

core-2023.8.1

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant OS

Integration causing the issue

Statistics

Link to integration documentation on our website

https://www.home-assistant.io/integrations/statistics/

Diagnostics information

No response

Example YAML snippet

pool:
  sensor:
    - platform: rest
      unique_id: fabc1ee2-0bbe-416e-b23d-2474ac25fe4e
      name: iopool
      resource: https://api.iopool.com/v1/pool/48c98574-188c-42d0-9cfc-7c9e029318aa
      value_template: "{{ value_json.title }}"
      json_attributes:
        - id
        - latestMeasure
        - hasAnActionRequired
        - advice
        - mode
      headers:
        x-api-key: !secret iopool_api_key
      scan_interval: 300
      icon: mdi:pool
    - platform: statistics
      name: "Daily ORP Min"
      unique_id: d55f6242-379f-11ee-be56-0242ac120002
      entity_id: sensor.iopool_orp
      state_characteristic: value_min
      max_age:
        hours: 24
    - platform: statistics
      name: "Daily ORP Max"
      unique_id: f7b134f8-1fff-4c29-8910-b88f5e62ce3b
      entity_id: sensor.iopool_orp
      state_characteristic: value_max
      max_age:
        hours: 24
    - platform: statistics
      name: "Daily pH Min"
      unique_id: 84a50585-bbda-48f8-b5a3-06393f384340
      entity_id: sensor.iopool_ph
      state_characteristic: value_min
      max_age:
        hours: 24
    - platform: statistics
      name: "Daily pH Max"
      unique_id: b132e3f6-81ab-4242-98a0-93efedb37fac
      entity_id: sensor.iopool_ph
      state_characteristic: value_max
      max_age:
        hours: 24

  template:
    - sensor:
        - name: "ioPool pH"
          unique_id: f4804a67-1224-4507-a4fb-21d983958b7c
          state: "{{ state_attr('sensor.iopool', 'latestMeasure')['ph'] | round(3) }}"
          device_class: ph
          unit_of_measurement: "pH"
          attributes:
            source: "{{ state_attr('sensor.iopool', 'latestMeasure')['mode'] }}"
            isValid: "{{ state_attr('sensor.iopool', 'latestMeasure')['isValid'] }}"
            measuredAt: "{{ state_attr('sensor.iopool', 'latestMeasure')['measuredAt'] }}"
        - name: "ioPool ORP"
          unique_id: e0ef9122-c53a-41ae-be72-517f3fcbb443
          state: "{{ state_attr('sensor.iopool', 'latestMeasure')['orp'] | round(0) }}"
          unit_of_measurement: "mV"
          device_class: voltage
          attributes:
            source: "{{ state_attr('sensor.iopool', 'latestMeasure')['mode'] }}"
            isValid: "{{ state_attr('sensor.iopool', 'latestMeasure')['isValid'] }}"
            measuredAt: "{{ state_attr('sensor.iopool', 'latestMeasure')['measuredAt'] }}"

Anything in the logs that might be useful for us?

I ran this SQL to pull data:

        SELECT
          *
        FROM
          states
          INNER JOIN states_meta ON
            states.metadata_id = states_meta.metadata_id
        WHERE
          states_meta.entity_id IN ('sensor.iopool_ph', 'sensor.daily_ph_min', 'sensor.daily_ph_max')
          AND states.state_id > 27587000
          -- AND last_updated_ts <= strftime('%s', 'now', '-3 hour')
        ORDER BY
          last_updated_ts DESC
        LIMIT
          10000;


Additional information

No response

home-assistant[bot] commented 11 months ago

Hey there @thomdietrich, mind taking a look at this issue as it has been labeled with an integration (statistics) you are listed as a code owner for? Thanks!

ThomDietrich commented 11 months ago

Hey @gregtakacs, I understand your issue and this is indeed concerning. I was sadly not quite able to understand where it comes from though. What does "quickly recovers" really mean?

Could you please check the attributes of your sensors? Judging by buffer_usage_ratio and/or age_coverage_ratio, is your statistics sensor cache reset upon Home Assistant restart (i.e. are both ratios very close to 0)?

gregtakacs commented 11 months ago

Hey @ThomDietrich

Quickly recovers means within a fraction of a second.

Here is the detail surrounding one of these reset events:

ph_reset_event ph_reset_event_chart

You can see that the statistics sensors came up before the REST-sourced template sensor loaded: we get an unavailable for both statistics sensors at the 9:39:41.888/9 mark, but we don't get an unknown value on the template sensor until 9:39:42.327.

But then we get a good reading from the REST template sensor at 9:39:53.063 and the statistics sensors update right after at 9:39:53.102/3 BUT with a wrong value! And this is the spike in the data. Then almost instantly at 9:39:53.256/71 the statistics value gets corrected to the right value and things settle down after that.

Seeing how quickly I get a good value after the bad value, I don't think this is a concern from an automation standpoint. It's more of a nuisance to have those spikes in the data. They could be used as indicators of a reset event, so maybe it's a feature and not a bug, but I am pretty certain it's a bug ;-).

gregtakacs commented 11 months ago

Actually, I think these are two separate issues, likely caused by the same bug:

1) the reading should never go to unavailable as long as there are records that can be used to make a determination
2) the data should not get that incorrect value after the unavailable reading

gregtakacs commented 11 months ago

I also confirmed that after a reboot this is the data in the attributes field:

Age coverage ratio: -0.01
Source value valid: false
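For automations that depend on these values, one possible guard is to ignore the statistics sensors until their window is reasonably full again. The following is only a sketch: the binary sensor name and the 0.9 threshold are assumptions, and it relies on the age_coverage_ratio attribute mentioned above being present on sensor.daily_ph_min and sensor.daily_ph_max.

```yaml
# Sketch only: the sensor name and the 0.9 threshold are assumptions.
template:
  - binary_sensor:
      - name: "Daily pH stats trustworthy"
        # A missing attribute falls back to 0, so the guard stays "off"
        # while the statistics sensors are unavailable after a restart.
        state: >
          {{ (state_attr('sensor.daily_ph_min', 'age_coverage_ratio') | float(0)) > 0.9
             and (state_attr('sensor.daily_ph_max', 'age_coverage_ratio') | float(0)) > 0.9 }}
```

A dosing automation could then use this binary sensor as a condition, so a decision is skipped while the ratio is near 0 or negative.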

gregtakacs commented 11 months ago

@ThomDietrich anything else you need from my end to repro?

ErikKok commented 9 months ago

I have the same issue with an average_linear statistics sensor, and with a mean variant.

I only noticed this now that I am using this sensor for heating again, so somewhere between last winter and now this went buggy. I am using MariaDB, btw.

Below you can see that the value right after the restart is 19,9; the room was never this warm. The peak at 18:00 yesterday is another HA restart with an even higher peak. It took about 30-40 minutes to return to normal.

image

Flox-Muc commented 6 months ago

This bug still exists in 2024.1.2. After a restart I often get crazy values with my statistics sensors that look totally random.

marcelhoogantink commented 4 months ago

I can confirm that this issue is still a 'live' one in HA 2024.3.1. I also have this issue. I partly 'solved' it by calling the statistics reload service after restarting HA (see https://github.com/home-assistant/core/issues/70011#issuecomment-1139936805).
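A minimal sketch of that workaround as an automation, assuming the reload service is exposed as statistics.reload. The alias and the one-minute delay (to give the source sensors time to report a first value) are assumptions, not taken from the linked comment.

```yaml
# Sketch only: the alias and the delay are assumptions.
automation:
  - alias: "Reload statistics sensors after startup"
    trigger:
      - platform: homeassistant
        event: start
    action:
      # Wait until the source sensors have had a chance to update.
      - delay: "00:01:00"
      - service: statistics.reload
```

Going by the later reports in this thread, this does not prevent the initial bad value; it only shortens the time until the sensors recover.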

AnonymousRetard commented 2 months ago

I also have this issue in 2024.5.4, with an average_linear sensor with a max_age of 48 hours. Every restart so far has caused it to report crazy values several orders of magnitude off, and the "Age coverage ratio" restarts from 0. The value slowly returns to the correct one, but that takes a full 48 hours because of the totally crazy starting values (these can be either positive or negative and in the thousands, even though the values it's supposed to average right now are temperatures in the 10-25C range).

I just tried adding the solution from above (restarting statistics upon restart). I still see momentarily crazy high values upon restart of Home Assistant, but it seems that once statistics has been restarted it immediately returns a good value again, with "Age coverage ratio" at 1.

devjklein commented 1 month ago

Also dealing with this issue in 2024.5.5. Glad to see there is an active issue. I am using a couple average_linear sensors with sampling_size: 9999 and max_age set to 10 minutes and 24 hours. Behavior occurs on restart, reflashing the source esp device wirelessly, and when I reload the statistics entities config in developer tools.

jm-cook commented 1 month ago

I also have this issue. I am now on 2024.6.1 but have had it for some time. The following gives changing values through a reboot when it absolutely should not (it's measuring the max value of another sensor over 10 days):

  - platform: statistics
    name: "Well level max"
    unique_id: 281db341-c27f-47a9-a9bf-3606b5d1f93d
    entity_id: sensor.water_depth
    state_characteristic: value_max
    max_age:
      days: 10

image

I also had a sensor that should show the change over the last 24 hours, but it hardly survived a reboot at all and just reset most of the time when rebooted.

Like this:

  - platform: statistics
    name: "water 24 test"
    entity_id: sensor.water_meter
    state_characteristic: change
    max_age:
      hours: 24

image

I gave up on the last one and used an SQL sensor instead.
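For anyone wanting to try the same route, here is a rough sketch of what such an SQL sensor could look like. It is not the exact sensor I used. It assumes the default SQLite recorder database and the states / states_meta tables queried earlier in this thread; the sensor name, the change_24h column alias, and treating "change" as MAX - MIN (only valid for a meter that never decreases) are all assumptions.

```yaml
# Sketch only: name, alias and the MAX - MIN definition of "change" are
# assumptions; the query uses SQLite date functions.
sql:
  - name: "Water 24h change (SQL)"
    query: >
      SELECT
        MAX(CAST(states.state AS REAL)) - MIN(CAST(states.state AS REAL)) AS change_24h
      FROM states
        INNER JOIN states_meta ON states.metadata_id = states_meta.metadata_id
      WHERE states_meta.entity_id = 'sensor.water_meter'
        AND states.state NOT IN ('unknown', 'unavailable')
        AND states.last_updated_ts >= strftime('%s', 'now', '-24 hours');
    column: "change_24h"
```

Because the value is recomputed from the recorder on every update, it does not depend on an in-memory buffer that has to be rebuilt after a restart.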