home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io

History stats sensor (type "time") shows incorrect values #114762

Open Al-Dek opened 5 months ago

Al-Dek commented 5 months ago

The problem

History stats sensor ( "time" type) shows incorrect values and has strange behavior.

What version of Home Assistant Core has the issue?

Core 2024.2.5

What was the last working version of Home Assistant Core?

core-2024.2.5

What type of installation are you running?

Home Assistant OS

Integration causing the issue

History

Link to integration documentation on our website

https://www.home-assistant.io/integrations/history_stats/

Diagnostics information

I am checking for a VPN connection via binary_sensor.192_168_1_1 (friendly name "Home (XXXX 3/9)"). I get statistics on connection quality for the current day and for the last 10 days of history using these sensors:

- VPN Fault Duration Today
- VPN Fault Duration Last 10 days
- VPN Fault Count Last 10 days
- VPN Service Availability Last 10 days

[Screenshots of the sensor configurations and history graphs]
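For reference, a minimal sketch of what such history_stats definitions might look like; no YAML was attached to the report, so the windows, state values, and templates below are assumptions:

  # Under the top-level sensor: key. "off" is treated as the fault state.
  - platform: history_stats
    name: VPN Fault Duration Today
    entity_id: binary_sensor.192_168_1_1
    state: "off"
    type: time
    start: "{{ today_at() }}"
    end: "{{ now() }}"

  - platform: history_stats
    name: VPN Fault Duration Last 10 days
    entity_id: binary_sensor.192_168_1_1
    state: "off"
    type: time
    end: "{{ now() }}"
    duration:
      days: 10

  - platform: history_stats
    name: VPN Fault Count Last 10 days
    entity_id: binary_sensor.192_168_1_1
    state: "off"
    type: count
    end: "{{ now() }}"
    duration:
      days: 10

  # The availability sensor would presumably use type: ratio with state: "on" over the same window.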

Real VPN fault data exported from History:


|    | entity_id | state | last_changed |
|----|-----------|-------|--------------|
| 1  | binary_sensor.192_168_1_1 | off | 2024-03-24T08:55:30.864Z |
| 2  | binary_sensor.192_168_1_1 | on  | 2024-03-24T08:56:03.925Z |
| 3  | binary_sensor.192_168_1_1 | off | 2024-03-24T10:34:16.769Z |
| 4  | binary_sensor.192_168_1_1 | on  | 2024-03-24T10:34:30.812Z |
| 5  | binary_sensor.192_168_1_1 | off | 2024-03-24T18:46:39.316Z |
| 6  | binary_sensor.192_168_1_1 | on  | 2024-03-24T18:46:55.366Z |
| 7  | binary_sensor.192_168_1_1 | off | 2024-03-24T23:30:25.525Z |
| 8  | binary_sensor.192_168_1_1 | on  | 2024-03-24T23:31:40.584Z |
| 9  | binary_sensor.192_168_1_1 | off | 2024-03-24T23:32:16.620Z |
| 10 | binary_sensor.192_168_1_1 | on  | 2024-03-24T23:35:21.763Z |
| 11 | binary_sensor.192_168_1_1 | off | 2024-03-25T08:56:02.819Z |
| 12 | binary_sensor.192_168_1_1 | on  | 2024-03-25T08:56:35.875Z |
| 13 | binary_sensor.192_168_1_1 | off | 2024-03-25T13:13:49.719Z |
| 14 | binary_sensor.192_168_1_1 | on  | 2024-03-25T13:15:19.810Z |
| 15 | binary_sensor.192_168_1_1 | off | 2024-03-25T17:46:46.441Z |
| 16 | binary_sensor.192_168_1_1 | on  | 2024-03-25T17:47:00.487Z |
| 17 | binary_sensor.192_168_1_1 | off | 2024-03-26T13:15:20.371Z |
| 18 | binary_sensor.192_168_1_1 | on  | 2024-03-26T13:15:53.424Z |
| 19 | binary_sensor.192_168_1_1 | off | 2024-03-27T13:15:53.023Z |
| 20 | binary_sensor.192_168_1_1 | on  | 2024-03-27T13:16:26.084Z |
| 21 | binary_sensor.192_168_1_1 | off | 2024-03-27T18:13:27.914Z |
| 22 | binary_sensor.192_168_1_1 | on  | 2024-03-27T18:13:41.961Z |
| 23 | binary_sensor.192_168_1_1 | off | 2024-03-28T13:16:30.260Z |
| 24 | binary_sensor.192_168_1_1 | on  | 2024-03-28T13:17:03.318Z |
| 25 | binary_sensor.192_168_1_1 | off | 2024-03-28T18:00:51.374Z |
| 26 | binary_sensor.192_168_1_1 | on  | 2024-03-28T18:01:05.421Z |
| 27 | binary_sensor.192_168_1_1 | off | 2024-03-29T13:17:14.193Z |
| 28 | binary_sensor.192_168_1_1 | on  | 2024-03-29T13:17:30.234Z |
| 29 | binary_sensor.192_168_1_1 | off | 2024-03-30T13:17:39.384Z |
| 30 | binary_sensor.192_168_1_1 | on  | 2024-03-30T13:18:12.445Z |
| 31 | binary_sensor.192_168_1_1 | off | 2024-03-30T19:16:29.128Z |
| 32 | binary_sensor.192_168_1_1 | on  | 2024-03-30T19:16:45.175Z |
| 33 | binary_sensor.192_168_1_1 | off | 2024-03-31T13:18:10.690Z |
| 34 | binary_sensor.192_168_1_1 | on  | 2024-03-31T13:18:43.748Z |
| 35 | binary_sensor.192_168_1_1 | off | 2024-03-31T17:40:15.846Z |
| 36 | binary_sensor.192_168_1_1 | on  | 2024-03-31T17:40:29.893Z |
| 37 | binary_sensor.192_168_1_1 | off | 2024-04-01T09:49:15.225Z |
| 38 | binary_sensor.192_168_1_1 | on  | 2024-04-01T09:58:40.630Z |
| 39 | binary_sensor.192_168_1_1 | off | 2024-04-01T13:18:43.214Z |
| 40 | binary_sensor.192_168_1_1 | on  | 2024-04-01T13:19:16.297Z |
| 41 | binary_sensor.192_168_1_1 | off | 2024-04-01T17:53:11.059Z |
| 42 | binary_sensor.192_168_1_1 | on  | 2024-04-01T17:53:25.105Z |
| 43 | binary_sensor.192_168_1_1 | off | 2024-04-02T13:19:16.966Z |
| 44 | binary_sensor.192_168_1_1 | on  | 2024-04-02T13:19:50.020Z |
| 45 | binary_sensor.192_168_1_1 | off | 2024-04-02T16:05:12.135Z |
| 46 | binary_sensor.192_168_1_1 | on  | 2024-04-02T16:05:26.179Z |
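As a rough worked check of these numbers: summing the off → on intervals in this export gives on the order of 1,360 seconds, i.e. roughly 23 minutes of total fault time across the ~10 days covered, with the single longest outage being about 9.5 minutes on 2024-04-01. That is the scale the fault-duration sensors would be expected to report.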

Example YAML snippet

No response

Anything in the logs that might be useful for us?

There are no related errors or warnings in the logs.
An HA restart or a host reboot does not solve the problem.

Additional information

No response

home-assistant[bot] commented 5 months ago

history_stats documentation history_stats source

JonLaliberte commented 4 months ago

I'm seeing the same thing happening - or at least, some version of bad data coming out of history stats sensors.

I have a simple template binary sensor that switches off/on when the kids watch Plex, and a history_stats sensor to calculate the time it has been on:

  - platform: history_stats
    name: Kids Watching Plex Duration 7 Days
    entity_id: binary_sensor.kids_watching_plex
    state: 'on'
    type: time
    end: "{{ now() }}"
    duration:
      days: 7
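For reference, a hedged sketch of what such a template binary sensor might look like; the media_player entity name and the 'playing' condition are assumptions, not taken from this comment:

template:
  - binary_sensor:
      - name: "Kids Watching Plex"
        # on while the (assumed) Plex media player is actively playing
        state: "{{ is_state('media_player.kids_room_plex', 'playing') }}"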

I created the binary sensor a day or two ago, and the 7 day history stats I created a few minutes ago shows:

[Screenshot 2024-05-06 at 3:33:00 PM]

My understanding is that it should be showing hours. So, in the last 2 days the kids watched nearly 122 hours of TV. Must have been binging Bluey again! 😉

I noticed the same issue when I made one for today only... When first created, it had junk data: it was showing something like 16 hours when it had only been about 20 minutes.

Al-Dek commented 3 months ago

The error keeps appearing. Now the ON-state duration sensor (vpn_online_duration_7d) shows a sharp decrease in value, and the OFF-state sensor (vpn_fault_duration_7d) shows an increase by the same amount. After that, with each sensor update period (~30 seconds), the sensor values shift by 34 seconds back toward the "right" values. There are no errors in the logs.

[Graph: vpn_online_duration_7d]

[Graph: vpn_fault_duration_7d]

karwosts commented 2 months ago

I think I can probably explain why you're seeing the spikes, and it's because history_stats, when it finds no recorder data in the time range it is searching, extrapolates the oldest datapoint in the recorder backwards to the start time of the window.

So if you imagine that every day you have an outage for one second at 10:00 AM (the sensor turns to off) and then it turns back on at 10:00:01 AM, eventually, after purging, your records will start with an oldest datapoint of off at 10:00 AM.

If your start window extends back earlier than that, it will assume that the sensor has been off for all the hours leading up to that point, and you get a big jump in fault duration.

The way you could work around this is to not make your history_stats window so close to the recorder duration: either make your recorder duration 20 days and keep the history_stats at 10 days, or keep your recorder duration at 10 days and make your history_stats window 5 days, so that you're less likely to be scanning the interval before recorded history.

I don't exactly love that behavior, but I believe that's what it is at the moment; it bites me quite often too.
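For reference, a minimal sketch of the workaround described above, using the recorder integration's purge_keep_days option (the recorder keeps 10 days by default; the value below is illustrative):

recorder:
  purge_keep_days: 20   # keep roughly twice the longest history_stats window (10 days here),
                        # so the window start never falls before the oldest recorded datapoint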

PilgrimToHyperion commented 2 months ago

I'm not too sure whether this is relevant here. I created sensors using history stats to show how often my fridge is opened/closed and for how long. The "count" sensors work perfectly, but the "time" sensors just give the last time the fridge was opened.


Al-Dek commented 2 months ago

> @karwosts: I think I can probably explain why you're seeing the spikes, and it's because history_stats, when it finds no recorder data in the time range it is searching, extrapolates the oldest datapoint in the recorder backwards to the start time of the window. [...] The way you could work around this is to not make your history_stats window so close to the recorder duration [...]

I disagree. Working around the history_stats window or recorder duration is the wrong way. Look at the graphs: the value changes instantly, it happens at any given time, and it has no connection with the purging of old data. And most importantly, after the peak the value decreases with each sensor update. How can a fault of, say, 1000 seconds duration decrease in the past? This is clearly a sensor error.

Jobsa commented 1 month ago

> @karwosts: I think I can probably explain why you're seeing the spikes, and it's because history_stats, when it finds no recorder data in the time range it is searching, extrapolates the oldest datapoint in the recorder backwards to the start time of the window. [...]

> @Al-Dek: I disagree. Working around the history_stats window or recorder duration is the wrong way. [...] How can a fault of, say, 1000 seconds duration decrease in the past? This is clearly a sensor error.

I agree with @karwosts. You have to think about the moving window of data to understand this. The value "changes instantly" because that is the moment history_stats suddenly backward-extrapolates a new value; it then decreases because, as time goes on, less of that bogus extrapolation remains inside the window.

Tschuuuls commented 2 days ago

I'm trying to count the number of hours a relay has been on for the current day with history stats. I basically used the example from the docs. I'm running Core 2024.9.0 and Frontend 20240904.0. I set start to {{today_at('00:00')}} and end to {{now()}}.
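A hedged sketch of the configuration described above; the switch entity name is an assumption:

  - platform: history_stats
    name: Relay on today
    entity_id: switch.relay      # assumed entity name
    state: "on"
    type: time
    start: "{{ today_at('00:00') }}"
    end: "{{ now() }}"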

The history stats sensor concluded that the relay has been on for 47:17 h today 🤣 (and it keeps rising while the relay is off?). This previously worked just fine.

The only weirdness I can see in the sensor history data is that the relay has been turned off by an automation even though it had been off already. So the state changed from off to off.

I'm just a bit confused that a core Home Assistant integration doesn't do a basic check before returning data. If the resulting time is greater than the selected time period, there has to be something wrong, and returning this false data is only going to cause issues.

JonLaliberte commented 2 days ago

@Tschuuuls

> {{today_at('00:00')}} and end to {{today_at('00:00')}}.

Was this a typo?

> I'm just a bit confused that a core Home Assistant integration doesn't do a basic check before returning data. [...]

Yeah, there seem to be a few sanity checks not happening here.

Tschuuuls commented 2 days ago

> {{today_at('00:00')}} and end to {{today_at('00:00')}}.
>
> Was this a typo?

Oh sorry, end should be {{now()}}. Makes more sense that way ;)

JonLaliberte commented 2 days ago

I've been having trouble with this one recently...

  - platform: history_stats
    name: Pool SWG on today
    entity_id: switch.pool_swg_switch
    unique_id: 'pool_swg_on_today_history_stats'
    state: 'on'
    type: time
    start: "{{ now().replace(hour=0, minute=0, second=0) }}"
    end: "{{ now() }}"

[Graph] Green lines are when the sensor resets properly; red are when it doesn't reset at the start of the day.

Tschuuuls commented 2 days ago

Creating a new Helper on the same Entity will give me the same bogus value for the current day. Let's see if it magically starts working again tomorrow.