History stats - ratio - bug

bobofruit1970 commented 2 years ago

The problem

I have for over a year below entity in my config. It worked until update of CORE HA in 04/06/2022

platform: history_stats name: FAN łazienka ON ratio entity_id: switch.wall_switch_left_158d00022357a3 state: 'on' type: ratio end: '{{ now() }}' duration: hours: 24

I have more such such entities showing "RATIO" and all of them since upgrade work bad. Yesterday I upgraded HA to 2022.6.2 but it does not solved problem. I have not changed anything in definitions of Entities counting "ratio". Bat as You can see pictures are strange until upgrade to version 2022.6.1. In addition, note that such drastic jumps in the charts are impossible in such statistics calculated on a daily or weekly basis.

Below screens

last few months until upgrade - entity counting "ratio"
last few days for the same entity

3, last 24 hours of entity from which is counted "ratio" new ha 6

last few months until upgrade for other entitie counting "ratio"

What version of Home Assistant Core has the issue?

core-2022.6.2

What was the last working version of Home Assistant Core?

core-2022.5.XXXXX

What type of installation are you running?

Home Assistant Supervised

Integration causing the issue

history_stats

Link to integration documentation on our website

https://www.home-assistant.io/integrations/history_stats/

Diagnostics information

No response

Example YAML snippet

No response

Anything in the logs that might be useful for us?

No response

Additional information

No response

probot-home-assistant[bot] commented 2 years ago

history_stats documentation history_stats source _{^{(message by IssueLinks)}}

bdraco commented 2 years ago

Which version did the problem start on?

When you say 04/06/2022 do you mean June 4th 2022 or April 6th 2022?

Can you explain a bit more beyond work bad about what you are expecting to see? Do the drops in the graph correspond with restarts?

bobofruit1970 commented 2 years ago

I would like to see solid line like on pic 1 and 4 - from the beginning of the chart to the green arrow (June upgrade of HA) I mean June 4th 2022 when I did an upgrade.

bdraco commented 2 years ago

Where version were you on before 2022.6.x? (core-2022.5.?)

bobofruit1970 commented 2 years ago

Can not remember number. The last one may version.

bdraco commented 2 years ago

Do the drops in the graph correspond with restarts?

bobofruit1970 commented 2 years ago

Drops and jumps corresponds with restarts. But restart take 40 sec. Drops on my charts take hours.

bdraco commented 2 years ago

So the data is incorrect after the restart for a few minutes and then it fixes itself?

bdraco commented 2 years ago

I set up your example sensor to using my lights as a test. I get drop outs during restarts. That's expected though

bdraco commented 2 years ago

Which database are you using? Can you paste in your system health information for the recorder?

bdraco commented 2 years ago

It looks like this integration doesn't handle the recorder being far behind either.

What do you have your recorder commit interval set at?

bobofruit1970 commented 2 years ago

This is "Recorder" config from configuration.yaml file:

recorder: purge_keep_days: 365 auto_purge: false commit_interval: 1 exclude: domains:

weblink
updater entities:
sun.sun

bobofruit1970 commented 2 years ago

Everything worked until June upgrade.

system health information for the recorder

bobofruit1970 commented 2 years ago

My config for entity was like this and it worked well until June upgrade:

platform: history_stats name: FAN łazienka ON ratio entity_id: switch.wall_switch_left_158d00022357a3 state: 'on' type: ratio end: '{{ now() }}' duration: hours: 24

Today I saw on HA history_stats documentation (https://www.home-assistant.io/integrations/history_stats) that end should be in quotation marks. But it worked well befire june upgrade.

I have changed end to quotation mark for all of "ratio entities" and restarted HA. end: "{{ now() }}"

A few minutes after restart my stats show something weird. Maybe it needs time to get some new samples. I will be watching what it will look for few hours.

During last 24h my fan worked 50h. This is after changing config to: to end: "{{ now() }}"

Other entitiy show RATIO - 250% after change to" end: "{{ now() }}"

I will probably have to delete whole database and start from clean database.

bdraco commented 2 years ago

I'm starting to think this is a problem with the query cache as seen in #72991 and its not history_stats itself, it just happens to get bad data from the query

bdraco commented 2 years ago

Do you have any custom components installed that access the database?

bobofruit1970 commented 2 years ago

My custom elements

bobofruit1970 commented 2 years ago

Something strange is going on with my HA. It is not only history stats. Look at my processor temp When I change "Hours to show" from 3h to 4h and to 5h pictures changes like this.

probot-home-assistant[bot] commented 2 years ago

recorder documentation recorder source _{^{(message by IssueLinks)}}

probot-home-assistant[bot] commented 2 years ago

Hey there @home-assistant/core, mind taking a look at this issue as it has been labeled with an integration (recorder) you are listed as a code owner for? Thanks! _{^{(message by CodeOwnersMention)}}

bdraco commented 2 years ago

Can you tell me more about the hardware you are running Home Assistant on. Which architecture are you using, and how many bits?

bobofruit1970 commented 2 years ago

Raspberry PI4 8GB, Raspbian 32bit in DeskPI PRO V2 case

bdraco commented 2 years ago

Thanks. 32bit RPI. Same hardware as #72991

I think we are getting closer

lmagyar commented 2 years ago

I experienced similar "spikes", but already in May, yes, with core 2022.5.x. Though this bug wasn't present in April (in my case). So maybe the below is false positive:

I've checked the database, and there is also an "unavailable" value for the sensor, so in my case, in May and June, spike is caused directly by an unavailable sensor data.
But there were no other sign, that the real sensor got unavailable, all other technical values these Sonoff/Tasmota-s reporting through MQTT has no signs of any outage. And in my experience in the past 2 years, if I have any WiFi outage, that ALWAYS last for minutes, NEVER only a single value.
And it occurred only a few times a day. Maybe only those values experienced the "outage" that were inserted into DB for plain sensory data recording AND were read for calculating statistical values at the same time.

TLDR: maybe the sqlalchemy thread safety bug can cause bad data ("unavailable") to be stored in DB for the real sensor, that can cause these spikes in calculated values, and this bug was present before core 2022.6.x. I repeat, maybe!

(I've created this screenshot yesterday, but I wasn't sure to post it.)

calisro commented 2 years ago

Maybe its a different issue but it looks similar. I have a few history_stats that look like this:

  - platform: history_stats
    name: Pool Alarm Time
    entity_id: binary_sensor.pool_swg_alarm_on
    state: 'on'
    type: time
    start: "{{ as_timestamp( now() ) - 7200 }}"
    end: "{{ now() }}"

After a restart, they spiked:

If I look at the underlying binary_sensor:

Teh data is clear except for the restart unknown state.

I can open a different issue but figured i'd bring it here because it seems very related. It looks to be calculating from the unknown and then not from the clear again. If I turn the pump on and it changes to alarm then back to clear it will begin to drop back down to its correct value.

It is definitely the restart that triggered the issue.

Same thing with my run time stats:

  - platform: history_stats
    name: Pool Pump Runtime Daily
    entity_id: switch.pentair_xxxx_pool
    state: 'on'
    type: time
    start: '{{ now().replace(hour=0).replace(minute=0).replace(second=0) }}'
    end: '{{ now() }}'

That is now reporting: 143.62 hours of rune time in one day today:

The underlying switch:

After ANOTHER restart, it came up and the sensors are back down. Now the daily sensor is reporting 0.33 hours of run time (yet its run for 5 hours plus) and the alarm sensor is clear.

The daily run time sensor is back down to

bdraco commented 2 years ago

@calisro

Once we sort out #72991 (the sqlalchemy team has been helpful and is looking at the bug now) we should have a better baseline to compare against.

I suspect https://github.com/home-assistant/core/pull/73110 will improve the situation as well, but thats been a long standing issue with history_stats so I'm not sure its 100% related here.

Its hard to tell the root cause until #72991 is solved

calisro commented 2 years ago

Understand. I'll watch those issues and report back or open a new issue when they clear.

EDIT; After more restarts, all the sensors are right. It sure does look like a caching/concurrency issue like reported in 72991 to me at least on the surface. There's no other reason I can see where the restarts would have actually brought the sensors back to the correct history.

I also find it interesting that the daily sensor here has the exact same value as the monthly sensor when the issue occured on different stats.

bdraco commented 2 years ago

I also find it interesting that the daily sensor here has the exact same value as the monthly sensor when the issue occured on different stats.

That makes sense since the bindparms are getting swapped based on the behavior discussed here https://github.com/sqlalchemy/sqlalchemy/issues/8098

enjoysimpson commented 2 years ago

Error seems to be solved for me. Great work, thanks for fixing it quickly!

calisro commented 2 years ago

Me too hopefully. History stats will still show wrong data for the past issues till it ages out.

bobofruit1970 commented 2 years ago

Yes, it looks like it's working again. Thank you. It is a pity that there is no mechanism for deleting or setting the values of selected entities between the selected dates in HA. Then the invalid data could easily be removed from the database.

home-assistant / core