home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0
73.43k stars 30.67k forks source link

Integral integration: big jump in calculated data when starting HomeAssistant after a long (hours) delay #126006

Open pohly opened 1 month ago

pohly commented 1 month ago

The problem

I am collecting the current solar production as reported by our installation through a RESTful integration. That gives me power in Watt. I defined a helper which uses the integral integration with Rieman left sum as method to calculate energy in Wh.

Today, I had to turn off the computer on which HomeAssistant runs for three hours. Once I turned it on again, the energy value jumped by more than 17 MWh instead of the expected 3h*2195W (= 6585Wh). Here’s a screenshot of the two values:

solar-energy

The input value (power) has no values while offline. The graph shows that as a horizontal line. There’s a small jump at the right edge from the old 2195 W from around 11:10 to the new new 2450 W at 12:55.

My HomeAssistant was old when that happened. I updated to the latest HomeAssistant and was able to reproduce the problem by copying my installation's data and starting elsewhere with it after 10 hours.

What version of Home Assistant Core has the issue?

2024.9.1

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant Container

Integration causing the issue

Integral

Link to integration documentation on our website

https://www.home-assistant.io/integrations/integration

Diagnostics information

No response

Example YAML snippet

No response

Anything in the logs that might be useful for us?

No response

Additional information

No response

home-assistant[bot] commented 1 month ago

Hey there @dgomes, mind taking a look at this issue as it has been labeled with an integration (integration) you are listed as a code owner for? Thanks!

Code owner commands Code owners of `integration` can trigger bot actions by commenting: - `@home-assistant close` Closes the issue. - `@home-assistant rename Awesome new title` Renames the issue. - `@home-assistant reopen` Reopen the issue. - `@home-assistant unassign integration` Removes the current integration label and assignees on the issue, add the integration domain after the command. - `@home-assistant add-label needs-more-information` Add a label (needs-more-information, problem in dependency, problem in custom component) to the issue. - `@home-assistant remove-label needs-more-information` Remove a label (needs-more-information, problem in dependency, problem in custom component) on the issue.

(message by CodeOwnersMention)


integration documentation integration source (message by IssueLinks)

pohly commented 1 month ago

This is not specific to this particular sensor. All of my Integral sensors seem to have this problem.

dgomes commented 1 month ago

By the default the Integration sensor updates upon source sensor state change. Since your system was off the integration sensor will multiply the last state value (because you use left method) by the time the system was off.

So no bug... it's working as it is supposed.

You can update the sensor more ofter using https://www.home-assistant.io/integrations/integration#max-sub-interval but since your system is down for soo long this will not help. The source should be made unavailable before shutdown to avoid this issue, but that is out os scope of this issue.

pohly commented 1 month ago

will multiply the last state value (because you use left method) by the time the system was off

That should have added 3h* 2195W = 6585Wh, not 17 MWh.

Can you please reopen? Something was not working as intended.

dgomes commented 1 month ago

The integration sensor is very limited in the sense it only multiplies the values it gets from the source and takes into consideration the time each state occurs.

Did you by any chance change the clock of your HA host ?

pohly commented 1 month ago

No, time should have remained sane.

I'm trying to reproduce with more debug output added. But of course, now that I am looking it didn't happen :cry:

I'm going to try again with similar starting conditions (power value, duration of the pause).

dgomes commented 1 month ago

Please have debug of integration enabled so we get the extra logging information.

In my experience 90% of the issues are about the source (published a wrong value)

pohly commented 1 month ago

From the forum:

It just occurred to me… during the downtime, did you restore a previous backup ? The last state change is not kept in the recorder, it’s kept in a special file in .storage

No, I really only shut down the computer, then powered it on again 3 hours later. My current experiment is with starting and stopping the container, which should be similar.

pohly commented 1 month ago

Darn these Heisenbugs. I tried restarting twice today with more log output, but it worked okay in both cases.

While I keep trying, a question: should the integrate_on_max_sub_interval_exceeded_callback code get triggered during a restart?

https://github.com/home-assistant/core/blob/bcbf810cbe4db997a9bde5ebee8c99fcf7aed66d/homeassistant/components/integration/sensor.py#L588-L591

pohly commented 1 month ago

Or phrased differently: directly after a restart, should it update the area once with the time since the last update (3 hours when I first encountered this), or multiple times with small updates?

Or not at all, i.e. only update with the usual time delta when it has two new source values?

dgomes commented 1 month ago

it should update with the current state and last state previous to the restart (therefore my previous answer in this thread)

pohly commented 1 month ago

it should update with the current state and last state previous to the restart

What I see is that it updates with a very small time delta, not the time delta since the last update before the restart.

Here's the code that I added to _integrate_on_state_change to figure how it computes the area:

        elapsed_seconds = Decimal(
            (new_state.last_reported - old_last_reported).total_seconds()
            if self._last_integration_trigger == _IntegrationTrigger.StateEvent
            else (new_state.last_reported - self._last_integration_time).total_seconds()
        )

        area = self._method.calculate_area_with_two_states(elapsed_seconds, *states)
        if self._source_entity == "sensor.solar_power":
            _LOGGER.warning(f"*** integral on_state_change: area={area}, elapsed={elapsed_seconds}, states={states}, trigger={self._last_integration_trigger}, new_state.last_reported={new_state.last_reported}, old_last_reported={old_last_reported}, last_integration_time={self._last_integration_time}, sensor={self}")

I had my container stopped since yesterday. The last update yesterday was:

2024-09-16 17:45:11.963 WARNING (MainThread) [homeassistant.components.integration.sensor] *** integral on_state_change: area=6949.184069999999531219714299, elapsed=9.9988259999999993254959917976520955562591552734375, states=(Decimal('695'), Decimal('695')), trigger=_IntegrationTrigger.StateEvent, new_state.last_reported=2024-09-16 15:45:11.961168+00:00, old_last_reported=2024-09-16 15:45:01.962342+00:00, last_integration_time=2024-09-16 15:44:02.395694+00:00, sensor=<entity sensor.solar_energy=27698402.76>

The elapsed time is ~10, my sampling interval for the solar power.

Now after restarting:

2024-09-17 13:44:01.302 WARNING (MainThread) [homeassistant.components.integration.sensor] *** integral on_state_change: area=0.7325880000000000084696676755, elapsed=0.00016400000000000000189605275924265015419223345816135406494140625, states=(Decimal('4467'), Decimal('4467')), trigger=_IntegrationTrigger.StateEvent, new_state.last_reported=2024-09-17 11:44:01.301827+00:00, old_last_reported=2024-09-17 11:44:01.301663+00:00, last_integration_time=2024-09-17 11:43:51.792100+00:00, sensor=<entity sensor.solar_energy=27698404.69>
2024-09-17 13:44:01.302 WARNING (MainThread) [homeassistant.components.integration.sensor] *** integral on_state_change: area=2.023551000000000028562850002, elapsed=0.0004530000000000000063941907324505109500023536384105682373046875, states=(Decimal('4467'), Decimal('4467')), trigger=_IntegrationTrigger.StateEvent, new_state.last_reported=2024-09-17 11:44:01.302280+00:00, old_last_reported=2024-09-17 11:44:01.301827+00:00, last_integration_time=2024-09-17 11:43:51.792100+00:00, sensor=<entity sensor.solar_energy=27698404.69>
2024-09-17 13:44:01.304 WARNING (MainThread) [homeassistant.components.integration.sensor] *** integral on_state_change: area=43298.08541100000187284990716, elapsed=9.666909000000000418140189140103757381439208984375, states=(Decimal('4479'), Decimal('4479')), trigger=_IntegrationTrigger.StateEvent, new_state.last_reported=2024-09-17 11:44:01.302280+00:00, old_last_reported=2024-09-17 11:43:51.635371+00:00, last_integration_time=2024-09-17 11:43:51.792100+00:00, sensor=<entity sensor.solar_energy=27698404.69>
...

The first two updates are odd because the delta is small. The trigger is always StateEvent, so the elapsed time is the delta between the new event time and old_last_reported, which is from the "old event", which isn't so old here: it's from after the restart.

The third update is odd because "new_state.last_reported" is the same as before (11:44:01.302280) but now the old event is actually from 10 seconds ago. That means that some area from right before that time stamp gets added more than once.

dgomes commented 1 month ago

The first 2 means that the source sensor.solar_energy published the value twice... (since the value did not change, it probably update an attribute)

The third might be related to race/delay of the restore of the sensor. Try adding more debug at https://github.com/home-assistant/core/blob/622e9aa3dc94b11c7c0f72bcd10fd26bf0aaa0cd/homeassistant/components/integration/sensor.py#L423

pohly commented 1 month ago

The first 2 means that the source sensor.solar_energy published the value twice... (since the value did not change, it probably update an attribute)

So that there was no area update for the time when HomeAssistant was offline is normal?

The third might be related to race/delay of the restore of the sensor. Try adding more debug

Will do.

pohly commented 1 month ago

More log output, now including async_added_to_hass:

2024-09-19 14:10:12.318 WARNING (MainThread) [homeassistant.components.integration.sensor] *** Restored state 27698551.73 and last_valid_state 27698551.73033163888888801574 sensor=<entity sensor.solar_energy=27698551.73>
2024-09-19 14:10:21.679 WARNING (MainThread) [homeassistant.components.integration.sensor] *** integral on_state_change: big area=0.2342400000000000059342808445, elapsed=0.000120000000000000003040102891649354432956897653639316558837890625, states=(Decimal('1952'), Decimal('1952')), trigger=_IntegrationTrigger.StateEvent, new_state.last_reported=2024-09-19 12:10:21.679702+00:00, old_last_reported=2024-09-19 12:10:21.679582+00:00, last_integration_time=2024-09-19 12:10:12.318236+00:00, sensor=<entity sensor.solar_energy=27698551.73>
2024-09-19 14:10:21.680 WARNING (MainThread) [homeassistant.components.integration.sensor] *** integral on_state_change: big area=0.6988159999999999493630609138, elapsed=0.0003579999999999999740589451402428267101640813052654266357421875, states=(Decimal('1952'), Decimal('1952')), trigger=_IntegrationTrigger.StateEvent, new_state.last_reported=2024-09-19 12:10:21.680060+00:00, old_last_reported=2024-09-19 12:10:21.679702+00:00, last_integration_time=2024-09-19 12:10:12.318236+00:00, sensor=<entity sensor.solar_energy=27698551.73>
2024-09-19 14:10:21.681 WARNING (MainThread) [homeassistant.components.integration.sensor] *** integral on_state_change: big area=27936.54428399999950549670302, elapsed=9.5119319999999998316297933342866599559783935546875, states=(Decimal('2937'), Decimal('2937')), trigger=_IntegrationTrigger.StateEvent, new_state.last_reported=2024-09-19 12:10:21.680060+00:00, old_last_reported=2024-09-19 12:10:12.168128+00:00, last_integration_time=2024-09-19 12:10:12.318236+00:00, sensor=<entity sensor.solar_energy=27698551.73>

new_state.last_reported=2024-09-19 12:10:21.680060 occurs twice again.

If there is a race here, then that might explain why I was able to reproduce the issue only once and then not again.

What is racing with what? Starting up the integration and producing new input sensor values?

dgomes commented 1 month ago

From that log I would rule out a race from "Restored state"

My guess is the source sensor is at fault... somehow publishing twice

pohly commented 1 month ago

I got the sqlite database cleaned up through a Python script. Log output shows that the new area is sane. But somehow all new sensor.solar_energy are still to high.

Isn't there a mechanism in HomeAssistant that prevents a state from going down? Could that kick in here?

pohly commented 1 month ago

Found it. It's the old state in .storage/core.restore_state which serves as base for the first update after a restart, not what is found in the sqlite database. My cleanup script needs to fix that state, too.

But this makes me wonder: how often is that .storage/core.restore_state JSON file being updated? Only on a clean shutdown or periodically in case of a crash? Writing a complete file safely such that there is no data loss is hard. sqlite goes to great pain to avoid that.

How well does the integration code handle the (probably not uncommon) scenario that the restore_state is consistent, but older than the content of the sqlite database?