home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0

Problem with statistics sensors in core-2021.12.0 #61624

Closed miguelpucela closed 2 years ago

miguelpucela commented 2 years ago

The problem

I've just updated to core-2021.12.0 and found that the statistics sensors have fewer attributes than before.

I have the following sensor:

sensor:
  - platform: statistics
    name: stats_temperature_salon
    entity_id: sensor.sonoff_a48000f31d_temperature
    max_age:
      hours: 24
    sampling_size: 50

In core-2021.11.5 it had the following attributes:

sampling_size: 50
count: 26
mean: 22.6
median: 22.7
quantiles:
  - 22.04
  - 22.7
  - 23.12
standard_deviation: 0.57
variance: 0.32
total: 587.67
min_value: 21.74
max_value: 23.39
min_age: '2021-12-11T13:01:09.658738+00:00'
max_age: '2021-12-12T12:09:57.390473+00:00'
change: -0.38
average_change: -0.02
change_rate: -0.0
unit_of_measurement: °C
friendly_name: stats_temperature_salon
icon: mdi:calculator

while in core-2021.12.0, the attributes are:

state_class: measurement
age_coverage_ratio: 0.97
buffer_usage_ratio: 0.54
source_value_valid: true
unit_of_measurement: °C
icon: mdi:calculator
friendly_name: stats_temperature_salon

I think it's a bug.

Thank you.

What version of Home Assistant Core has the issue?

core-2021.12.0

What was the last working version of Home Assistant Core?

core-2021.11.5

What type of installation are you running?

Home Assistant OS

Integration causing the issue

statistics sensor

Link to integration documentation on our website

https://www.home-assistant.io/integrations/statistics/

Example YAML snippet

sensor:
  - platform: statistics
    name: stats_temperature_salon
    entity_id: sensor.sonoff_a48000f31d_temperature
    max_age:
      hours: 24
    sampling_size: 50

Anything in the logs that might be useful for us?

No response

Additional information

No response

probot-home-assistant[bot] commented 2 years ago

Hey there @fabaff, mind taking a look at this issue as it has been labeled with an integration (statistics) you are listed as a code owner for? Thanks! (message by CodeOwnersMention)


statistics documentation statistics source (message by IssueLinks)

mib1185 commented 2 years ago

Seems to be related to #59867, which is a breaking change but unfortunately was not noted in the release notes

newlund commented 2 years ago

Thank you @mib1185 Got my statistics sensors working again :)

THE-AP05TLE commented 2 years ago

Thank you @mib1185 Got my statistics sensors working again :)

would you be willing to share the working config?

newlund commented 2 years ago

The values are no longer available as attributes

Example

Will give you an entity named sensor.1w_power_stats_mean with the mean value as state.

THE-AP05TLE commented 2 years ago

Thanks @newlund - The mean sensor works fine, but I'm getting "unknown" for change sensors. Just wondered whether there is a bug

newlund commented 2 years ago

Sorry, don't know. Not using change as state_characteristic

ThomDietrich commented 2 years ago

Hey all, the breaking change is now added to release notes: https://github.com/home-assistant/home-assistant.io/pull/20666/files Sorry for the confusion. The example given by @newlund is correct, but really, just check the documentation for all relevant details https://www.home-assistant.io/integrations/statistics/

The mean sensor works fine, but I'm getting "unknown" for change sensors. Just wondered whether there is a bug

@THE-AP05TLE I'm not sure I understand. Please post your config and what you would expect. All characteristics are systematically tested.

ThomDietrich commented 2 years ago

Also, please pay attention to

Many characteristics were renamed and additional ones were added, including a few for binary source sensors. Consider opening a feature request if your use case is not directly covered.

I'm interested in which use cases you guys have and how the statistics component can be improved. Working on next features for the next release already.

miguelpucela commented 2 years ago

I'm interested in which use cases you guys have and how the statistics component can be improved. Working on next features for the next release already.

In my case, I use the mean temperature over the last 24 hours to create a binary_sensor that compares that mean with a threshold. I then use a schedule to switch on an electrical heating system at certain hours, but only if the binary_sensor is off (the mean temperature over the last 24 hours is below the threshold).

sensor:
  - platform: statistics
    name: "Average temperature hall"
    entity_id: sensor.sonoff_a48000c3_temperature
    state_characteristic: mean
    max_age:
      hours: 24
    sampling_size: 50

binary_sensor:
  - platform: template
    sensors:
      average_hall_temperature_above_threshlold:
        friendly_name: "Average hall temperature > rooms thresold"
        value_template: "{{states('sensor.average_temperature_hall') | float(default=0) > states('input_number.rooms_threshold_temperature') | float(default=0)}}"

ThomDietrich commented 2 years ago

Understood. A few questions.

THE-AP05TLE commented 2 years ago

Thanks @ThomDietrich Initially I copied the example in the docs `sensor:

Initially it reports "0.0", but then "unknown" after the max_age is reached. It has occurred to me it could be due to the fact that it only takes readings periodically (xiaomi zigbee sensor), but I'll need to play with the settings to test this

THE-AP05TLE commented 2 years ago

To answer the question for use case

I use change sensors to rapidly detect humidity changes (from the shower) to turn on the extractor fan. I also use a binary sensor for a normal humidity level for legacy reasons, but a numeric state trigger might be better.

I also use mean statistic sensors to smooth other readings e.g. a light sensor (to avoid blips when the light was briefly switched on) or to smooth power measurements on appliances

ThomDietrich commented 2 years ago

@THE-AP05TLE @newlund just fyi, this is how you insert code:

```yaml
sensor:
  - platform: statistics
    name: "Average temperature hall"
```

Regarding your issue,

> It reports "0.0", but then "unknown" after the max_age is reached.

That sounds like the intended behavior. Make sure your sensor reports measurements often enough. You probably run empty after those 5 minutes. Also check the attributes of the statistics sensor to know what's up https://www.home-assistant.io/integrations/statistics/#attributes
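
Why the state ends up as "unknown" can be illustrated with a minimal sketch of a sliding window. This is not the integration's actual code: the class `ChangeWindow` and its methods are hypothetical names, and the sketch assumes that samples are simply dropped once they are older than max_age.

```python
from collections import deque
from datetime import datetime, timedelta
from typing import Optional


class ChangeWindow:
    """Hypothetical sliding window of (timestamp, value) pairs."""

    def __init__(self, max_age: timedelta, sampling_size: int) -> None:
        self.max_age = max_age
        # The deque silently discards the oldest entry once sampling_size is reached.
        self.samples = deque(maxlen=sampling_size)

    def add(self, when: datetime, value: float) -> None:
        self.samples.append((when, value))

    def change(self, now: datetime) -> Optional[float]:
        # Drop every sample that is older than max_age.
        while self.samples and now - self.samples[0][0] > self.max_age:
            self.samples.popleft()
        if not self.samples:
            return None  # nothing left to compute from, i.e. "unknown"
        # Difference between the newest and the oldest remaining value.
        return self.samples[-1][1] - self.samples[0][1]
```

With a single sample the change is 0.0, and once a source reports less often than max_age the buffer runs empty between readings, so there is nothing left to compute a change from.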

ThomDietrich commented 2 years ago

I have the exact same use case by the way. Consider the following:

This setup is pretty well supported by the new statistics sensor version.

Happy to share my config if you are interested.

THE-AP05TLE commented 2 years ago

@ThomDietrich Thanks

That sounds like the intended behavior.

Thanks - having just tested when humidity is changing, it does indeed work (I just wasn't expecting an "unknown"). The humidity change within a time frame has been working great until now (having a fixed threshold e.g. 90% is less flexible and can lead to unwanted fan activation if too low)

ThomDietrich commented 2 years ago

mean statistic sensors to smooth other readings

That is interesting. For this kind of use cases you normally also want an outlier filter applied. That might be something to be added in a future release.

THE-AP05TLE commented 2 years ago

@ThomDietrich if you have a fancy single automation (e.g. with choose options), please share. I probably have an ugly 3 different automations to do this

ThomDietrich commented 2 years ago

Sure, here you go.

```yaml
automation:
  - id: "1611580248790"
    alias: "Rule bathroom ventilation on when high humidity is reached"
    trigger:
      - platform: numeric_state
        entity_id: sensor.bad_raumsensor_bme280_humidity
        above: 65
      - platform: numeric_state
        entity_id: sensor.bad_raumsensor_bme280_humidity
        above: 75
      - platform: numeric_state
        entity_id: sensor.bad_raumsensor_bme280_humidity
        above: 90
      - platform: time
        at: "04:00:00"
    action:
      - service: switch.turn_on
        entity_id: switch.bad_luftung
  - id: "1611580248812"
    alias: "Rule bathroom ventilation off when humidity is steady"
    trigger:
      - platform: state
        entity_id: sensor.bad_luftfeuchte_statistics_last5min
    condition:
      condition: and
      conditions:
        - condition: state
          entity_id: switch.bad_luftung
          state: "on"
          for:
            minutes: 5
        - condition: numeric_state
          entity_id: sensor.bad_raumsensor_bme280_humidity
          below: 60
        - condition: numeric_state
          entity_id: sensor.bad_luftfeuchte_statistics_last5min
          attribute: change
          above: -2
    action:
      - service: switch.turn_off
        entity_id: switch.bad_luftung
  - id: "1611580248792"
    alias: "Rule bathroom ventilation maximum runtime"
    trigger:
      - platform: state
        entity_id: switch.bad_luftung
        to: "on"
        for:
          minutes: 60
    action:
      - service: switch.turn_off
        entity_id: switch.bad_luftung
      - service: notify.persistent_notification
        data:
          title: "Bathroom ventilation"
          message: "Switched off after maximum runtime. Why didn't the stats rule kick in earlier?"
```

Beware that I did not yet update the statistics sensor for this HA installation, but you get the idea. Most of the redundant code you can see is intended to cover a certain hysteresis, meaning that the fan should turn on as often as necessary, run for at least 5 min and turn off eventually.
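
Spelled out as plain logic, the rules above amount to roughly the following. This is only a sketch: the function names are made up for illustration, the thresholds are the ones from the automation, and the daily 04:00 run is left out.

```python
HUMIDITY_THRESHOLDS = (65, 75, 90)  # the three numeric_state "above" triggers
MIN_RUNTIME_MINUTES = 5
MAX_RUNTIME_MINUTES = 60


def crossed_threshold(previous: float, current: float) -> bool:
    """True when humidity rose from at-or-below to above one of the thresholds."""
    return any(previous <= limit < current for limit in HUMIDITY_THRESHOLDS)


def should_turn_off(fan_on_minutes: float, humidity: float, change_last_5min: float) -> bool:
    """Decide whether the fan may be switched off."""
    # Safety net: never run longer than the maximum runtime.
    if fan_on_minutes >= MAX_RUNTIME_MINUTES:
        return True
    # Otherwise: minimum runtime reached, humidity low, and no longer dropping quickly.
    return (
        fan_on_minutes >= MIN_RUNTIME_MINUTES
        and humidity < 60
        and change_last_5min > -2
    )
```

Having several thresholds means the fan can be switched on again if humidity keeps climbing after an earlier switch-off.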

THE-AP05TLE commented 2 years ago

Sure, here you go.

Thanks, the idea is pretty similar to what I have (although it is always interesting to see how others have solved the same problem). Thanks for all your help with this.

miguelpucela commented 2 years ago

Thanks for your comments.

Understood. A few questions.

  • Is 50 samples really enough to capture 24h hours from your tasmota sensor? The default "TelePeriod" is 15min if I remember correctly.

I use the [Sonoff LAN integration](https://github.com/AlexxIT/SonoffLAN) for the temperature sensors. I remember that when I configured it, the number of updates was even higher for sensors whose temperature variation was very large. For those sensors (e.g. in a windowed gallery) I have 100 samples per day.

You are right. The only reason is that I display the binary sensors in a Control Panel in Lovelace.

  • Your max_age is 24 hours. Is the sliding window of 24h exactly what you need or would it be useful to bind the value collection to a certain time, e.g. "the mean temperature since 6:00 am"?

I use a sliding window to take into account the evolution of the temperature, not only the current temperature (as a very simple low-pass filter). In any case, I have to check it, because a 24-hour sliding window may be too long: the system then has very high inertia and does not adapt to changes quickly, for instance when temperatures drop quickly on a cold and cloudy day after a sunny one, or vice versa. I'll try 12- or 6-hour sliding windows.

miguelpucela commented 2 years ago

I'm interested in which use cases you guys have and how the statistics component can be improved. Working on next features for the next release already.

It would be interesting to obtain a "weighted arithmetic mean" of sensor data. For instance it can be useful if you want to get an average value, but you want to give more importance to more recent values than older ones.

ThomDietrich commented 2 years ago

@miguelpucela would be easy to implement something like this: https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average The method has some interesting benefits over a classical moving average, that's certain. Do you have a specific use case for it in mind? Also, would we reduce the weights equally from sample to sample or depending on the value duration in case of infrequent sensor readings?

Best!

miguelpucela commented 2 years ago

@miguelpucela would be easy to implement something like this: https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average The method has some interesting benefits over a classical moving average, that's certain. Do you have a specific use case for it in mind? Also, would we reduce the weights equally from sample to sample or depending on the value duration in case of infrequent sensor readings?

Best!

I think you have read my mind.

Here is what I wrote this afternoon on the Home Assistant Community forum.

https://community.home-assistant.io/t/getting-last-n-values-of-a-sensor-to-create-a-weighted-mean-sensor/367699

There I describe the use case I have in mind, and in the last post the idea of geometric series weighting, which is the same as the exponential moving average you suggest. I thought of it because it has the very important advantage that you don't have to keep a sliding buffer of values, only the current weighted sum, as the new sum is obtained from the previous sum and the latest value. What's more, you don't have to decide the size of the sum, as it can be updated forever. The only parameter is the ratio, or base, of the powers used as weights.
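
To make the "no buffer needed" point concrete, here is a minimal sketch of the recurrence in its normalized form (so that the weights sum to one). The function names and the parameter name `ratio` are chosen for illustration only; nothing like this exists in the statistics integration at this point.

```python
from typing import Iterable, Optional


def update_weighted_mean(previous: Optional[float], sample: float, ratio: float) -> float:
    """One update step: only the previous result and the new sample are needed."""
    if previous is None:
        return sample  # the first sample seeds the mean
    return ratio * previous + (1.0 - ratio) * sample


def weighted_mean_of(samples: Iterable[float], ratio: float) -> Optional[float]:
    """Feed samples oldest-first through the recurrence; None for no samples."""
    state: Optional[float] = None
    for sample in samples:
        state = update_weighted_mean(state, sample, ratio)
    return state
```

Apart from the very first sample, a value that is k steps old ends up with a weight proportional to `ratio ** k`, without any history being stored.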

I think it's not necessary to take into account the samples duration in the simplest approach, but of course, it would be an improvement.

pterodaktil02 commented 2 years ago

Previously I used the statistics attributes to decide whether the statistical data was usable. I have intermittent sensors (used only in winter), and in the statistics sensors I use data from the last 24 hours. While the dataset is small, I cannot use the statistics sensor. I used to use one sensor and its attributes. Now, instead of one sensor, you need three. Please bring back the attributes count, average and median.

I think a good idea would be to allow "state_characteristic" to be an array, where the first entry is the state and the others are attributes.

ThomDietrich commented 2 years ago

Hey @pterodaktil02, thanks for your feedback! First of all I want to say that count is still available, however in a normalized (arguably better) way. See here https://www.home-assistant.io/integrations/statistics/#attributes for age_coverage_ratio.
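
As a rough illustration of how the old count relates to the new attributes: going by the attribute descriptions in the documentation, buffer_usage_ratio is the share of the sampling_size buffer in use, and age_coverage_ratio is the share of max_age spanned by the stored values. The helper names below are hypothetical, and the rounding is an assumption.

```python
from datetime import datetime, timedelta


def estimated_count(buffer_usage_ratio: float, sampling_size: int) -> int:
    """Approximate number of stored samples, recovered from the ratio."""
    return round(buffer_usage_ratio * sampling_size)


def age_coverage_ratio(oldest: datetime, newest: datetime, max_age: timedelta) -> float:
    """Share of the configured max_age covered between oldest and newest sample."""
    return round((newest - oldest) / max_age, 2)
```

For the sensor in the original report, `estimated_count(0.54, 50)` gives about 27 samples. Because the ratio is rounded, the result is only approximate for large buffers.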

The attributes will not come back. That was agreed with the Home Assistant core developers, driven by a longer-term strategy to not put direct sensors into attributes. That makes sense from multiple perspectives. I think a future step could be that one statistics integration configuration could generate multiple sensors - however I believe that should not be the goal, see 2 below.

To address your use case:

  1. There is nothing that really hinders you from defining two or three sensors instead of one. This step should also give you better control and visibility.
  2. I was asking for use cases. The reason for this is that I believe we should rather define further characteristics that directly serve a use case, instead of being forced to access multiple characteristics. See the example from @miguelpucela above. In your example the need for count is fulfilled by the attribute age_coverage_ratio and I did not understand why you need average and median. Those two barely differ in reality, hence I wonder if your logic can actually be improved to just need one of them, or maybe any of the newly added characteristics.

pterodaktil02 commented 2 years ago

I can set up sensors for all the data. For example, I use statistics sensors to calculate optimal values, and I made this sensor to find the optimal temperature for the ventilation system. In the old version I needed only one statistics sensor; in the new one I need two.

- platform: statistics
  name: set_temp_stable_statistic
  entity_id: sensor.temp_stable_raw
  sampling_size: 2000
  max_age:
    hours: 168

- platform: template     
  sensors:      
    set_temp_stable:
      unit_of_measurement: '°C'
      value_template: >-
           {% if is_state('sensor.set_temp_stable_statistic', 'unknown') %}
             17
           {% elif (state_attr('sensor.set_temp_stable_statistic', 'count')|float) < 20 %}              
             17
           {% else %}
             {{ ((state_attr('sensor.set_temp_stable_statistic', 'median')|float)) | round (1) }}
           {% endif %}         

The sensor sensor.temp_stable_raw is very irregular: it can receive 20 readings in 10 minutes and then wait for data for a few hours.

I also have a template sensor with a condition like this:

`{% if (states('sensor.temperatura_pritochnogo_vozdukha')|float) < 10 and (state_attr('sensor.pritok_staticsic', 'count')|float) > 200 and (state_attr('sensor.pritok_staticsic', 'standard_deviation')|float) < 10 %}`

Please let me choose how to set up my sensors.
For now I have gone back to 2021.11.5 and stopped upgrading. It's not a good idea, but adding junk sensors seems even more uncomfortable.

ThomDietrich commented 2 years ago

Believe me, I am sorry this change broke your existing logic. It did break mine too! However, sometimes changes are needed to improve the bigger picture. Home Assistant is constantly growing and maturing. The integration had some shortcomings, and dozens of sensor values as attributes was just not in line with what attributes should be used for in the greater scheme of things.

I see your use case and understand your example. You want to capture two statistical values, so define two statistics sensors. It's as simple as that and doesn't hurt:

- platform: statistics
  name: set_temp_stable_statistic_median
  entity_id: sensor.temp_stable_raw
  state_characteristic: median
  sampling_size: 2000
  max_age:
    hours: 168
- platform: statistics
  name: set_temp_stable_statistic_stddev
  entity_id: sensor.temp_stable_raw
  state_characteristic: standard_deviation
  sampling_size: 2000
  max_age:
    hours: 168

The sensor sensor.temp_stable_raw is very irregular: it can receive 20 readings in 10 minutes and then wait for data for a few hours.

You should certainly check out average_linear and look out for the new characteristic @miguelpucela proposed. Also noisiness might be for you...

ThomDietrich commented 2 years ago

@miguelpucela

I think you have read my mind.

Nice! :) Also, great highlight of the implicit nature of the geometric series weighting. If one wanted to solve the use case via an automation or similar, that would certainly make it a lot easier. In our case the statistics integration already stores all values in memory, so no need to restrict ourselves.

I believe you know your math, so let's define the method together. I would suggest working with an exponential weight function, with the age of individual measurements as the input. I found that most use cases (like the one by @pterodaktil02) are indeed quite heavily affected by aperiodic sensor updates (short bursty spikes generate the majority of values, therefore skewing the statistical values). Any thoughts on method or parameterization? Any thoughts on whether we need to support multiple weight functions, e.g. an exponential and a triangular weighted moving average?
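
One way to read "the age of individual measurements as the input" is an exponential decay of the weight with age. The sketch below is only an assumption about how that could look: `age_weighted_mean` and `time_constant` are hypothetical names, and timestamps are plain seconds to keep it short.

```python
import math
from typing import Sequence, Tuple


def age_weighted_mean(
    samples: Sequence[Tuple[float, float]], now: float, time_constant: float
) -> float:
    """Mean of (timestamp, value) samples, weighted by exp(-age / time_constant)."""
    if not samples:
        raise ValueError("at least one sample is required")
    # Newer samples (small age) get a weight close to 1, older ones decay towards 0.
    weights = [math.exp(-(now - timestamp) / time_constant) for timestamp, _ in samples]
    weighted_sum = sum(weight * value for weight, (_, value) in zip(weights, samples))
    return weighted_sum / sum(weights)
```

Note that a burst of readings still contributes one weight per reading here; compensating for that would mean also weighting each value by how long it was valid.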

miguelpucela commented 2 years ago

I think exponentially and linearly decreasing weights cover almost all use cases (can you think of a use case for a triangular moving average?). All time-related sensors will be causal systems, so I don't think a weighting function that gives more importance to a past sample than to the latest one, like the triangular one, has practical utility. If the input signal is spatial instead of temporal in nature, maybe.

In the linear weighting, the parameter needed is the slope of the line. I think the best and easiest way for the end user to define it is through the weight of the oldest sample, with 0 as default.

In the exponential weighting, the parameter needed is of course the base of the exponential. If you set a default value, it should be about 0.75. I've tested the exponential weighting (really simple geometric weights, without taking the time spacing into account): for values below 0.5, the output is very similar to the input. For higher values, the output is a smoothed and delayed version of the input (smoother with higher values). For values higher than 0.95, the output is a slowly varying trend of the input.
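
The two weightings described here can be sketched as follows, ignoring the time spacing between samples. The function names are made up for illustration; `oldest_weight` and `base` are the two parameters discussed above, with the suggested defaults.

```python
from typing import List, Sequence


def linear_weights(n: int, oldest_weight: float = 0.0) -> List[float]:
    """Weights rising linearly from oldest_weight (oldest) to 1.0 (newest)."""
    if n == 1:
        return [1.0]
    step = (1.0 - oldest_weight) / (n - 1)
    return [oldest_weight + i * step for i in range(n)]


def exponential_weights(n: int, base: float = 0.75) -> List[float]:
    """Geometric weights: a sample that is k steps old gets base ** k."""
    return [base ** (n - 1 - i) for i in range(n)]


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted arithmetic mean; values and weights are ordered oldest-first."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)
```

For example, `weighted_mean(values, exponential_weights(len(values)))` gives the exponentially weighted result for a buffer ordered oldest-first.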

miguelpucela commented 2 years ago

If you want to define more weighting functions, I think you can think of weighting functions similar to the sigmoid function used in neural networks: https://mathworld.wolfram.com/SigmoidFunction.html

pterodaktil02 commented 2 years ago

@ThomDietrich I have more than 500 sensors in HA, and it's impossible to keep control of them. Most of them were created with devices and are not used at all (a temperature sensor in a switch and so on); some are used to prepare raw data for other sensors. Now I want to minimize the number of sensors and don't want to add new ones. For example, I want to merge the filter with the template: a template sensor calculates the raw data and after that I have to filter it.

I don't understand why you don't want to make state_characteristic an array. If somebody wants to make many sensors, they can. If I need an all-in-one sensor, I can make that too.

ThomDietrich commented 2 years ago

@pterodaktil02 I don't know what more I can tell you. The Home Assistant core developers realized that sensor readings exposed through attributes create a bunch of issues. Entities have a full state, including availability or a unit of measurement. Attributes must be used for secondary data, like quality or source of data. As a result of this, I was asked to make the change and I was careful so that everything you described so far is easily done through individual sensors. Your suggested array introduces all the old issues and will not be accepted as a pull request. If you have a suggestion for a new characteristic that would reduce the number of needed sensors, let's chat about that.

I have more than 500 sensors in HA, and it's impossible to keep control of them

Are you talking about entities? That's a good number but many users have far more than that. The key is good configuration management, structures, naming and documentation.

pterodaktil02 commented 2 years ago

It's sad that the vision of the development direction of maintainers is different from mine. I want to minimize the number of entities. Maintainers want to inflate their number.

ThomDietrich commented 2 years ago

@miguelpucela @pterodaktil02 I've just opened a bunch of pull requests for new (non-breaking) additions to the integration. Feel free to have a look! https://github.com/home-assistant/core/issues?q=is%3Aopen+statistics+label%3A%22integration%3A+statistics%22+

@miguelpucela I did not yet implement the weighted average function.

@pterodaktil02 I'm adding more and more characteristics. The goal certainly is to not need more than one statistics sensor per use case.

miguelpucela commented 2 years ago

@miguelpucela @pterodaktil02 I've just opened a bunch of pull requests for new (non-breaking) additions to the integration. Feel free to have a look! https://github.com/home-assistant/core/issues?q=is%3Aopen+statistics+label%3A%22integration%3A+statistics%22+

Great!

@miguelpucela I did not yet implement the weighted average function.

It may be useful for you to take a look at this post: https://community.home-assistant.io/t/maintain-an-array-with-last-values-of-a-sensor/368806/6?u=miguelpucela. I've made an estimation of the future average based on the previous day's data. There's a linearly weighted average with a configurable window size. It's very basic, but just in case ...

@pterodaktil02 I'm adding more and more characteristics. The goal certainly is to not need more than one statistics sensor per use case.

pterodaktil02 commented 2 years ago

@ThomDietrich Great news... I think the best case would be to let the user configure which attributes they need in the statistics sensor.

rschuiling commented 2 years ago

For another use case: I have a wind sensor used for retracting my awning. I don't want it to retract on any gust of wind. So I would like to use the average of the last three measurements. This wind sensor is updated every 12 seconds. My problem is that the statistics sensor never turns to 0. In previous versions with the 'average' it turned to Unknown after a series of zeroes. In this version, it just seems to ignore the zero values (using mean & average_timeless).

So my use case is: after a series of 0, I would like a statistics sensor that actually turns to 0. How would I go about that? The post I made about this previously is here on the forum.
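
The behaviour asked for here can be written down as a small sketch: a plain mean over the last three readings, with zeros counted like any other value. This only illustrates the expected arithmetic; it is not how the integration is implemented, and `LastNMean` is a hypothetical name.

```python
from collections import deque


class LastNMean:
    """Mean over the most recent `size` readings, zeros included."""

    def __init__(self, size: int = 3) -> None:
        # The deque drops the oldest reading automatically once it is full.
        self.values = deque(maxlen=size)

    def add(self, value: float) -> float:
        self.values.append(value)
        return sum(self.values) / len(self.values)
```

After three consecutive readings of 0 the result is exactly 0.0, which is what the awning automation would need to see.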

github-actions[bot] commented 2 years ago

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.