Huge amount of state changes generated causing HA recorder shutdown and loss of history

nilrog commented 1 year ago

I noticed recently that my history, for all sensors, stopped working in HA. After some time I found an error in the HA log saying that the recorder was overloaded and was stopping recording. This happened a few weeks after upgrading HA to 2022.2.5, which strongly recommended upgrading MariaDB, so I performed that upgrade first. After those two upgrades HA was running fine...no issues at all. Then I noticed that this integration had also been updated, and since one of the updates was for HA compatibility I upgraded this integration. Still, everything looked fine.

When looking into this, one of the tips was to check the HA tables in MariaDB, so I did that and discovered that the states table was huge (> 3 GB). Checking for the entities that produced the most states, this integration occupied 9 of the top 10 entries. Not even my solar inverter, which I am polling every 10 s, produces this many state entries. Its highest entity has ~126000 state changes.

+---------------------------------------------------------------------------------+--------+
| entity_id                                                                       | cnt    |
+---------------------------------------------------------------------------------+--------+
| sensor.go_echarger_012345_nrg                                                   | 531448 |
| sensor.go_echarger_012345_nrg_2                                                 | 530689 |
| sensor.go_echarger_012345_nrg_3                                                 | 522030 |
| sensor.power_18                                                                 | 465501 |
| sensor.go_echarger_012345_wh                                                    | 169717 |
| sensor.go_echarger_012345_eto                                                   | 157582 |
| sensor.go_echarger_012345_nrg_16                                                | 146311 |
| sensor.go_echarger_012345_tpa                                                   | 137936 |
| sensor.go_echarger_012345_tma                                                   | 136665 |
| sensor.go_echarger_012345_nrg_12                                                | 130342 |
+---------------------------------------------------------------------------------+--------+

Using this query: SELECT entity_id, COUNT(*) AS cnt FROM states GROUP BY entity_id ORDER BY COUNT(*) DESC;
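For anyone who wants to try the same check from a script, here is a minimal sketch of that query in Python. It runs against a throwaway in-memory SQLite table rather than a real recorder database, so the toy schema (only `state_id`, `entity_id` and `state`) and the helper names `top_entities` and `make_toy_db` are assumptions made for illustration. The `LIMIT` clause is also an addition, to keep the output to the busiest entities.

```python
import sqlite3

# Same grouping as the query above, plus a LIMIT so only the busiest
# entities are returned.
QUERY = """
SELECT entity_id, COUNT(*) AS cnt
FROM states
GROUP BY entity_id
ORDER BY cnt DESC
LIMIT ?
"""


def top_entities(conn, limit=10):
    """Return (entity_id, row_count) pairs, busiest entity first."""
    return conn.execute(QUERY, (limit,)).fetchall()


def make_toy_db():
    """Build a tiny stand-in for the states table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE states ("
        "state_id INTEGER PRIMARY KEY, entity_id TEXT, state TEXT)"
    )
    rows = (
        [("sensor.go_echarger_012345_nrg", "230")] * 5
        + [("sensor.power_18", "1200")] * 3
        + [("sensor.go_echarger_012345_wh", "10")] * 1
    )
    conn.executemany(
        "INSERT INTO states (entity_id, state) VALUES (?, ?)", rows
    )
    return conn


if __name__ == "__main__":
    for entity_id, cnt in top_entities(make_toy_db(), limit=3):
        print(f"{entity_id:40} {cnt}")
```

Against a real installation the connection would point at the recorder database instead, and the table layout may differ between HA versions, so treat this only as a starting point.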

I have since excluded the following sensors from being recorded, and thus losing their history, and now HA is behaving better. The size of the states table is now <3GB and will hopefully continue to shrink as HA purges old history.

I do not know what triggered this integration to post them so frequently, whether it is due to changes in the go-echarger, which is now running on 0.55.0, or changes to this integration (I have not had time to look into how this integration works). One or more of the updates that I did must have triggered this bad behavior, because I have been running this integration for almost a year and never had any problems with HA, its history, or this integration. Since this integration is now managed through the UI, and not yaml, I do not see any way to control scan_interval, which is one way to configure how often an integration should update, or even whether this integration supports changing it.

I think you need to look into this, and if it is not possible to do anything in the code to prevent this huge amount of events, you should add some info to the readme to indicate that this integration can/will cause a lot of events to be generated, which can have a serious effect on HA. I think I am safe now...will be monitoring my HA for a few weeks to see if I get problems again.

nilrog commented 1 year ago

I don't have the HA error log left, it has been rotated away. But this is the same error that I finally got in HA (although the limit according to HA was >60000), long after I noticed that I did not have any history in HA:

https://community.home-assistant.io/t/error-the-recorder-queue-reached-the-maximum-size-of-30000-events-are-no-longer-being-recorded/408359

syssi commented 1 year ago

Sorry for the inconvenience. I'm sure we will find a good solution. You could stop recording the entity states of the charger in the meantime.

nilrog commented 1 year ago

Yes, I think I have "put out the fire" for now, at least. Three of the biggest offenders were the nrg, nrg_2 and nrg_3 sensors that report the phase voltages...and those I do not care about since I have that monitored by my solar inverter. So they are now excluded from the recorder.

Don't know if it is possible, with an integration that is set up in the UI, to exclude sensors. I know I can do that with the solar inverter integration I am using (that I have contributed code to). But I see no mention of that in the documentation for this integration. Although the fact that the table of sensors has a column "enabled by default" indicates, to me, that it should be possible to disable sensors. But maybe that column means something else?

syssi commented 1 year ago

Just some ideas / input from my side:

- The charger is responsible for the update interval because the messages get pushed by the charger.
- The update_interval of the integration doesn't help here because the charger doesn't get polled.
- I will have a look at the API documentation of the charger to see whether the update interval can be controlled.
- If the update interval of the charger cannot be changed, I will implement a throttle setting to throw away most of the incoming messages / measurements.
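To make the throttle idea concrete, here is a minimal sketch of one way such a setting could behave: for each topic, a message is accepted only if at least `min_interval` seconds have passed since the last accepted message, and everything in between is dropped. This is not the integration's code. The class name `Throttle`, the `accept` method, the `min_interval` parameter and the injectable `clock` are all hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Dict


class Throttle:
    """Per-topic throttle: drop messages that arrive too soon after the
    last accepted message for the same topic."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock  # injectable so the behavior can be tested
        self._last_accepted: Dict[str, float] = {}

    def accept(self, topic: str) -> bool:
        """Return True if a message for this topic should be processed."""
        now = self._clock()
        last = self._last_accepted.get(topic)
        if last is not None and now - last < self.min_interval:
            return False  # too soon after the previous accepted message
        self._last_accepted[topic] = now
        return True


if __name__ == "__main__":
    # Simulated timeline using a fake clock instead of real time.
    fake_now = [0.0]
    throttle = Throttle(min_interval=10.0, clock=lambda: fake_now[0])
    for t in (0.0, 2.0, 9.9, 10.0, 15.0, 20.0):
        fake_now[0] = t
        print(t, throttle.accept("nrg"))
```

Tracking the timestamp per topic means a slow-changing value is never held back by a noisy one. The trade-off is that dropped measurements are simply lost, so a short-lived spike between two accepted messages would not show up in history.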

syssi commented 1 year ago

> Don't know if it is possible, with an integration that is set up in the UI, to exclude sensors. I know I can do that with the solar inverter integration I am using (that I have contributed code to). But I see no mention of that in the documentation for this integration. Although the fact that the table of sensors has a column "enabled by default" indicates, to me, that it should be possible to disable sensors. But maybe that column means something else?

Go to the entities you don't need and open the "advanced settings" menu. The entities can be disabled there. Some of the go-eCharger entities are disabled by default and must be enabled there if needed.


nilrog commented 1 year ago

Thanks! I am (still) used to configuring everything through yaml, so I have not learned all the features that are in the UI yet :) I have now disabled the L[123] voltage sensors, besides already excluding them from the recorder and InfluxDB.

The charger pushes out updates as soon as something changes...and for voltage that can be a lot of updates, since in many setups the voltage fluctuates and is not steady. I also notice that it pushes out the temperatures very frequently. I don't know if this has changed recently, or if it has always pushed data at the same pace. I never really bothered about this before since it has been working fine for a year with this integration. However, if you have one or more other integrations that are also pushing updates frequently, you might run into the same issue that I did, where the recorder cannot cope with all the changing data and stops working.

But since we all have different needs, having it flexible is also an advantage. So I think you can hold off on making changes. But maybe write something in the README noting that this can be an issue.

As for me, I am now down to ~2 GB in the states table, down from >3 GB last week, so it is decreasing with time as the recorder continuously purges old entries. So I can safely say that the voltage sensor updates were the root cause of my recorder exhausting its resources.

syssi / homeassistant-goecharger-mqtt

Huge amount of state changes generated causing HA recorder shutdown and loss of history #86