Aggregation of datapoints

glitchracer commented 2 years ago

I'm running rctmon as a service on my Raspberry Pi and sending the data to an InfluxDB instance on my NAS. So far it works well and I can see the values in my Grafana dashboard.

Currently I'm struggling with the number of datapoints. From what I can see, rctmon sends data to the DB every 10 s. If I want to see data from more than 6 h back, it simply fails, I think because there is too much data to load. Besides that, reducing the datapoints would help to save disk space in the long run.

How can I reduce or aggregate these datapoints at the database level?

Thank you

MichaelMMS commented 2 years ago

Edit the device_manager file and search for "interval". There you can increase the interval (in seconds), which gives you a lower resolution and fewer datapoints.

glitchracer commented 2 years ago

There are a couple of 'interval' settings which are set to 0. Which is the right one to reduce data points in general?

svalouch commented 1 year ago

(This is also related to #32)

So, the idea behind the different query intervals is actually the reduction of data, believe it or not :wink:

Values that require a higher granularity are updated more often than those expected to change only slowly over time. As an example: The battery cycles are a lot less important and far less likely to change than the current line frequency or the line load per phase. If the line load were queried at a slower interval, you wouldn't see short spikes caused by e.g. an electric stove turning on or off in your graphs. Some data is only queried once, mostly to discover what's connected (Battery, PowerSwitch, …); these aren't sent repeatedly, hence the interval setting.
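To illustrate the idea of per-value query intervals, here is a minimal Python sketch. It is not rctmon's actual code: the `Reading` class, the `is_due` helper and the value names are hypothetical, and treating an interval of 0 as "query once" is an assumption based on the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reading:
    """Hypothetical description of one value polled from the inverter."""
    name: str
    interval: int  # seconds between queries; 0 is assumed to mean "query once"
    last_query: Optional[float] = None  # timestamp of the last query, if any


def is_due(reading: Reading, now: float) -> bool:
    """Return True if the value should be queried at time `now`."""
    if reading.last_query is None:
        return True  # never queried so far
    if reading.interval == 0:
        return False  # one-shot value, already queried
    return now - reading.last_query >= reading.interval
```

With this scheme a fast-changing value such as the line load could use a short interval, while discovery-style values would be fetched a single time and never again.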

The amount of data is already reduced this way, and any further reduction can be handled by the target systems: Both Prometheus and InfluxDB have mechanisms for that. Using these is the preferred way, as it relieves small projects like this from having to cater to a lot of different setups. For this, you should implement some Flux queries that take the raw data and either push it to "tables" where it makes sense, e.g. a "solar generator" table (possibly adding averaging or other functions), or drop points. Making all of the intervals configurable would greatly complicate the code and make setting things up a heck of a trip, so we won't do that. Raising the intervals in a release is also a no-go, as it would break use-cases that rely on finer-grained collection.
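What such a reduction does can be sketched in plain Python. This is only an illustration of windowed averaging, not a Flux query and not part of rctmon; the `downsample` function and its `(timestamp, value)` input format are assumptions made for the example. In a real setup the equivalent step would run inside the database.

```python
from collections import defaultdict
from statistics import mean
from typing import Iterable, List, Tuple


def downsample(points: Iterable[Tuple[int, float]], window_s: int) -> List[Tuple[int, float]]:
    """Average (unix_timestamp, value) points over fixed windows of `window_s` seconds.

    Returns one (window_start, mean_value) pair per non-empty window, ordered by time.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        # Align each timestamp to the start of its window.
        buckets[ts - ts % window_s].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Raw points every 10 s, reduced to one point per 30 s window.
raw = [(0, 1.0), (10, 3.0), (20, 5.0), (30, 7.0), (40, 9.0), (50, 11.0), (60, 2.0)]
reduced = downsample(raw, 30)
```

For scale: at one point every 10 s, a 6 h range holds 2160 points per series, while 5-minute windows would leave 72. Note that averaging smooths out short spikes, so a maximum or a shorter window may suit some metrics better.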

We could collect examples of how to reduce the data for different use-cases, so feel free to open PRs to add them to the documentation.

The raw stream has another use-case: Debugging. The inverters send all responses to all connected clients (for whatever reason), so the responses sent to smartphone apps are also recorded and can be used to further reverse-engineer the inverter.

That being said, the InfluxDB output could itself be improved to push values to "tables", e.g. a "solar generator" table for all the solar generator related metrics, at varying intervals via internal triggering. Some tables would be updated more often than others, depending on the required granularity. Still, data reduction would be handled on the InfluxDB-side, but it would make the raw data stream optional and it could be disabled in the config. The tables would likely form a ~1:1 relationship between Grafana panels and InfluxDB "tables". I had this planned initially for a PostgreSQL output that never saw the light of day.
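A rough Python sketch of how such "tables" with individual intervals could be described follows. Nothing in it exists in rctmon today: `TableSpec`, the table and field names, the intervals, and the plain dict standing in for the latest known values are all hypothetical.

```python
from typing import Dict, List, NamedTuple


class TableSpec(NamedTuple):
    """Hypothetical definition of one output "table"."""
    name: str
    interval: int      # seconds between writes to this table
    fields: List[str]  # names of the values that belong to the table


# Example definitions; names and intervals are made up for illustration.
TABLES = [
    TableSpec("solar_generator", 10, ["generator_a_power", "generator_b_power"]),
    TableSpec("battery", 300, ["battery_soc", "battery_cycles"]),
]


def due_tables(specs: List[TableSpec], last_written: Dict[str, float], now: float) -> List[TableSpec]:
    """Return the tables whose interval has elapsed since their last write."""
    return [
        spec for spec in specs
        if now - last_written.get(spec.name, float("-inf")) >= spec.interval
    ]


def build_row(spec: TableSpec, values: Dict[str, float]) -> Dict[str, float]:
    """Collect the table's fields from the latest known values, skipping unknown ones."""
    return {field: values[field] for field in spec.fields if field in values}
```

Each table then maps roughly to one Grafana panel, and the slow-moving ones produce far fewer points than the raw stream.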

The best way to do that would be to make DeviceManager's internal value store available to outputs for consumption; otherwise, they would have to re-build the data from the raw stream all over again. This would also benefit the proposed MQTT output in #32, as it wouldn't have to rely on parsing the Prometheus output (Prometheus' collect() is already a bit of a violation of best practices).

svalouch / rctmon

Aggregation of datapoints #19