New Version - Githubissues

Grillbert commented 3 years ago

Hi! I am looking forward to experimenting with your new tools in order to add RCT data to my InfluxDB. I am using IOTStack/Docker and plan to replicate the RCT data into my DB in order to feed Grafana.

I wanted to do that in near real time (e.g. every 10 minutes). From my first look at the docs, it seems that copying the time series into InfluxDB is only possible for bigger chunks of data. I guess that, because the interface is designed to feed the app, this would mean reading a lot of data twice, correct?

Regards, Gilbert

svalouch commented 3 years ago

Hi, thanks for your interest!

The current version of timeseries2csv.py lets you limit the amount somewhat via the --count command line option, but you'd need to change the script to get time ranges shorter than an hour. It's really meant for archiving larger amounts of data, and with the resolution of the logger.minutes_* series being five minutes, it doesn't make sense for it to query such short time ranges. It also assumes that you want a "nice" table, i.e. each row has all the value columns (suitable for a spreadsheet application or an RDBMS like PostgreSQL, and InfluxDB sure won't complain), which results in a lot of complexity for matching the timestamps to a raster.

The other option would be to have a single "value" field and tags to assign names to the points, which is perfectly fine with InfluxDB and makes application design a whole lot simpler, so this might be a better approach for your use case. You don't need the complicated logic to synchronize the timestamps of the points, as you don't care if one point is reported a few seconds ahead of another (which would result in mostly empty table rows in an RDBMS); the GROUP BY in InfluxDB will take care of that for you.
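As a rough illustration of the single-"value"-field layout, here is a minimal sketch that formats one point as InfluxDB line protocol. The measurement name "rct" and the tag key "name" are assumptions made up for this example, not something rctclient prescribes.

```python
def escape_tag(text: str) -> str:
    """Escape the characters that are special in line protocol tag values."""
    for char in (",", "=", " "):
        text = text.replace(char, "\\" + char)
    return text


def to_line(name: str, value: float, timestamp_s: int, measurement: str = "rct") -> str:
    """Build one line: <measurement>,name=<name> value=<value> <timestamp in ns>.

    "rct" and the "name" tag are hypothetical choices for this sketch.
    """
    # Line protocol timestamps default to nanosecond precision.
    timestamp_ns = int(timestamp_s) * 1_000_000_000
    return f"{measurement},name={escape_tag(name)} value={float(value)!r} {timestamp_ns}"


if __name__ == "__main__":
    print(to_line("energy.e_dc_day[0]", 1.5, 1600000000))
```

Every point gets its own line and its own timestamp, so nothing has to be aligned to a raster before writing.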

I know that overwriting points is no problem in InfluxDB when there are no continuous queries or retention policies that downsample the data. I dislike doing it, as I don't consider it correct design, but sometimes one has to do it to keep the application complexity manageable. I haven't tested or looked into overwriting with downsampling running, as I haven't had the need yet, but I'd be curious about the results.

To add to the complexity of not being able to specify a range and not knowing beforehand how many points will be returned: the device has no way to tell you when there is a gap in its data. It will simply return no values, which it also does randomly when there is no gap at all. Gaps may occur, for example, when its battery runs out during the night (below the island cutoff, default is 7% I think) or during a firmware update. I haven't found an easy solution, and the timeseries tool is definitely prone to never stopping if the gap is too large. I've thought about moving the "begin" timestamp five minutes into the past each time no results are returned twice in a row, but it adds a lot of complexity.
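The "move begin back by five minutes after two empty results" idea could look roughly like the sketch below. It is not what timeseries2csv.py does. The fetch callable is hypothetical and is assumed to return (timestamp, value) points strictly older than the timestamp it is given, or an empty list. The max_gap_steps limit is an addition of this sketch so that a very large gap cannot keep the loop running forever.

```python
from typing import Callable, List, Tuple

Point = Tuple[int, float]  # (unix timestamp in seconds, value)
STEP_S = 300  # five minutes, the resolution of the minutes series


def fetch_backwards(
    fetch: Callable[[int], List[Point]],  # hypothetical: points older than the argument
    begin: int,
    stop: int,
    max_gap_steps: int = 288,  # assumption: give up after 24 h of nothing
) -> List[Point]:
    """Walk from begin back towards stop, stepping over gaps in the data."""
    points: List[Point] = []
    empty_in_a_row = 0
    skipped_steps = 0
    while begin > stop and skipped_steps < max_gap_steps:
        batch = fetch(begin)
        if batch:
            points.extend(batch)
            empty_in_a_row = 0
            skipped_steps = 0
            # Continue before the oldest point, always moving backwards.
            begin = min(min(ts for ts, _ in batch), begin - 1)
            continue
        empty_in_a_row += 1
        if empty_in_a_row >= 2:
            # Two empty answers in a row: assume a gap and step over it.
            begin -= STEP_S
            skipped_steps += 1
            empty_in_a_row = 0
    return sorted(points)
```

A single empty answer is retried at the same timestamp, which covers the random empty responses. Only the second one in a row is treated as a gap.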

The app has to implement all that logic, so I don't think they designed it that way for the app, but because it may have been the easiest way to do it for such a use case given the limitation of the protocol (no way to send two values in one "transaction" by default, as would be needed for start-end ranges). More important functions work by writing to "COM service", which I guess tells the board that a set of parameters is about to be sent, and then applying them in one go when "COM service" is written to again. This seems to allow them to implement something akin to "transactions", which is really helpful when changing parameters such as network connection settings, where setting one alone may render the device inaccessible with no means for the customer to reconnect. But the process is very complicated and slow (you can see it happen when you change the network settings in the app and then look at the event table in the history tab) and thus unsuitable for the time series interface used by customers.

Grillbert commented 3 years ago

OK... to me that sounds like the "live data" is best requested from the inverter as single queries every few minutes. For the historical data, your Python script could then be run every night. So far I am using NodeRed for my home automation tasks, so whether I can use Python from within NodeRed is the next thing for me to clarify... unfortunately I am no expert in any of these technologies and have to move forward in a very iterative way ;) (I really wonder why RCT did not just add a well-documented REST interface to their device, at least to READ all kinds of data.)

svalouch commented 3 years ago

Well, if you query the same data that is queried by timeseries2csv.py, you wouldn't need to run it every night as you're already collecting the data incrementally throughout the day. You'd only need it to do the initial import of the historical values. That way you wouldn't need to maintain two different ingestion mechanisms side-by-side.

I haven't used NodeRed yet, though it's on my list of things to check out. I guess it makes the most sense to do all the data querying (and retrying and stitching together) in a Python script, as it's rather complicated, and call it from NodeRed using an exec node. I guess you can read data from that node, so you'd simply print CSV or JSON data to standard output and NodeRed could pick it up and work with it. Just make sure that the script terminates eventually.
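A minimal sketch of such a script, under a few assumptions: read_value is a hypothetical callable standing in for whatever actually talks to the inverter, and the overall deadline is only checked between reads, so each read still needs its own timeout (e.g. a socket timeout) for the script to be sure to terminate.

```python
import json
import sys
import time
from typing import Callable, Dict, Iterable


def collect(
    read_value: Callable[[str], float],  # hypothetical reader, not part of rctclient
    names: Iterable[str],
    deadline_s: float = 20.0,
) -> Dict[str, float]:
    """Read each name once and stop starting new reads after deadline_s."""
    started = time.monotonic()
    result: Dict[str, float] = {}
    for name in names:
        if time.monotonic() - started > deadline_s:
            break  # out of time: return what we have so far
        try:
            result[name] = read_value(name)
        except OSError as exc:
            # Report the failure on stderr so stdout stays valid JSON.
            print(f"{name}: {exc}", file=sys.stderr)
    return result


def main(read_value: Callable[[str], float], names: Iterable[str]) -> None:
    """Print one JSON object on stdout for the calling node to parse."""
    json.dump(collect(read_value, names), sys.stdout)
    sys.stdout.write("\n")
```

Keeping errors on stderr means the caller either gets a complete JSON object on stdout or nothing at all.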

Something else just popped into my mind: There's a set of counters (Device → Measured values → Energy), i.e. fields that only count in one direction and may be reset to 0 occasionally, such as energy.e_dc_total[0] for "Solar Generator A total Energy [MWh]" or energy.e_dc_day[0] for "Solar generator A day energy [kWh]". By calculating the difference between two points you'd get the increase of the value in that time frame, and InfluxDB has builtin functions to do just that. The beauty of such counters (if they're implemented correctly on the device's side) is that you won't have to worry about missing values, as all that happens is that your query returns fewer points and results in a more blocky graph. The reset to 0 at midnight should be handled by InfluxDB automatically, i.e. it shouldn't shoot off into negative values. You'd just need to fire off a bunch of simple and fast READ commands (e.g. as easy as running rctclient read-value ... a couple of times) every minute or so (or every 5 to match the device's own stepping, though more data points result in smoother graphs) and insert them as they come in. In the end the graph should look quite similar (if not identical) to the time series stored by the device.
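To show what the difference calculation does with a missing sample and with the midnight reset, here is a small sketch in plain Python. Treating the first value after a reset as the increase since the reset is an assumption of this sketch. InfluxDB's own functions may handle that case differently, for example by dropping the negative difference.

```python
from typing import List, Tuple

Sample = Tuple[int, float]  # (unix timestamp in seconds, counter value)


def increases(samples: List[Sample]) -> List[Sample]:
    """Turn counter readings (sorted by time) into per-interval increases."""
    result: List[Sample] = []
    for (_, previous), (timestamp, current) in zip(samples, samples[1:]):
        delta = current - previous
        if delta < 0:
            # The counter was reset in between (e.g. the day counter at
            # midnight). Assumption: count the new value as the increase.
            delta = current
        result.append((timestamp, delta))
    return result


if __name__ == "__main__":
    # The sample at 600 is missing; its increase is folded into the one at 900.
    readings = [(0, 1.0), (300, 1.5), (900, 2.5), (1200, 0.25)]
    print(increases(readings))
```

A missing reading just makes one interval longer, with the increase folded into the next point, which is the "more blocky graph" effect.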

I guess RCT chose an implementation that's rather minimal and can be implemented on a microcontroller. For REST, you'd need some kind of webserver, and the entire stack has a lot of overhead compared to simply doing raw TCP/IP. Actually, the devices do have a web interface, but when I asked about it they said it's only for their technicians. So in the end your guess is as good as mine :)

svalouch commented 1 year ago

Let me quickly mention that there are some projects that continuously query inverters and make the data available. You can check a few of them out at https://rctclient.readthedocs.io/en/v0.0.4/#target-audience, though the list is likely not exhaustive. That being said: Closing, as there are some solutions available.

svalouch / python-rctclient

New Version #4