Retry or discard write to influxdb when host is down

soplan commented 1 year ago

Grott (running on Windows 10) stops when InfluxDB is unreachable. In this case InfluxDB runs on the same host, but I stopped the service to test Grott's behavior.

This results in the following messages:

       - Grott write to influxdb v2
         - Grott InfluxDB error
       <urllib3.connection.HTTPConnection object at 0x046A8A00>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it
       Grott Influxdb write error, grott will be stopped

Ideally Grott should retry sending, but if this involves caching, we could instead discard the packet and try again with the next packet once the host is back online.

This ensures that Grott keeps running if the InfluxDB host is down for whatever reason, and resumes sending data to InfluxDB when the host is back online, either with cached data or with the latest data.

I have not tested this with MQTT etc.; it might require the same handling.
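A minimal sketch of the "retry, then discard" behavior requested above, assuming the actual InfluxDB write is wrapped in a callable passed in as `write`. The names `write_with_retry`, `attempts` and `delay` are hypothetical and not part of Grott. The function never raises, so the caller keeps running whether the write succeeds or the record is dropped.

```python
import logging
import time
from typing import Callable

log = logging.getLogger("grott-sketch")  # hypothetical logger name


def write_with_retry(
    write: Callable[[dict], None],
    record: dict,
    attempts: int = 3,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try to write one record; after `attempts` failures, discard it.

    Returns True if the write succeeded, False if the record was dropped.
    Never raises, so Grott would keep running while InfluxDB is down.
    """
    for attempt in range(1, attempts + 1):
        try:
            write(record)
            return True
        except Exception as exc:  # e.g. connection refused while InfluxDB is stopped
            log.warning("InfluxDB write attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt < attempts:
                sleep(delay * 2 ** (attempt - 1))  # exponential backoff between tries
    log.warning("InfluxDB unreachable, discarding record")
    return False
```

Because the retries sleep inside the data path, `attempts` and `delay` would need to stay small; with `attempts=1` this degenerates into the plain "discard and wait for the next packet" option.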

robupham commented 1 year ago

+1 on this request please.

My InfluxDB instance runs as part of Home Assistant on a separate VM. Sometimes my inverter appears offline on the Growatt cloud platform (ShinePhone or web), and the root cause is that the VM is stopped, which blocks Grott from processing data and passing it to Growatt.

Yes, I know I should make my HomeAssistant / InfluxDB instance more resilient. 🙂

johanmeijer commented 1 year ago

OK. I agree that stopping Grott when an InfluxDB error occurs might not be the best solution. That is something that can easily be changed (I think). But at least it makes us aware of problems with InfluxDB.

Caching the data is another matter. Grott is, as they say, stateless: it "remembers" nothing. That makes things a lot easier; otherwise, what happens when Grott stops? Should I save the data to a file or a database to make it persistent? This is one of the reasons I created the InfluxDB interface: to store the data in a database.

What I could also do is change the order of processing. Currently Grott sends the data to Growatt and after that processes the record (and sends it to MQTT, InfluxDB, etc.). If we turn this around, Grott will first try to process the data, and if this fails it can refrain from sending the data to Growatt. As long as the inverter (datalogger) does not get a confirmation from the Growatt server, it keeps the data record and tries to send it again later (as buffered records).

So in that case the inverter/datalogger is the cache. One minor detail is that not all dataloggers are very good at keeping the correct date/time. In that case Grott will ignore these buffered records and use the server time for date/time processing of the real-time records.
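The reordering described above could be sketched like this, assuming processing (MQTT, InfluxDB, etc.) and forwarding to the Growatt server are available as two callables. The names `handle_packet_process_first`, `process` and `forward` are hypothetical; this only illustrates the idea of withholding the server's confirmation, not Grott's actual proxy code.

```python
from typing import Callable, Optional


def handle_packet_process_first(
    packet: bytes,
    process: Callable[[bytes], None],
    forward: Callable[[bytes], bytes],
) -> Optional[bytes]:
    """Process the record first, and only forward it to Growatt on success.

    If processing fails, the packet is not forwarded, so the datalogger gets
    no confirmation and (as described above) keeps the record to resend later
    as a buffered record. Returns the server reply to relay, or None.
    """
    try:
        process(packet)  # e.g. MQTT publish, InfluxDB write
    except Exception:
        return None  # no confirmation: the datalogger acts as the cache
    return forward(packet)
```

The trade-off is that an InfluxDB outage now also stops data from reaching Growatt until the datalogger's buffered records are accepted.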

soplan commented 1 year ago

I would suggest keeping it simple:

1. First, send the data to the Growatt server.
2. Then send the data to InfluxDB.

If InfluxDB is not online, drop the record and continue receiving data from the inverter. When the next packet arrives from the inverter, repeat the same steps; if InfluxDB is still down, drop that one too.

The worst case is a gap between what is stored at Growatt and what is stored in InfluxDB. Who cares if an hour, a day or a week of data is missing?

Caching just complicates things.
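The two steps above can be sketched as follows, assuming forwarding and the InfluxDB write are passed in as callables. The names `handle_packet_forward_first`, `forward` and `write_influx` are hypothetical. A failed InfluxDB write is logged and the record is dropped; nothing is cached, so Grott simply continues with the next packet.

```python
import logging
from typing import Callable

log = logging.getLogger("grott-sketch")  # hypothetical logger name


def handle_packet_forward_first(
    packet: bytes,
    forward: Callable[[bytes], bytes],
    write_influx: Callable[[bytes], None],
) -> bytes:
    # Step 1: send the data to the Growatt server, as Grott does today.
    reply = forward(packet)
    # Step 2: best-effort write to InfluxDB. On failure, drop this record
    # and keep running instead of stopping Grott.
    try:
        write_influx(packet)
    except Exception as exc:
        log.warning("InfluxDB write failed, record dropped: %s", exc)
    return reply
```

Records received while InfluxDB is down are lost from InfluxDB, which is exactly the gap accepted above.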

robupham commented 1 year ago

I like soplan’s suggestion… but what happens if the Growatt server is down (which happens more often than we’d all like)? It would be good for that to be a non-blocking situation, and for the data to be recorded in InfluxDB even if the Growatt platform is down.

soplan commented 1 year ago

> I like soplan’s suggestion… but what happens if Growatt server is down? (which happens more than we’d all like) It would be good for that to be a non-blocking situation, and for the data to be recorded in InfluxDB even if the Growatt platform is down.

> As long the inverter (datalogger) does not get a confirmation from the Growatt server, the inverter will keep the data record and will try to send this later again (as buffered records)

So in that case, do not send to InfluxDB either: when Growatt comes back online, the datalogger will resend from its cache, and then Grott would need to check whether it already sent those records to InfluxDB, or InfluxDB would get duplicate records.

Keep it simple. Iterate on it but start simple.
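For illustration only, the comparison mentioned above (checking whether a resent buffered record was already written) would need some state, for example a bounded in-memory set of recently written record keys. The class `RecentKeys` and the key shape (datalogger serial plus record timestamp) are hypothetical assumptions; Grott itself is stateless and has no such cache.

```python
from collections import OrderedDict
from typing import Hashable


class RecentKeys:
    """Remember the last `capacity` record keys to skip duplicate writes.

    Hypothetical helper: the key, e.g. (datalogger serial, record timestamp),
    is an assumption, not something Grott currently tracks.
    """

    def __init__(self, capacity: int = 1000) -> None:
        self.capacity = capacity
        self._keys: "OrderedDict[Hashable, None]" = OrderedDict()

    def seen_before(self, key: Hashable) -> bool:
        """Return True if `key` was already recorded; otherwise record it."""
        if key in self._keys:
            self._keys.move_to_end(key)  # keep recently seen keys alive
            return True
        self._keys[key] = None
        if len(self._keys) > self.capacity:
            self._keys.popitem(last=False)  # evict the oldest key
        return False
```

This state is lost whenever Grott restarts, and a datalogger with an inaccurate clock would make timestamp-based keys unreliable, which illustrates why caching complicates things.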

johanmeijer commented 1 year ago

I will change Grott to prevent it from stopping when an InfluxDB error occurs.

johanmeijer / grott

Retry or discard write to influxdb when host is down #295