inputs.http: Allow to configure fallback values in case of connection issues

liwde commented 2 months ago

Use Case

I have a balcony power plant at home that produces solar power. Its inverter feeds the output of the solar panels into the power grid through a normal plug, and it often offers some observability through a small HTTP server where you can query the power (in watts) that is currently being produced. I want to use the inputs.http plugin to query that data and write it to InfluxDB.

However, at night, when the solar panels don't produce anything, the inverter turns off automatically and is no longer reachable via HTTP, which results in connection errors. I know that when that happens the power output is actually 0. But since the last reported value was greater than 0, charts in InfluxDB look incorrect.

Thus, I'd like to be able to configure the inputs.http plugin to not error on connection issues, but rather respond with a configurable fallback value that can then be processed by subsequent plugins.

Expected behavior

I'd like to configure fallback metrics to emit in case of connection problems.

Open question outside of my concrete use case: Should that fallback also apply when the response code is not in success_status_codes, or when parsing the response has failed? Should this potentially be handled as different fallback values?
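One way to picture the requested behavior, including the open question about separate fallbacks per failure type, is the Python sketch below. It is not Telegraf code (the plugin itself is written in Go): `gather_with_fallback`, `StatusError`, `ParseError`, the failure-kind keys, and the `power_w` field are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict

Fields = Dict[str, float]


class StatusError(Exception):
    """Hypothetical: the response code was not an accepted status code."""


class ParseError(Exception):
    """Hypothetical: the response body could not be parsed."""


def classify(err: Exception) -> str:
    """Map an exception to one of three illustrative failure kinds."""
    if isinstance(err, StatusError):
        return "status"
    if isinstance(err, ParseError):
        return "parse"
    return "connection"


def gather_with_fallback(fetch: Callable[[], Fields],
                         fallbacks: Dict[str, Fields]) -> Fields:
    """Return fetched fields, or the fallback configured for the failure kind.

    If no fallback is configured for that kind, the original error is
    re-raised, so the default behavior (report an error) is unchanged.
    """
    try:
        return fetch()
    except (OSError, StatusError, ParseError) as err:
        kind = classify(err)
        if kind in fallbacks:
            return dict(fallbacks[kind])
        raise
```

With only a `"connection"` fallback configured, an unreachable inverter would yield `{"power_w": 0.0}`, while a parse failure would still surface as an error.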

Alternatively, if this would be more idiomatic for Telegraf, I'd like the error message to be available as a metric, so that I can use further processors to react to it and create fallback values based on certain error messages.
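The alternative of exposing the error as a metric could look roughly like the following Python sketch. Again, this is only an illustration under assumed names: `gather_or_report`, `fallback_processor`, the `inverter` and `inverter_error` metric names, and the `power_w` field are hypothetical and do not correspond to existing Telegraf options.

```python
from typing import Any, Callable, Dict

Metric = Dict[str, Any]


def gather_or_report(fetch: Callable[[], Dict[str, float]]) -> Metric:
    """Emit the normal metric, or an error metric carrying the message."""
    try:
        return {"name": "inverter", "fields": fetch()}
    except OSError as err:
        return {"name": "inverter_error", "fields": {"error": str(err)}}


def fallback_processor(metric: Metric) -> Metric:
    """Turn selected error metrics into a fallback value.

    Matching on message text is brittle; it is used here only because the
    proposal is phrased in terms of error messages.
    """
    if metric["name"] == "inverter_error":
        message = metric["fields"]["error"].lower()
        if "refused" in message or "timed out" in message:
            return {"name": "inverter", "fields": {"power_w": 0.0}}
    return metric
```

The design trade-off is that the input stays generic, and the decision about which errors mean "0 W" moves into a separate processing step.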

Actual behavior

https://github.com/influxdata/telegraf/blob/88bca70dfd3fce0e8320bf41abc92f0f32ef53ed/plugins/inputs/http/http.go#L176-L179

Additional info

This was also discussed at https://community.influxdata.com/t/telegraf-default-value-instead-of-connection-time-outs/21131, where different alternatives were proposed:

- use inputs.execd to run custom console apps, which I want to avoid because I want to run in an environment without a shell
- use a custom proxy that handles connection errors to the actual target system gracefully, which is a lot of overhead
- use inputs.tail to parse telegraf's own error log, which feels brittle

srebhan commented 2 months ago

@liwde this is a special use case, so I suggest solving it with the starlark aggregator instead of modifying the input, like this:

[agent]
  interval = "10s"
  flush_interval = "1s"
  omit_hostname = true

[[inputs.mock]]
  metric_name = "mock"

  [inputs.mock.tags]
    "name" = "mocker"

  [[inputs.mock.constant]]
    name = "value"
    value = 42.0

[[aggregators.starlark]]
  period = "2s"
  source = '''
load('time.star', 'time')

state = {}

def add(metric):
  state["last"] = metric

def push():
  metric = state.get("last")
  if metric == None or time.now() - time.from_timestamp(0, metric.time) > 2*time.second:
    return Metric("mock", {"name":"mocker"}, {"value": 0.0})

  return None

def reset():
  state.clear()
'''

[[outputs.file]]
  files = ["stdout"]
  data_format = "influx"

Of course you need to adapt the default metric and the "deadtime", but basically that's how I would do it. The alternative is to fill the missing values in the query.

liwde commented 2 months ago

@srebhan Thank you for the response, that is a really great idea to solve this that I had not found!

And reading up on aggregators, it makes even more sense, because telegraf keeps the original metrics (unless configured otherwise), so the aggregator really just needs to check the time of the last one (and doesn't have to keep track of the original metrics to re-emit them or something).
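To make that reasoning concrete, here is a small Python simulation of the add/push/reset cycle used in the Starlark snippet above. It is a sketch, not the Telegraf API: the `LastSeenAggregator` class, the dict-based metric shape, and the injectable `clock` are assumptions made so the logic can be run outside Telegraf.

```python
import time
from typing import Any, Callable, Dict, Optional

Metric = Dict[str, Any]


class LastSeenAggregator:
    """Emit a fallback metric when no fresh metric was seen recently."""

    def __init__(self, fallback_fields: Dict[str, float], deadtime_s: float,
                 clock: Callable[[], float] = time.time) -> None:
        self.fallback_fields = fallback_fields
        self.deadtime_s = deadtime_s
        self.clock = clock  # injectable so the behavior can be tested
        self.last: Optional[Metric] = None

    def add(self, metric: Metric) -> None:
        # Only the most recent metric matters; originals pass through
        # unchanged, so nothing has to be stored for re-emission.
        self.last = metric

    def push(self) -> Optional[Metric]:
        now = self.clock()
        # Age is "now minus last seen"; a missing metric counts as stale.
        if self.last is None or now - self.last["time"] > self.deadtime_s:
            return {"time": now, "fields": dict(self.fallback_fields)}
        return None

    def reset(self) -> None:
        self.last = None
```

Calling `push()` returns the fallback only when nothing arrived within the deadtime; otherwise it returns `None` and only the original metrics reach the output.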
