
Configuration setting for creating multiple instances of plugins for parallel processing #11707

Open LukeWinikates opened 2 years ago

LukeWinikates commented 2 years ago

Use Case

What if telegraf processor plugins had a parallel instance count configuration, allowing multiple instances of a processor to be initialized from the same configuration? Telegraf would partition metrics evenly across the instances, and the metrics would flow to the next stage of the pipeline normally.
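For illustration only, the setting could look something like this (the instances option below does not exist today; the name and path are made up for the sake of the example):

# Hypothetical configuration; "instances" is NOT an existing Telegraf option.
[[processors.execd]]
  command = ["/usr/local/bin/metric-transformer"]   # placeholder
  ## Telegraf would start this many copies of the processor from the one
  ## configuration block and partition incoming metrics evenly across them.
  instances = 4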

Background: My team and I use telegraf to scrape Prometheus-style metrics from an endpoint that exposes a rather large number of metrics. We need to do some transformation of these metrics, so we use the execd processor plugin to shell out to a Go binary that processes them.

We found that our execd plugin was a bottleneck, even though it only does very modest string manipulation. We think the bottleneck comes mostly from deserializing the metrics from STDIN and reserializing them to STDOUT.
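For context, a single instance of that processor is configured roughly like this (the path is a placeholder, not our real binary):

[[processors.execd]]
  ## External program that reads metrics (influx line protocol by default)
  ## on stdin and writes the transformed metrics back on stdout.
  command = ["/usr/local/bin/metric-transformer"]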

We managed to get more throughput by creating multiple instances of our plugin in our telegraf.conf file. We create instances 1...N with names like plugin-1 and plugin-2, and use the starlark processor to add a pipeline tag to each metric in round-robin fashion, like this:

state = {"round_robin_count": 0}
def apply(metric):
  num = (1 + state["round_robin_count"])
  state["round_robin_count"] = num % pipeline_count
  metric.tags["pipeline"] = "pipeline-{}".format(num)
  return metric

So the pipeline is something like Prometheus Input -> Starlark -> Execd Plugin {1...N} -> Output {1...N}
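Each generated instance then selects its slice of the stream with tagpass on the pipeline tag. A simplified sketch of what one of the N generated blocks looks like (the output plugin and URL are placeholders, not our real config):

# Instance 1 of N; the template repeats this with pipeline-2, pipeline-3, ...
[[processors.execd]]
  command = ["/usr/local/bin/metric-transformer"]
  [processors.execd.tagpass]
    pipeline = ["pipeline-1"]

[[outputs.influxdb]]
  urls = ["http://localhost:8086"]   # placeholder output
  tagexclude = ["pipeline"]          # drop the routing tag before writing
  [outputs.influxdb.tagpass]
    pipeline = ["pipeline-1"]

(In practice the starlark processor also needs to run before these instances, which the processor order option can guarantee.)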

The pattern works well for us, but it is noisy: the telegraf config has to be generated from a template, and the telegraf.conf file ends up with N mostly identical variants of the processor plugin and output plugin configurations.

Maybe telegraf itself could do this with less complexity, since it could easily initialize multiple instances of a plugin from the same configuration and could probably route metrics between those instances without needing an ephemeral tag created just for routing. For users who deal with a large metrics volume and hit bottlenecks in their plugin pipeline, this could yield greater throughput from their telegraf deployment without adding substantial complexity to their configuration.

Expected behavior

We wanted to be able to parallelize some of our processor plugins and output plugins easily.

Actual behavior

We needed to do a fair bit of work to come up with a way to round-robin between multiple instances of the same plugin, and the resulting configuration file is large and contains a lot of duplication.

Additional info

Thanks for reading!

MyaLongmire commented 2 years ago

Next steps: look into an option other than stdin and stdout to help with the potential bottleneck. Also, look into whether adding a parallelization option to the config is possible.

Would you be interested in making a PR for this?

knollet commented 1 year ago

So I was debugging the ominous [inputs.sysstat] Collection took longer than expected; not complete after interval of 10s message, which always coincides with high load, with telegraf running at 100% CPU on one core...

So now I managed to reproduce it with a minimal config: https://gist.github.com/knollet/5d77f0816df8f10bcbb09c1301a97cf3

With this config (no output needed; the starlark does busy-waiting, but could probably use a "sleep()" function if there were one), I can reliably get the "...not complete after..." message after around 30 seconds.
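The shape of the config is something like the following (a simplified sketch, not the exact gist; the input plugin and the loop bound are arbitrary):

[agent]
  interval = "10s"

[[inputs.cpu]]

# Deliberately slow processor: burn CPU on every metric so the pipeline
# backs up and the input eventually logs "not complete after interval".
[[processors.starlark]]
  source = '''
def apply(metric):
    x = 0
    for i in range(2000000):   # arbitrary busy-wait
        x += i
    return metric
'''

(Depending on the Telegraf version, an [[outputs.discard]] block may be needed for the agent to start at all, even though no output is otherwise required for the reproduction.)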

Then I created a config to measure how long a metric takes to get through the processor pipeline: https://gist.github.com/knollet/2ac7364c1b1f924e139e06440b89bc61

The time the metrics spend in the pipeline goes up continuously. At some point the pipeline can't accumulate any more metrics, and the "...took too long..." error is triggered.
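The measurement itself can be done with a Starlark processor at the end of the pipeline that compares the metric's own timestamp with the wall clock. A sketch of the idea (simplified, not the exact gist; it assumes the time.star module and its unix_nano attribute are available in Telegraf's Starlark):

[[processors.starlark]]
  ## Run after all other processors so the measured delta covers the
  ## whole processor pipeline.
  order = 100
  source = '''
load("time.star", "time")

def apply(metric):
    # metric.time is the metric timestamp in nanoseconds since the epoch.
    metric.fields["pipeline_latency_ns"] = time.now().unix_nano - metric.time
    return metric
'''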

Some option to parallelize processors would probably help with this. At the moment I'm trying the round-robin method mentioned above.