Closed agneborn98 closed 11 months ago
The code that causes this behavior is in plugins/common/shim/output.go. https://github.com/influxdata/telegraf/blob/ed8bd1dd525f266f204c921986de40bdf92c4e94/plugins/common/shim/output.go#L47
In this loop, the shim reads a line, parses it into a metric, then passes the single metric to the output's Write() method.
I don't think the execd plugin marks the beginning or end of a batch when it streams metrics out. That means the shim on the other side doesn't know how many metrics to gather into a batch, so it calls Write() once per metric.
If this doesn't work for you, could you share any alternative ideas and maybe contribute a PR to implement them?
Thank you for the response!
I don't really have the time or the expertise in Go to come up with a solution for this at the moment.
I think we can fairly say that this is not a bug, but rather a design decision/flaw. So if you agree, we can label this as a feature request instead?
I think you're right that execd outputs should get normal sized batches, not batches of a single metric. Let's leave this issue labeled as a bug.
We will need to work on the design for this fix before writing code. The execd output plugin will need to signal the end of a batch, and the shim will need to watch for that signal. The problem is that the stream between them is InfluxDB line protocol, which doesn't have a native way to signal out-of-band information like this. We can't change to another protocol without breaking existing external outputs built with the current plugin and shim code. We will likely need to hide the signal in a line protocol comment or do something else that works with both current and new code.
I'm not sure when this will be implemented. It's not currently scheduled for the project dev team. I will label it "help wanted" so people will know it's available for community members to work on.
use_batch_format, which will serialize a batch and send that entire batch to stdin of the process.
Closing as fixed.
Relevant telegraf.conf
Logs from Telegraf
System info
Telegraf 1.25.0-fc8a300f, Windows 11 Pro 10.0.22000
Docker
No response
Steps to reproduce
metrics in the Write() function.
Expected behavior
I expected the length of the slice to be about the same as the batch size, since that is what the configuration says is the purpose of metric_batch_size: "This controls the size of writes that Telegraf sends to output plugins."
Actual behavior
As you can see in the log file, I printed out the length of the metrics []telegraf.Metric with len(metrics), but it always stays at a length of 1 no matter which batch size I choose. It still prints out "Wrote batch of 100", but it doesn't actually buffer a hundred metrics. It just loops the Write() function a hundred times.
Additional info
Something does change with the batch size, because the number of metrics written to the output every second (throughput) changes when I change the value. I have no idea what actually changes, but it is not the size of that slice. I tried making the plugin internal, and then the slice is at or near the batch size, so I believe it has something to do with the external part.