influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

shim input plugins generate metric with wrong timestamp #10881

Open falon opened 2 years ago

falon commented 2 years ago

Feature Request

A shim input plugin should generate metrics with the same timestamp precision as the main Telegraf process. As far as I can see, Telegraf sets the timestamp precision according to the order of magnitude of the collection interval.

Current behavior:

/usr/bin/telegraf-ds389 -config /etc/CSI-telegraf-plugins/ds389-free.conf -poll_interval 1s
ds389,current=20220324091529Z,port=12345,server=alice.example.com,start=20211019064205Z,version=389-Directory/1.4.3.22\ B2021.085.1455 readops=0i,connectionseq=993655i 1648113329859815068

Desired behavior:

/usr/bin/telegraf-ds389 -config /etc/CSI-telegraf-plugins/ds389-free.conf -poll_interval 1s
ds389,current=20220324091529Z,port=12345,server=alice.example.com,start=20211019064205Z,version=389-Directory/1.4.3.22\ B2021.085.1455 readops=0i,connectionseq=993655i 1648113329000000000

You could add a "precision" configuration parameter to external plugins as well, or adjust the precision to the same order of magnitude as the poll interval.

Use case:

I suspect, though I'm not sure yet, that if Telegraf generates mixed output (from the main process and from external plugins) with different precisions, in particular with fractions of a second, some outputs such as Splunk could get confused.

powersj commented 2 years ago

Hi,

Can you provide an example config that I might try reproducing this with?

Thanks

falon commented 2 years ago

Hello @powersj

A simple config that reproduces the "issue" is:

[agent]
    interval = "60s"
    debug = false
    hostname = "alice.example.com"
    round_interval = true
    flush_interval = "10s"
    flush_jitter = "0s"
    collection_jitter = "0s"
    metric_batch_size = 1000
    metric_buffer_limit = 10000
    quiet = true
    logfile = "/var/log/telegraf/telegraf.log"
    omit_hostname = false

[[inputs.disk]]

[[inputs.execd]]
    command = ["/usr/bin/telegraf-ds389", "-config", "/etc/CSI-telegraf-plugins/ds389-free.conf", "-poll_interval", "1m"]

#### ds389-free.conf ####

[[inputs.ds389]]
  host = "<fqdn of your LDAP server>"
  port = 389

  # dn/password to bind with. If bind_dn is empty, an anonymous bind is performed.
  bindDn = ""
  bindPassword = ""

  # If true, alldbmonitor monitors all db and overrides dbtomonitor.
  alldbmonitor = true

  # Connections status monitor
  status = false

With an agent interval of 60s, the disk input plugin (like any other internal plugin) emits timestamps with a precision of 1s. The same 60s poll interval does not produce the same timestamp precision in the execd plugin. I used the ds389 plugin in this example, but I think you would see the same behavior with any other external input plugin.

Thank you!