influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

Multiple wildcards in filter patterns (namepass etc) give inconsistent output #9265

Open hackery opened 3 years ago

hackery commented 3 years ago

Relevant telegraf.conf:

[[inputs.exec]]
  data_format = "influx"

  commands = [
    "echo cpu.process32:testcase.exe:caserunner.2222 value=42",
    "echo process32:testcase.exe:caserunner.2222 value=42",
    "echo otherWantedMetric value=0",
    "echo unwantedMetric value=0",
  ]

  namepass = [
        '*process32:*.exe:*.*',
        'otherWantedMetric*',
  ]

System info:

Telegraf 1.13.3 (git: HEAD da364558)

Steps to reproduce:

  1. Add multiple wildcard patterns to any filter clause
  2. Feed matching and non-matching lines into input

Expected behavior:

Metrics are correctly filtered:

$ /usr/bin/telegraf -config etc.testcase.filter/telegraf.conf --test
2021-05-12T11:09:19Z I! Starting Telegraf 1.13.3
> cpu.process32:testcase.exe:caserunner.2222 value=42 1620817760000000000
> process32:testcase.exe:caserunner.2222 value=42 1620817760000000000
> otherWantedMetric value=0 1620817760000000000

Actual behavior:

Some metrics are dropped when they should be passed (or vice versa for "drop" rules):

2021-05-12T11:10:46Z I! Starting Telegraf 1.13.3
> otherWantedMetric value=0 1620817847000000000

Additional info:

Removing otherWantedMetric* from the filter list permits the "process32" metrics to pass:

2021-05-12T11:11:49Z I! Starting Telegraf 1.13.3
> cpu.process32:testcase.exe:caserunner.2222 value=42 1620817910000000000
> process32:testcase.exe:caserunner.2222 value=42 1620817910000000000

Depending on the specific wildcards used in a set of patterns, there is sometimes also an ordering dependency, where swapping two patterns makes the filter behave correctly.

Yes, I know the naming here is an antipattern and they should be tagged like process32,exe=testcase.exe,activity=caserunner,act_id=2222 cpu=42 ... but these metrics come from a legacy system, and we have to ingest them using the existing names "for historical reasons".

I believe this behaviour is due to bugs in the gobwas/glob library (several cases of unexpected pattern behaviour have been reported over its lifetime), and I've created issue gobwas/glob#50 there. There may also be mitigations or changes to make in Telegraf:
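One possible mitigation, sketched under assumptions: since each pattern behaves correctly when it is the only entry in the list, a filter could test every pattern on its own and pass a metric if any single pattern matches. The Go snippet below illustrates that idea only. `passesAny` is a hypothetical helper, not part of Telegraf, and the standard library's `path.Match` stands in for gobwas/glob so the example runs without third-party modules (its `*` does not cross `/`, which does not matter for these names).

```go
package main

import (
	"fmt"
	"path"
)

// namepass mirrors the patterns from the telegraf.conf in this report.
var namepass = []string{
	"*process32:*.exe:*.*",
	"otherWantedMetric*",
}

// passesAny is a hypothetical helper (not Telegraf's API). It reports whether
// name matches at least one pattern, evaluating each pattern independently
// rather than as one combined expression.
func passesAny(patterns []string, name string) bool {
	for _, p := range patterns {
		// path.Match is a stdlib stand-in for gobwas/glob in this sketch.
		ok, err := path.Match(p, name)
		if err != nil {
			// Malformed pattern: treat it as "no match" here.
			continue
		}
		if ok {
			return true
		}
	}
	return false
}

func main() {
	// Metric names emitted by the exec commands in the test config.
	names := []string{
		"cpu.process32:testcase.exe:caserunner.2222",
		"process32:testcase.exe:caserunner.2222",
		"otherWantedMetric",
		"unwantedMetric",
	}
	for _, n := range names {
		fmt.Printf("%-45s pass=%v\n", n, passesAny(namepass, n))
	}
}
```

With these inputs the first three names pass and unwantedMetric is dropped, which matches the expected behaviour described above. Matching patterns one at a time costs a loop per metric, so a real change would need to weigh that against the speed of a single compiled matcher.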

hackery commented 3 years ago

I realise 1.13 is fairly old now, so I've just repeated the test using a 1.18.2 binary download and a local build from a nearly-current master branch. Both still drop the "process32" metrics:

$ ./telegraf-1.18.2/usr/bin/telegraf --config etc.testcase.filter/telegraf.conf --test
2021-05-12T17:31:29Z I! Starting Telegraf 1.18.2
> otherWantedMetric value=0 1620840690000000000

$ ../src/influxdata/telegraf/telegraf --version
Telegraf unknown (git: master b56ffdc4)
$ ../src/influxdata/telegraf/telegraf --config etc.testcase.filter/telegraf.conf --test
2021-05-12T17:32:48Z I! Starting Telegraf 
> otherWantedMetric value=0 1620840769000000000