influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0

InfluxDB2 Task - First Regex Match Being Missed (Tasks) #20792

Open jelliuk opened 3 years ago

jelliuk commented 3 years ago

Steps to reproduce:

  1. Apply the default Telegraf configuration for Windows.
  2. Import the downsampling task (Downsample All) from https://github.com/influxdata/community-templates/tree/master/downsampling
  3. Run the task, e.g. "downsample-all-10m", against the specified source (From Bucket) and destination (To Bucket).

Expected behavior: All fields where there is a partial match using the supplied regex should be caught and downsampled.

Actual behavior: The first field in the regex partial match is not processed by the task. This can be demonstrated by either:

  1. Rearranging the alternatives in the regex. Any alternative matches successfully as long as it is not first, but whatever is now first is missed, which further demonstrates the issue.
  2. Inserting a random unique string as the first alternative, so that it is the one affected and all the others match. This works but is undesirable.

i.e.: all_data = from(bucket: fromBucket) |> range(start: -task.every) |> filter(fn: (r) => (r._measurement =~ /^cpu|disk|diskio|mem|processes|swap|system|internal_.+|net|netstat|procstat|procstat_lookup|smart_.+$/))

Using the supplied downsample above, any measurement which contains "cpu" is missed. However, using either of the above options of rearranging the regex so that "cpu" isn't first or entering a dummy value at the start will enable "cpu" to match successfully.

It appears that, for whatever reason, InfluxDB2 skips the first alternative in the regex applied to the measurement name. Any measurement that is not listed first is downsampled correctly.
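Flux regular expressions use Go's regexp syntax, so the pattern's precedence can be checked outside InfluxDB. The following is a minimal sketch, not a confirmed diagnosis. It uses a shortened version of the pattern and assumes, purely for illustration, that some measurement names carry a prefix; `win_cpu` and `win_disk` are hypothetical names, not taken from this report. Because alternation binds more loosely than the anchors, `^` applies only to the first alternative and `$` only to the last, which would be consistent with the two workarounds described above.

```go
package main

import (
	"fmt"
	"regexp"
)

// Shortened version of the task's pattern: ^ binds only to "cpu",
// $ binds only to "smart_.+"; the middle alternatives are unanchored.
var original = regexp.MustCompile(`^cpu|disk|mem|smart_.+$`)

// Same pattern with a dummy first alternative (workaround 2 above).
// "dummy" is a placeholder string chosen for this sketch.
var withDummy = regexp.MustCompile(`^dummy|cpu|disk|mem|smart_.+$`)

func main() {
	// Hypothetical measurement names; the prefixed ones are an assumption.
	names := []string{"cpu", "win_cpu", "win_disk", "smart_device"}
	for _, name := range names {
		fmt.Printf("%-12s original=%-5v withDummy=%v\n",
			name, original.MatchString(name), withDummy.MatchString(name))
	}
}
```

Under these assumptions, only the first alternative behaves differently between the two patterns: a name that merely contains "cpu" fails the original pattern but matches once "cpu" is no longer first.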

Environment info:

Not sure why the error related to boltdb occurs.

Config:

Telegraf Config - https://github.com/influxdata/telegraf/blob/master/etc/telegraf.conf
Downsample Config (I've utilised downsample-all-10m) - https://github.com/influxdata/community-templates/blob/master/downsampling/all_inputs/downsampling_tasks.yml

Log: Log of the job and execution success: InfluxDB2-Downsample-Log.log

danxmoran commented 3 years ago

@jelliuk does it work if you remove the ^ from the start of your pattern? Do you also see the last option smart_.+$ being missed, or does that one work?

I'm wondering if the =~ might auto-insert the start and end anchors before testing the match, and the duplication is causing problems.
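For comparison, here is a small sketch of how the two variants behave in Go's regexp package, whose syntax Flux regexes share. It does not exercise Flux's `=~` operator, so it cannot confirm or rule out the auto-anchoring idea. It only shows what removing `^` does and what anchoring the whole alternation with a group would do. The pattern is shortened, and the name `win_cpu` is a hypothetical example.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// The pattern with the leading ^ removed, as suggested above.
	noCaret = regexp.MustCompile(`cpu|disk|mem|smart_.+$`)
	// Anchors applied to the whole alternation via a non-capturing group.
	grouped = regexp.MustCompile(`^(?:cpu|disk|mem|smart_.+)$`)
)

func main() {
	// "win_cpu" is a hypothetical prefixed name used only for illustration.
	for _, name := range []string{"cpu", "win_cpu", "diskio"} {
		fmt.Printf("%-8s noCaret=%-5v grouped=%v\n",
			name, noCaret.MatchString(name), grouped.MatchString(name))
	}
}
```

In Go, `MatchString` performs an unanchored search, so the ungrouped pattern matches any name containing one of the middle alternatives, while the grouped form requires the whole name to equal one alternative.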