influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License
14.51k stars 5.56k forks source link

cloud_pubsub no ack and output problems #14648

Closed ledniy closed 7 months ago

ledniy commented 7 months ago

Relevant telegraf.conf

[agent]
      collection_jitter = "0s"
      debug = true
      flush_interval = "10s"
      flush_jitter = "0s"
      hostname = "$HOSTNAME"
      interval = "10s"
      logfile = ""
      metric_batch_size = 1000
      metric_buffer_limit = 10000
      omit_hostname = false
      precision = ""
      quiet = false
      round_interval = true
    [[processors.parser]]
      data_format = "influx"
      drop_original = true
      parse_fields = [
        "jsonPayload_message"
      ]

    [[outputs.influxdb_v2]]
      bucket = "XXXXX"
      data_format = "influx"
      organization = "XXXXX"
      token = "$INFLUX_TOKEN"
      urls = [
        "https://us-central1-1.gcp.cloud2.influxdata.com"
      ]

    [[inputs.cloud_pubsub]]
      data_format = "json"
      json_string_fields = [
        "jsonPayload_message"
      ]
      project = "XXXXX"
      subscription = "XXXXX"

Logs from Telegraf

2024-01-30T08:50:22Z I! Loading config: /etc/telegraf/telegraf.conf
2024-01-30T08:50:22Z I! Starting Telegraf 1.29.2 brought to you by InfluxData the makers of InfluxDB
2024-01-30T08:50:22Z I! Available plugins: 241 inputs, 9 aggregators, 30 processors, 24 parsers, 60 outputs, 6 secret-stores
2024-01-30T08:50:22Z I! Loaded inputs: cloud_pubsub
2024-01-30T08:50:22Z I! Loaded aggregators:
2024-01-30T08:50:22Z I! Loaded processors: parser
2024-01-30T08:50:22Z I! Loaded secretstores:
2024-01-30T08:50:22Z I! Loaded outputs: influxdb_v2
2024-01-30T08:50:22Z I! Tags enabled: host=telegraf-polling-service
2024-01-30T08:50:22Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"telegraf-polling-service", Flush Interval:10s
2024-01-30T08:50:22Z D! [agent] Initializing plugins
2024-01-30T08:50:22Z D! [agent] Connecting outputs
2024-01-30T08:50:22Z D! [agent] Attempting connection to [outputs.influxdb_v2]
2024-01-30T08:50:22Z D! [agent] Successfully connected to outputs.influxdb_v2
2024-01-30T08:50:22Z D! [agent] Starting service inputs
2024-01-30T08:50:22Z I! [inputs.cloud_pubsub] Starting receiver for subscription cloud-metrics...
2024-01-30T08:50:32Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T08:50:42Z D! [outputs.influxdb_v2] Wrote batch of 4 metrics in 234.895573ms
2024-01-30T08:50:42Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T08:50:52Z D! [outputs.influxdb_v2] Wrote batch of 3 metrics in 223.106057ms
2024-01-30T08:50:52Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T08:51:02Z D! [outputs.influxdb_v2] Wrote batch of 993 metrics in 175.26153ms
2024-01-30T08:51:02Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T08:51:12Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T08:51:22Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T08:51:32Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T08:51:42Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T08:51:52Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T08:52:02Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T08:52:12Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T08:52:22Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics

System info

Telegraf 1.29.2, Helm chart 1.8.40

Docker

No response

Steps to reproduce

GCP Cloud Logs -> PubSub -> telegraf

Expected behavior

Messages should be acked, and output should be written even if there's low amount of events

Actual behavior

Messaged are not acked at all

image

Output is working good only at startup and then I don't see new events

Additional info

Subscription config: imagea

powersj commented 7 months ago

Hi,

The cloud_pubsub plugin will only read up to 1000 messages at a time. This is controlled by the max_undelivered_messages = 1000 setting. Given you successfully wrote out 1000 messages it looks like messages are not acked. We refer to these type of metrics as tracking metrics.

The issue here is that you are using the parser plugin. I do not see any handling of the tracking metrics in this plugin and when you parse out the old message, it is essentially tossed, new metrics returned, and the old message never acknowledged.

The fix here is to have the parser processor learn about tracking metrics and either ack metrics if they are not carried on, or transfer the tracking to the new metrics.

ledniy commented 7 months ago

I wasn't able to make it work without parser as entries in pubsub look like:

{
  // many more fields
  "jsonPayload": {
    "level":"metric",
    "message":"metricName,group=1 duration=100 1706097218768718208"
  }
}
powersj commented 7 months ago

@ledniy,

Can you give the artifacts in #14655 a try?

Thanks!

ledniy commented 7 months ago

@powersj seems like it fixes the issue, thank you!

2024-01-30T21:45:36Z I! Loading config: telegraf.conf
2024-01-30T21:45:36Z I! Starting Telegraf 1.30.0-493dcf73 brought to you by InfluxData the makers of InfluxDB
2024-01-30T21:45:36Z I! Available plugins: 241 inputs, 9 aggregators, 30 processors, 24 parsers, 61 outputs, 5 secret-stores
2024-01-30T21:45:36Z I! Loaded inputs: cloud_pubsub
2024-01-30T21:45:36Z I! Loaded aggregators:
2024-01-30T21:45:36Z I! Loaded processors: parser
2024-01-30T21:45:36Z I! Loaded secretstores:
2024-01-30T21:45:36Z I! Loaded outputs: influxdb_v2
2024-01-30T21:45:36Z I! Tags enabled: host=$HOSTNAME
2024-01-30T21:45:36Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"$HOSTNAME", Flush Interval:10s
2024-01-30T21:45:36Z D! [agent] Initializing plugins
2024-01-30T21:45:36Z D! [agent] Connecting outputs
2024-01-30T21:45:36Z D! [agent] Attempting connection to [outputs.influxdb_v2]
2024-01-30T21:45:36Z D! [agent] Successfully connected to outputs.influxdb_v2
2024-01-30T21:45:36Z D! [agent] Starting service inputs
2024-01-30T21:45:36Z I! [inputs.cloud_pubsub] Starting receiver for subscription cloud-metrics...
2024-01-30T21:45:42Z D! [outputs.influxdb_v2] Wrote batch of 1000 metrics in 1.210126542s
2024-01-30T21:45:42Z D! [outputs.influxdb_v2] Buffer fullness: 1129 / 10000 metrics
2024-01-30T21:45:42Z D! [outputs.influxdb_v2] Wrote batch of 1000 metrics in 289.517042ms
2024-01-30T21:45:42Z D! [outputs.influxdb_v2] Buffer fullness: 129 / 10000 metrics
2024-01-30T21:45:46Z D! [outputs.influxdb_v2] Wrote batch of 129 metrics in 254.406583ms
2024-01-30T21:45:46Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T21:45:56Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T21:46:06Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T21:46:16Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T21:46:26Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T21:46:36Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T21:46:46Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T21:46:56Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T21:47:06Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T21:47:16Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T21:47:26Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T21:47:36Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T21:47:46Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T21:47:56Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T21:48:06Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T21:48:16Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T21:48:26Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T21:48:36Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T21:48:46Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T21:48:56Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T21:49:06Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T21:49:16Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T21:49:26Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T21:49:36Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T21:49:46Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T21:49:56Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T21:50:06Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T21:50:16Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T21:50:26Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T21:50:36Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T21:50:46Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
2024-01-30T21:50:56Z D! [outputs.influxdb_v2] Wrote batch of 6 metrics in 575.106667ms
2024-01-30T21:50:56Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics
image