influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

prometheusremotewrite: Untyped values are dropped silently #15782

Closed hagen1778 closed 1 month ago

hagen1778 commented 2 months ago

Relevant telegraf.conf

[[inputs.file]]
  files = ['metrics.influx']
  data_format = "influx"
[[outputs.http]]
  url = "http://localhost:8428/api/v1/write"
  data_format = "prometheusremotewrite"
  [outputs.http.headers]
     Authorization = "Bearer redacted"
     Content-Type = "application/x-protobuf"
     Content-Encoding = "snappy"
     X-Prometheus-Remote-Write-Version = "0.1.0"
  [outputs.http.tagpass]
    data_type = ["application"]

Logs from Telegraf

No logs

System info

telegraf-1.31.3

Docker

No response

Steps to reproduce

  1. Create a file metrics.influx with the following content:
    measurement,az=us-east-1a,component=ingester,serviceType=application,cluster=foo,environment=test,host=bar,uid=1234567890,instanceType=s-tier,job=server,region=us-east-1 throughput="0" 1724701415000000000
  2. Check that field value throughput="0" is quoted
  3. Run telegraf with the config mentioned above in the report
  4. Observe telegraf logs or logs of the remote database

Expected behavior

  1. Telegraf should notify the user that the field throughput="0" can't be parsed. This can be done via logs or metrics. If per-field logs are too verbose, Telegraf should at least report such errors once per batch.
  2. Telegraf should try to parse a quoted value when it can be converted to a numeric value.
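
To illustrate the second point, here is a minimal sketch of a lenient conversion that falls back to parsing string fields as numbers. `fieldToFloat` is a hypothetical helper written for this report, not an existing Telegraf function, and it does not describe how the serializer currently behaves.

```go
package main

import (
	"strconv"
	"strings"
)

// fieldToFloat is a hypothetical helper (not part of Telegraf).
// It converts a field value to float64 and reports whether that was possible.
// String values are accepted only if they parse as a number.
func fieldToFloat(value interface{}) (float64, bool) {
	switch v := value.(type) {
	case float64:
		return v, true
	case int64:
		return float64(v), true
	case uint64:
		return float64(v), true
	case bool:
		if v {
			return 1, true
		}
		return 0, true
	case string:
		// A quoted field such as throughput="0" arrives as the string "0".
		f, err := strconv.ParseFloat(strings.TrimSpace(v), 64)
		if err != nil {
			return 0, false
		}
		return f, true
	default:
		return 0, false
	}
}
```

With this helper, `fieldToFloat("0")` yields a usable value, while `fieldToFloat("abc")` reports failure so the caller can log the field instead of dropping it silently.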

Actual behavior

Nothing happens. Metrics are collected from the file and silently dropped here:
https://github.com/influxdata/telegraf/blob/e94f0c58c582493b929c777c1b285579ffa39b94/plugins/serializers/prometheusremotewrite/prometheusremotewrite.go#L52-L56
No logs are printed and there are no hints about what is happening. The remote database receives POST requests with an empty body and cannot give the user any hint about what is wrong.

Additional info

No response

srebhan commented 2 months ago

@hagen1778 this is a limitation of Prometheus and is documented in the README of the serializer:

Note: String fields are ignored and do not produce Prometheus metrics.

We could log those once if that's sufficient for your use-case!?

hagen1778 commented 2 months ago

> We could log those once if that's sufficient for your use-case!?

Yes, I think something like that would be sufficient.

The problem I faced was a user complaining that the remote destination (VictoriaMetrics) was dropping data ingested from the Telegraf client. Further investigation revealed that the remote destination was receiving empty POST requests from Telegraf. I had to read the Telegraf code to understand what could cause it to send empty requests, and that is how I discovered this behavior.

Logging each skipped line could be too verbose. Maybe log a single message per batch, such as "n/n lines were dropped in batch", if at least one row was skipped when forming that batch? I can contribute the fix if you agree with this approach.
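
As a rough sketch of the per-batch summary idea, something like the following could work. The `metric` type and the functions `countDropped` and `batchWarning` are hypothetical stand-ins invented for illustration; they are not Telegraf types or APIs, and the message wording is only a suggestion.

```go
package main

import "fmt"

// metric is a simplified, hypothetical stand-in for a Telegraf metric.
type metric struct {
	name   string
	fields map[string]interface{}
}

// hasNumericField reports whether at least one field could become a sample.
func hasNumericField(m metric) bool {
	for _, v := range m.fields {
		switch v.(type) {
		case float64, int64, uint64, bool:
			return true
		}
	}
	return false
}

// countDropped returns how many metrics in the batch would produce no samples.
func countDropped(batch []metric) int {
	dropped := 0
	for _, m := range batch {
		if !hasNumericField(m) {
			dropped++
		}
	}
	return dropped
}

// batchWarning returns one summary line per batch, or "" if nothing was dropped.
func batchWarning(batch []metric) string {
	dropped := countDropped(batch)
	if dropped == 0 {
		return ""
	}
	return fmt.Sprintf("%d/%d metrics were dropped in batch: no numeric fields", dropped, len(batch))
}
```

The caller would emit the returned string as a single warning only when it is non-empty, so a batch without skipped rows stays silent.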

srebhan commented 2 months ago

Please feel free to put up a PR. If you cannot produce the message shown above, simply log the metric name and field, mark it in a map as "already logged", and skip logging it next time.

I think a warning would probably be appropriate.
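
A minimal sketch of the "already logged" map approach might look like this. The `onceWarner` type and its methods are hypothetical names chosen for illustration, and the standard library `log` package stands in for whatever logger the plugin actually uses.

```go
package main

import (
	"log"
	"sync"
)

// onceWarner is a hypothetical helper that warns at most once
// per (metric name, field) pair.
type onceWarner struct {
	mu   sync.Mutex
	seen map[string]struct{}
}

func newOnceWarner() *onceWarner {
	return &onceWarner{seen: make(map[string]struct{})}
}

// warn logs a warning the first time it sees a metric/field pair and
// returns true; on later calls for the same pair it stays silent and
// returns false.
func (w *onceWarner) warn(metricName, field string) bool {
	// Join with a NUL byte so "a"+"bc" and "ab"+"c" produce different keys.
	key := metricName + "\x00" + field

	w.mu.Lock()
	defer w.mu.Unlock()

	if _, ok := w.seen[key]; ok {
		return false
	}
	w.seen[key] = struct{}{}
	log.Printf("warning: dropping string field %q of metric %q", field, metricName)
	return true
}
```

The mutex is included on the assumption that serialization may run from more than one goroutine; if that is not the case, a plain map would be enough.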