odenio / pgpool-cloudsql

The PGPool-II PostgreSQL connection proxy with automatic discovery of Google CloudSQL backends
Apache License 2.0

telegraf prints error logs repeatedly #6

Closed: choryuidentify closed this issue 2 years ago

choryuidentify commented 2 years ago

Hi.

I'm using this chart with a GKE Autopilot cluster. I installed the chart and bound the pgpool service account to the GCP IAM roles roles/cloudsql.viewer and roles/monitoring.metricWriter.

It seems to be working, but the error logs below keep appearing:

telegraf 2022-10-11T07:32:44Z E! [outputs.stackdriver] Get kind for metric "go_gc_duration_seconds" (telegraf.ValueType) field &{"sum" %!q(float64=0.01410381)} failed: unsupported telegraf value type: telegraf.ValueType
telegraf 2022-10-11T07:32:44Z E! [outputs.stackdriver] Get kind for metric "go_gc_duration_seconds" (telegraf.ValueType) field &{"0" %!q(float64=3.9631e-05)} failed: unsupported telegraf value type: telegraf.ValueType
telegraf 2022-10-11T07:32:44Z E! [outputs.stackdriver] Get kind for metric "go_gc_duration_seconds" (telegraf.ValueType) field &{"0.25" %!q(float64=7.2057e-05)} failed: unsupported telegraf value type: telegraf.ValueType
telegraf 2022-10-11T07:32:44Z E! [outputs.stackdriver] Get kind for metric "go_gc_duration_seconds" (telegraf.ValueType) field &{"0.5" %!q(float64=9.4315e-05)} failed: unsupported telegraf value type: telegraf.ValueType
telegraf 2022-10-11T07:32:44Z E! [outputs.stackdriver] Get kind for metric "go_gc_duration_seconds" (telegraf.ValueType) field &{"0.75" %!q(float64=0.000123461)} failed: unsupported telegraf value type: telegraf.ValueType
telegraf 2022-10-11T07:32:44Z E! [outputs.stackdriver] Get kind for metric "go_gc_duration_seconds" (telegraf.ValueType) field &{"1" %!q(float64=0.00195052)} failed: unsupported telegraf value type: telegraf.ValueType
telegraf 2022-10-11T07:32:44Z E! [outputs.stackdriver] Get kind for metric "go_gc_duration_seconds" (telegraf.ValueType) field &{"count" %!q(float64=119)} failed: unsupported telegraf value type: telegraf.ValueType

These logs are printed every 10 seconds...

How can I fix it?

n-oden commented 2 years ago

@choryuidentify I'm actually seeing the same behavior. It may be a few days before I'm able to investigate in detail but I'll try to figure it out.

n-oden commented 2 years ago

@choryuidentify https://github.com/odenio/pgpool-cloudsql/pull/7 should address this -- the v1.0.10 release will include the fix.

In the meantime, if you want the spew to stop right now, you can edit configmap/pgpool-metadata-telegraf in whatever namespace you've deployed the chart to, and add the following at the end:

    [[inputs.internal]]
      collect_memstats = false

...and then restart all the pods.
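If that doesn't quiet things down, telegraf's generic metric filtering might also work. As an untested sketch, a `namedrop` selector could be added to the existing stackdriver output section in the same configmap (the exact metric name as telegraf sees it is an assumption here):

```toml
# Untested sketch: drop the offending metric by name at the output stage
# using telegraf's standard namedrop metric selector. The metric name
# below is an assumption, not confirmed from the chart's config.
[[outputs.stackdriver]]
  # ... existing project/namespace settings unchanged ...
  namedrop = ["go_gc_duration_seconds"]
```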

n-oden commented 2 years ago

v1.0.10 has been released and should no longer exhibit this behavior.

n-oden commented 2 years ago

Hm, I may have spoken too soon -- even with collect_memstats set to false, those errors still appear; this seems to be related to https://github.com/influxdata/telegraf/issues/8514, which dates back to 2020. :(

I've released version 1.0.11 of pgpool-cloudsql, which fixes the issue via the simple expedient of filtering go_gc_duration_seconds out of telegraf's stdout/stderr streams. This will have to do in the short term: the real fix here is to update the telegraf stackdriver output plugin to fully support histogram/distribution metrics, which will probably require a substantial rewrite and is for sure not happening in 2022.
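For the curious, the filtering approach can be sketched like this (a minimal illustration, not the actual chart code). It runs the same kind of `grep -v` filter a wrapper around the telegraf process might apply, demonstrated here on sample log lines so it's self-contained:

```shell
#!/bin/sh
# Hedged sketch of the v1.0.11 workaround: drop any line mentioning
# go_gc_duration_seconds from telegraf's combined stdout/stderr stream.
# In the chart, the real telegraf process would be piped through this;
# here we feed in sample lines instead.
filter_telegraf_logs() {
  grep -v 'go_gc_duration_seconds'
}

printf '%s\n' \
  'telegraf 2022-10-11T07:32:44Z I! Loaded outputs: stackdriver' \
  'telegraf 2022-10-11T07:32:44Z E! [outputs.stackdriver] Get kind for metric "go_gc_duration_seconds" failed' \
  | filter_telegraf_logs
```

Only the first sample line survives the filter; the error spam is silently dropped while all other telegraf output passes through unchanged.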

choryuidentify commented 2 years ago

@n-oden Thanks, it works!