open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0
2.96k stars 2.3k forks source link

[receiver/postgresql] -- "Null values not handled in colum client_addr" #33107

Open adussarps opened 4 months ago

adussarps commented 4 months ago

Component(s)

receiver/postgresql

What happened?

Description

I've recently initiated monitoring on our SQL databases hosted on OVH using the PostgreSQL receiver. While the receiver successfully establishes a connection to the database, it encounters an error during the scraping process, resulting in failed metrics retrieval.

scraperhelper/scrapercontroller.go:197  Error scraping metrics  {"kind": "receiver", "name": "postgresql", "data_type": "metrics", "error": "sql: Scan error on column index 0, name \"client_addr\": converting NULL to string is unsupported", "scraper": "postgresql"}

Note that there is indeed a null value in the pg_stat_activity; 'client_addr' column.

Expected Result

Ideally, the collector should be capable of gracefully handling rows containing NULL values, thereby bypassing them during the scraping process rather than halting due to such instances.

This issue impedes our ability to effectively monitor our databases and necessitates a resolution to ensure seamless metric collection.

Collector version

v0.88.0

PostgreSQL - version 15

Environment information

Environment

Installed via Helm version v0.99.0 on Kubernetes

OpenTelemetry Collector configuration

extensions:
    health_check: {}
    basicauth/grafana_cloud_tempo:
      # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/extension/basicauthextension
      client_auth:
        username: XX
        password: XX
    basicauth/grafana_cloud_prometheus:
      client_auth:
        username: XX
        password: XX
    basicauth/grafana_cloud_loki:
      client_auth:
        username: XX
        password: XX

  receivers:
    postgresql:
      endpoint: "postgresql-f32d4eca-o64f91e18.database.cloud.ovh.net:20184"
      username: avnadmin
      password: XXXXX
      databases: []
    hostmetrics:
      # Optional. Host Metrics Receiver added as an example of Infra Monitoring capabilities of the OpenTelemetry Collector
      # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/hostmetricsreceiver
      scrapers:
        load:
        memory:

  processors:
    resource:
      attributes:
      - action: insert
        key: service_name
        from_attribute: service.name
      - action: insert
        key: service_name
        from_attribute: k8s.deployment.name
      - action: insert
        key: service_name
        from_attribute: k8s.service.name
      - action: insert
        key: service_namespace
        from_attribute: service.namespace
      - action: insert
        key: service_namespace
        from_attribute: k8s.namespace.name
      - action: insert
        key: service_version
        from_attribute: service.version
      - action: insert
        key: deployment_environment
        from_attribute: deployment.environment
      - action: insert
        key: environment
        value: staging
      - action: insert
        key: project
        value: xano-weweb
      - action: insert
        key: loki.resource.labels
        value: service_name,service_namespace,service_version,deployment_environment,project,environment
    resourcedetection:
      # Enriches telemetry data with resource information from the host
      # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/resourcedetectionprocessor
      detectors: ["env", "system"]
      override: false
    transform/add_resource_attributes_as_metric_attributes:
      # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/transformprocessor
      error_mode: ignore
      metric_statements:
        - context: datapoint
          statements:
            - set(attributes["namespace"], resource.attributes["k8s.namespace.name"])
            - set(attributes["container"], resource.attributes["k8s.container.name"])
            - set(attributes["pod"], resource.attributes["k8s.pod.name"])

  exporters:
    debug:
      verbosity: normal
      sampling_initial: 5
      sampling_thereafter: 200
    otlp/grafana_cloud_traces:
      # https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter/otlpexporter
      endpoint: "tempo-prod-10-prod-eu-west-2.grafana.net:443"
      auth:
        authenticator: basicauth/grafana_cloud_tempo

    loki/grafana_cloud_logs:
      # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/lokiexporter
      endpoint: "https://logs-prod-012.grafana.net/loki/api/v1/push"
      auth:
        authenticator: basicauth/grafana_cloud_loki

    prometheusremotewrite/grafana_cloud_metrics:
      # https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/prometheusremotewriteexporter
      endpoint: XXXX
      resource_to_telemetry_conversion:
        enabled: true
      add_metric_suffixes: false
      auth:
        authenticator: basicauth/grafana_cloud_prometheus

  service:
    extensions: [health_check, basicauth/grafana_cloud_tempo, basicauth/grafana_cloud_prometheus, basicauth/grafana_cloud_loki]
    pipelines:
      traces:
        receivers: [otlp]
        processors: [resourcedetection]
        exporters: [otlp/grafana_cloud_traces]
      metrics:
        receivers: [otlp, hostmetrics, postgresql]
        processors: [resourcedetection, transform/add_resource_attributes_as_metric_attributes]
        exporters: [prometheusremotewrite/grafana_cloud_metrics]
      logs:
        receivers: [otlp]
        processors: [resourcedetection, resource]
        exporters: [loki/grafana_cloud_logs]

Log output

2024-05-17T09:45:45.624Z    info    service@v0.99.0/service.go:99   Setting up own telemetry...
2024-05-17T09:45:45.625Z    info    service@v0.99.0/telemetry.go:103    Serving metrics {"address": "10.2.0.80:8888", "level": "Normal"}
2024-05-17T09:45:45.628Z    info    lokiexporter@v0.99.0/exporter.go:44 using the new Loki exporter {"kind": "exporter", "data_type": "logs", "name": "loki/grafana_cloud_logs"}
2024-05-17T09:45:45.628Z    warn    postgresqlreceiver@v0.99.0/scraper.go:85    Feature gate receiver.postgresql.separateSchemaAttr is not enabled. Please see the README for more information: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/v0.88.0/receiver/postgresqlreceiver/README.md    {"kind": "receiver", "name": "postgresql", "data_type": "metrics"}
2024-05-17T09:45:45.630Z    info    service@v0.99.0/service.go:166  Starting otelcol-contrib... {"Version": "0.99.0", "NumCPU": 2}
2024-05-17T09:45:45.630Z    info    extensions/extensions.go:34 Starting extensions...
2024-05-17T09:45:45.630Z    info    extensions/extensions.go:37 Extension is starting...    {"kind": "extension", "name": "basicauth/grafana_cloud_loki"}
2024-05-17T09:45:45.630Z    info    extensions/extensions.go:52 Extension started.  {"kind": "extension", "name": "basicauth/grafana_cloud_loki"}
2024-05-17T09:45:45.630Z    info    extensions/extensions.go:37 Extension is starting...    {"kind": "extension", "name": "basicauth/grafana_cloud_prometheus"}
2024-05-17T09:45:45.630Z    info    extensions/extensions.go:52 Extension started.  {"kind": "extension", "name": "basicauth/grafana_cloud_prometheus"}
2024-05-17T09:45:45.630Z    info    extensions/extensions.go:37 Extension is starting...    {"kind": "extension", "name": "basicauth/grafana_cloud_tempo"}
2024-05-17T09:45:45.630Z    info    extensions/extensions.go:52 Extension started.  {"kind": "extension", "name": "basicauth/grafana_cloud_tempo"}
2024-05-17T09:45:45.630Z    info    extensions/extensions.go:37 Extension is starting...    {"kind": "extension", "name": "health_check"}
2024-05-17T09:45:45.630Z    info    healthcheckextension@v0.99.0/healthcheckextension.go:35 Starting health_check extension {"kind": "extension", "name": "health_check", "config": {"Endpoint":"10.2.0.80:13133","TLSSetting":null,"CORS":null,"Auth":null,"MaxRequestBodySize":0,"IncludeMetadata":false,"ResponseHeaders":null,"Path":"/","ResponseBody":null,"CheckCollectorPipeline":{"Enabled":false,"Interval":"5m","ExporterFailureThreshold":5}}}
2024-05-17T09:45:45.634Z    info    extensions/extensions.go:52 Extension started.  {"kind": "extension", "name": "health_check"}
2024-05-17T09:45:45.635Z    info    internal/resourcedetection.go:125   began detecting resource information    {"kind": "processor", "name": "resourcedetection", "pipeline": "metrics"}
2024-05-17T09:45:45.635Z    info    system/system.go:209    This attribute changed from int to string. Temporarily switch back to int using the feature gate.   {"kind": "processor", "name": "resourcedetection", "pipeline": "metrics", "attribute": "host.cpu.family", "feature gate": "processor.resourcedetection.hostCPUModelAndFamilyAsString"}
2024-05-17T09:45:45.635Z    info    system/system.go:228    This attribute changed from int to string. Temporarily switch back to int using the feature gate.   {"kind": "processor", "name": "resourcedetection", "pipeline": "metrics", "attribute": "host.cpu.model.id", "feature gate": "processor.resourcedetection.hostCPUModelAndFamilyAsString"}
2024-05-17T09:45:45.635Z    info    system/system.go:247    This attribute will change from int to string. Switch now using the feature gate.   {"kind": "processor", "name": "resourcedetection", "pipeline": "metrics", "attribute": "host.cpu.stepping", "feature gate": "processor.resourcedetection.hostCPUSteppingAsString"}
2024-05-17T09:45:45.635Z    info    internal/resourcedetection.go:139   detected resource information   {"kind": "processor", "name": "resourcedetection", "pipeline": "metrics", "resource": {"host.name":"my-opentelemetry-collector-9c7d446cb-g8mr7","os.type":"linux"}}
2024-05-17T09:45:45.638Z    info    otlpreceiver@v0.99.0/otlp.go:102    Starting GRPC server    {"kind": "receiver", "name": "otlp", "data_type": "traces", "endpoint": "10.2.0.80:4317"}
2024-05-17T09:45:45.638Z    info    otlpreceiver@v0.99.0/otlp.go:152    Starting HTTP server    {"kind": "receiver", "name": "otlp", "data_type": "traces", "endpoint": "10.2.0.80:4318"}
2024-05-17T09:45:45.719Z    info    k8sclusterreceiver@v0.99.0/receiver.go:63   Starting shared informers and wait for initial cache sync.  {"kind": "receiver", "name": "k8s_cluster", "data_type": "metrics"}
2024-05-17T09:45:45.720Z    info    k8sobjectsreceiver@v0.99.0/receiver.go:74   Object Receiver started {"kind": "receiver", "name": "k8sobjects", "data_type": "logs"}
2024-05-17T09:45:45.720Z    info    k8sobjectsreceiver@v0.99.0/receiver.go:101  Started collecting  {"kind": "receiver", "name": "k8sobjects", "data_type": "logs", "gvr": "events.k8s.io/v1, Resource=events", "mode": "watch", "namespaces": []}
2024-05-17T09:45:45.720Z    info    healthcheck/handler.go:132  Health Check state change   {"kind": "extension", "name": "health_check", "status": "ready"}
2024-05-17T09:45:45.720Z    info    service@v0.99.0/service.go:192  Everything is ready. Begin running and processing data.
2024-05-17T09:45:45.720Z    warn    localhostgate/featuregate.go:63 The default endpoints for all servers in components will change to use localhost instead of 0.0.0.0 in a future version. Use the feature gate to preview the new default.   {"feature gate ID": "component.UseLocalHostAsDefaultHost"}
2024-05-17T09:45:45.820Z    info    k8sclusterreceiver@v0.99.0/receiver.go:84   Completed syncing shared informer caches.   {"kind": "receiver", "name": "k8s_cluster", "data_type": "metrics"}
2024-05-17T09:45:46.800Z    error   scraperhelper/scrapercontroller.go:197  Error scraping metrics  {"kind": "receiver", "name": "postgresql", "data_type": "metrics", "error": "sql: Scan error on column index 0, name \"client_addr\": converting NULL to string is unsupported", "scraper": "postgresql"}
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport
    go.opentelemetry.io/collector/receiver@v0.99.0/scraperhelper/scrapercontroller.go:197
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1
    go.opentelemetry.io/collector/receiver@v0.99.0/scraperhelper/scrapercontroller.go:169
2024-05-17T09:45:56.767Z    error   scraperhelper/scrapercontroller.go:197  Error scraping metrics  {"kind": "receiver", "name": "postgresql", "data_type": "metrics", "error": "sql: Scan error on column index 0, name \"client_addr\": converting NULL to string is unsupported", "scraper": "postgresql"}
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport
    go.opentelemetry.io/collector/receiver@v0.99.0/scraperhelper/scrapercontroller.go:197
go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1
    go.opentelemetry.io/collector/receiver@v0.99.0/scraperhelper/scrapercontroller.go:173

Additional context

No response

github-actions[bot] commented 4 months ago

Pinging code owners:

djaglowski commented 4 months ago

Since client_addr is a common resource attribute, I think we should use a default value (e.g. "unknown") when it is null or empty.

github-actions[bot] commented 2 months ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

bamaas commented 3 weeks ago

Any solution/workaround for this?

djaglowski commented 2 weeks ago

Reviewing the thread, I think we just need a PR which sets a default value when client_addr is null.