open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

Filter processor fails to omit http_client_duration metric #7591

Closed: gautam-nutalapati closed this issue 2 years ago

gautam-nutalapati commented 2 years ago

Describe the bug I would like to omit the HTTP metrics generated by the Java auto-instrumentation agent. I am using the filter processor to drop these metric names before they are pushed to Prometheus, but the HTTP metrics generated by the Java agent are not filtered as expected.
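
For context, these metrics are produced by the OpenTelemetry Java auto-instrumentation agent exporting OTLP to the collector. A typical invocation looks roughly like the sketch below (the agent and application jar paths are illustrative; the service name is taken from the metric labels further down):

java -javaagent:./opentelemetry-javaagent.jar \
  -Dotel.service.name=user-profile-svc \
  -Dotel.exporter.otlp.endpoint=http://localhost:4317 \
  -Dotel.metrics.exporter=otlp \
  -jar app.jar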

Steps to reproduce

What did you expect to see?

What did you see instead? I see the metrics below (buckets removed to reduce clutter).

# HELP http_client_duration The duration of the outbound HTTP request
# TYPE http_client_duration histogram
http_client_duration_bucket{http_flavor="1.1",http_method="GET",http_url="http://localhost:9900/stux/v1/users/97378103842048256",le="5"} 5
http_client_duration_sum{http_flavor="1.1",http_method="GET",http_url="http://localhost:9900/stux/v1/users/97378103842048256"} 7.492469
http_client_duration_count{http_flavor="1.1",http_method="GET",http_url="http://localhost:9900/stux/v1/users/97378103842048256"} 5
http_client_duration_bucket{http_flavor="1.1",http_method="GET",http_status_code="200",http_url="https://idp.vault.dev-local.prisidio.net/.well-known/jwks.json",le="5"} 0
http_client_duration_sum{http_flavor="1.1",http_method="GET",http_status_code="200",http_url="https://idp.vault.dev-local.prisidio.net/.well-known/jwks.json"} 367.222015
http_client_duration_count{http_flavor="1.1",http_method="GET",http_status_code="200",http_url="https://idp.vault.dev-local.prisidio.net/.well-known/jwks.json"} 1
# HELP http_server_active_requests The number of concurrent HTTP requests that are currently in-flight
# TYPE http_server_active_requests gauge
http_server_active_requests{http_flavor="1.1",http_host="localhost:8060",http_method="GET",http_scheme="http"} 0
# HELP http_server_duration The duration of the inbound HTTP request
# TYPE http_server_duration histogram
http_server_duration_bucket{http_flavor="1.1",http_host="localhost:8060",http_method="GET",http_scheme="http",http_status_code="500",le="5"} 0
http_server_duration_sum{http_flavor="1.1",http_host="localhost:8060",http_method="GET",http_scheme="http",http_status_code="500"} 2805.492884
http_server_duration_count{http_flavor="1.1",http_host="localhost:8060",http_method="GET",http_scheme="http",http_status_code="500"} 1

Without this filter I see the metrics below (note that api_latency is a custom metric I added; it was successfully filtered in the output above).

# HELP api_latency API latency time in milliseconds.
# TYPE api_latency histogram
api_latency_bucket{api_method="GET",api_name="/users/v1/profiles/me",env="dev-local",status_code="500",svc="user-profile-svc",le="5"} 0
api_latency_sum{api_method="GET",api_name="/users/v1/profiles/me",env="dev-local",status_code="500",svc="user-profile-svc"} 2665
api_latency_count{api_method="GET",api_name="/users/v1/profiles/me",env="dev-local",status_code="500",svc="user-profile-svc"} 1
# HELP http_client_duration The duration of the outbound HTTP request
# TYPE http_client_duration histogram
http_client_duration_bucket{http_flavor="1.1",http_method="GET",http_url="http://localhost:9900/stux/v1/users/97378103842048256",le="5"} 5
http_client_duration_sum{http_flavor="1.1",http_method="GET",http_url="http://localhost:9900/stux/v1/users/97378103842048256"} 7.492469
http_client_duration_count{http_flavor="1.1",http_method="GET",http_url="http://localhost:9900/stux/v1/users/97378103842048256"} 5
http_client_duration_bucket{http_flavor="1.1",http_method="GET",http_status_code="200",http_url="https://idp.vault.dev-local.prisidio.net/.well-known/jwks.json",le="5"} 0
http_client_duration_sum{http_flavor="1.1",http_method="GET",http_status_code="200",http_url="https://idp.vault.dev-local.prisidio.net/.well-known/jwks.json"} 367.222015
http_client_duration_count{http_flavor="1.1",http_method="GET",http_status_code="200",http_url="https://idp.vault.dev-local.prisidio.net/.well-known/jwks.json"} 1
# HELP http_server_active_requests The number of concurrent HTTP requests that are currently in-flight
# TYPE http_server_active_requests gauge
http_server_active_requests{http_flavor="1.1",http_host="localhost:8060",http_method="GET",http_scheme="http"} 0
# HELP http_server_duration The duration of the inbound HTTP request
# TYPE http_server_duration histogram
http_server_duration_bucket{http_flavor="1.1",http_host="localhost:8060",http_method="GET",http_scheme="http",http_status_code="500",le="5"} 0
http_server_duration_sum{http_flavor="1.1",http_host="localhost:8060",http_method="GET",http_scheme="http",http_status_code="500"} 2805.492884
http_server_duration_count{http_flavor="1.1",http_host="localhost:8060",http_method="GET",http_scheme="http",http_status_code="500"} 1

What version did you use? Version: collector 0.16.0 (corrected below: 0.43.0), Java agent 1.10.1

What config did you use? Config:

extensions:
  health_check:
  pprof:
    endpoint: :1777
  zpages:
    endpoint: :55679

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  memory_limiter:
    check_interval: 1s
    limit_percentage: 50
    spike_limit_percentage: 30
  filter/metrics:
    metrics:
      exclude:
        match_type: regexp
        metric_names:
          - api_.*
          - http_.*
  batch/metrics:
    timeout: 30s
    send_batch_size: 200

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
service:
  extensions: [pprof, zpages, health_check]
  pipelines:
    metrics/2:
      receivers: [otlp]
      processors: [memory_limiter, filter/metrics, batch/metrics]
      exporters: [prometheus]
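
To check what the collector actually exports with this config, the Prometheus exporter endpoint configured above (0.0.0.0:8889) can be scraped directly, for example:

curl -s http://localhost:8889/metrics | grep -E '^(api|http)_'

Any lines returned are metrics that made it past the filter.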

Environment OS: macOS Monterey 12.0.1; Compiler: N/A

Additional context If the filter processor intentionally avoids filtering auto-instrumented HTTP metrics, then this issue is irrelevant. Related to https://github.com/open-telemetry/opentelemetry-collector/issues/2310

Filter startup logs:

2022-02-07T15:46:42.075Z    info    filterprocessor@v0.43.0/filter_processor.go:78  Metric filter configured    {"kind": "processor", "name": "filter/metrics", "include match_type": "", "include expressions": [], "include metric names": [], "include metrics with resource attributes": null, "exclude match_type": "strict", "exclude expressions": [], "exclude metric names": ["api_latency", "http_client_duration"], "exclude metrics with resource attributes": null, "checksMetrics": true, "checkResouces": false}
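
Note that this startup log reports a strict-match exclude with exact metric names, i.e. it corresponds to a variant of the config along the lines of the sketch below rather than the regexp config shown above:

filter/metrics:
  metrics:
    exclude:
      match_type: strict
      metric_names:
        - api_latency
        - http_client_duration
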
jpkrohling commented 2 years ago

collector 0.16.0

2022-02-07T15:46:42.075Z    info    filterprocessor@v0.43.0/

Something doesn't add up :-)

In any case, I'm pinging the code owner: cc @boostchicken

gautam-nutalapati commented 2 years ago

Thanks for looking, nice catch. 0.16.0 was from https://github.com/aws-observability/aws-otel-collector; I reproduced it with collector version 0.43.0. Update: I created an opentelemetry-demo-app to reproduce the issue. Run docker-compose up and hit http://localhost:8899/metrics, which shows the http_client_duration metric that should be filtered based on the config in configs/otel/otel-config.yaml.
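
A quick way to confirm the behavior against the demo app is to scrape its metrics endpoint and count occurrences of the metric that should have been dropped, for example:

curl -s http://localhost:8899/metrics | grep -c '^http_client_duration'

A non-zero count means the filter did not drop the metric.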

boostchicken commented 2 years ago

@gautam-nutalapati, can you try this on the latest version and confirm whether this is still an issue?

gautam-nutalapati commented 2 years ago

I tried the latest version, and it has been fixed 🎉 Thank you for the fix!!

taisph commented 1 year ago

I just tried adding a similar filter, yet the metrics are not filtered. Could there be a regression?

{
    "caller": "filterprocessor@v0.75.0/metrics.go:101",
    "exclude expressions": [],
    "exclude match_type": "regexp",
    "exclude metric names": [
        "http_.*"
    ],
    "exclude metrics with resource attributes": null,
    "include expressions": [],
    "include match_type": "",
    "include metric names": [],
    "include metrics with resource attributes": null,
    "kind": "processor",
    "level": "info",
    "msg": "Metric filter configured",
    "name": "filter/metrics",
    "pipeline": "metrics",
    "ts": 1681808701.6006052
}

    processors:
      filter/metrics:
        metrics:
          exclude:
            match_type: regexp
            metric_names:
            - http_.*

    service:
      extensions: [health_check, memory_ballast]
      pipelines:
        metrics:
          receivers: [otlp, opencensus]
          processors: [memory_limiter, filter/metrics, batch/metrics]
          exporters: [prometheus, logging]

curl -s 'http://localhost:8889/metrics' | grep -E '^http_.*' | wc -l
216
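
For collector versions in this range, the filter processor also accepts OTTL conditions; the same exclusion expressed that way would look roughly like the sketch below (assuming the IsMatch converter available in that release):

processors:
  filter/metrics:
    metrics:
      metric:
        - 'IsMatch(name, "http_.*")'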