open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io

Filter processor for spans does not work? #11536

Closed: gidesh closed this issue 2 years ago

gidesh commented 2 years ago

Describe the bug: We are using a relatively new feature of the filter processor for dropping spans, released on 23rd June: https://github.com/open-telemetry/opentelemetry-collector-contrib/commit/4be2219303197649342e3434057db642c8653b01

We built the otel-collector binary using https://github.com/open-telemetry/opentelemetry-collector/tree/main/cmd/builder with - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/filterprocessor v0.54.1-0.20220623193207-4be221930319 in the manifest (sketched below), and deployed our Docker image into GKE. The collector worked with this new binary; however, we were not seeing spans getting dropped for a certain microservice.
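
The relevant part of the builder manifest looked roughly like this (other modules and the dist section omitted; a sketch, not the full file):

processors:
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/filterprocessor v0.54.1-0.20220623193207-4be221930319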

Steps to reproduce: We have a Kubernetes microservice called hoff-api in a namespace called platform-apps, and we are trying to drop the spans created for it with:

  filter:
    spans:
      exclude:
        match_type: strict
        services:
          - hoff-api.platform-apps

What did you expect to see? No spans sent to Elastic APM/Grafana Tempo by the otel-collector from hoff-api.platform-apps.

What did you see instead? Still seeing the hoff-api.platform-apps spans in Elastic APM/Grafana Tempo from the otel-collector.

What version did you use? otel-collector-0.54 - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/filterprocessor v0.54.1-0.20220623193207-4be221930319

What config did you use?

receivers:

  # ideally we only support OTLP from Day 1, as all that teams should need
  otlp:                                                           
    protocols:
      grpc:
      http:

  # this is enabled for support of istio-proxy trace exports. Hopefully in future moves to OTLP
  opencensus:                                                     

  # this is enabled for nginx-ingress-controller exports - but currently not enabled in the nginx config
  jaeger:
    protocols:
      grpc:
      thrift_binary:
      thrift_compact:
      thrift_http:

processors:

  filter:
    spans:
      exclude:
        match_type: strict
        services:
          - hoff-api.platform-apps

  batch:

  memory_limiter:
    limit_mib: 1500                                               # 80% of maximum memory up to 2G
    spike_limit_mib: 512                                          # 25% of limit up to 2G
    check_interval: 5s

  k8sattributes:                                                  # contrib processor that injects k8s metadata 
    auth_type: "serviceAccount"
    passthrough: false
    filter:
      node_from_env_var: KUBE_NODE_NAME
    extract:
      metadata:
        - k8s.pod.name
        - k8s.pod.uid
        - k8s.deployment.name
        - k8s.cluster.name
        - k8s.namespace.name
        - k8s.node.name
        - k8s.pod.start_time
      labels:                                                     # extracts k8s label `key`, setting it as a tag on the span with `tag_name`
        - tag_name: app
          key: app
          from: pod
        - tag_name: environment
          key: environment
          from: pod
    pod_association:
    - from: resource_attribute
      name: k8s.pod.ip
    - from: resource_attribute
      name: k8s.pod.uid
    - from: connection

extensions:

  zpages: {}

  memory_ballast:
    size_mib: 683                                               # docs say should be max a third to half of memory

  health_check:
    path: "/healthz"
    check_collector_pipeline:
      enabled: false                                            # good in theory, but exporters need to be extremely stable

exporters:

  logging:                                                      # useful to know collector is working. DEBUG very verbose
    loglevel: debug
    sampling_initial: 5
    sampling_thereafter: 200

  jaeger:
    endpoint: "jldp-jaeger-collector-headless:14250"
    tls:
      insecure: true

  otlp/tempo:
    endpoint: "tempo-distributor.platform-tracing:4317"
    tls:
      insecure: true                                            # in-cluster non-mesh

  otlp/elastic:
    endpoint: "https://jldp-tracing-apm-http.platform-logging:8200"

service:

  extensions: [zpages, memory_ballast, health_check]

  pipelines:
    traces:
      receivers: [otlp, opencensus, jaeger]
      processors: [memory_limiter, k8sattributes, batch]        # order matters - followed recommendation
      exporters: [logging, otlp/elastic, otlp/tempo, jaeger]
    metrics:
      receivers: [otlp, opencensus]
      processors: [memory_limiter, batch]
      exporters: [logging, otlp/elastic, otlp/tempo]

Environment: Kubernetes 1.20

Additional context: We deploy the otel collector into Kubernetes as a Deployment. We enabled debug logging via the logging exporter and see logs which show the hoff-api.platform-apps spans in there:

otel-collector-6cdd7bdc8c-nvs9z otel-collector ScopeSpans #0
otel-collector-6cdd7bdc8c-nvs9z otel-collector ScopeSpans SchemaURL: 
otel-collector-6cdd7bdc8c-nvs9z otel-collector InstrumentationScope go.opentelemetry.io/contrib/instrumentation/github.com/gorilla/mux/otelmux semver:0.32.0
otel-collector-6cdd7bdc8c-nvs9z otel-collector Span #0
otel-collector-6cdd7bdc8c-nvs9z otel-collector     Trace ID       : a993bce18a14cc358aabfcc82cd3fee6
otel-collector-6cdd7bdc8c-nvs9z otel-collector     Parent ID      : 
otel-collector-6cdd7bdc8c-nvs9z otel-collector     ID             : 3998d88d0879d6ba
otel-collector-6cdd7bdc8c-nvs9z otel-collector     Name           : /healthz
otel-collector-6cdd7bdc8c-nvs9z otel-collector     Kind           : SPAN_KIND_SERVER
otel-collector-6cdd7bdc8c-nvs9z otel-collector     Start time     : 2022-06-24 15:39:18.760030871 +0000 UTC
otel-collector-6cdd7bdc8c-nvs9z otel-collector     End time       : 2022-06-24 15:39:18.760064667 +0000 UTC
otel-collector-6cdd7bdc8c-nvs9z otel-collector     Status code    : STATUS_CODE_UNSET
otel-collector-6cdd7bdc8c-nvs9z otel-collector     Status message : 
otel-collector-6cdd7bdc8c-nvs9z otel-collector Attributes:
otel-collector-6cdd7bdc8c-nvs9z otel-collector      -> net.transport: STRING(ip_tcp)
otel-collector-6cdd7bdc8c-nvs9z otel-collector      -> net.peer.ip: STRING(127.0.0.6)
otel-collector-6cdd7bdc8c-nvs9z otel-collector      -> net.peer.port: INT(43231)
otel-collector-6cdd7bdc8c-nvs9z otel-collector      -> net.host.ip: STRING(10.20.9.185)
otel-collector-6cdd7bdc8c-nvs9z otel-collector      -> net.host.port: INT(8080)
otel-collector-6cdd7bdc8c-nvs9z otel-collector      -> http.method: STRING(GET)
otel-collector-6cdd7bdc8c-nvs9z otel-collector      -> http.target: STRING(/healthz)
otel-collector-6cdd7bdc8c-nvs9z otel-collector      -> http.server_name: STRING(hoff-api)
otel-collector-6cdd7bdc8c-nvs9z otel-collector      -> http.route: STRING(/healthz)
otel-collector-6cdd7bdc8c-nvs9z otel-collector      -> http.user_agent: STRING(kube-probe/1.20+)
otel-collector-6cdd7bdc8c-nvs9z otel-collector      -> http.scheme: STRING(http)
otel-collector-6cdd7bdc8c-nvs9z otel-collector      -> http.host: STRING(10.20.9.185:8080)
otel-collector-6cdd7bdc8c-nvs9z otel-collector      -> http.flavor: STRING(1.1)
otel-collector-6cdd7bdc8c-nvs9z otel-collector      -> http.status_code: INT(200)
otel-collector-6cdd7bdc8c-nvs9z otel-collector Span #1
otel-collector-6cdd7bdc8c-nvs9z otel-collector     Trace ID       : 21fa4241682b57ec5afd30d5aa8d0028
otel-collector-6cdd7bdc8c-nvs9z otel-collector     Parent ID      : 
otel-collector-6cdd7bdc8c-nvs9z otel-collector     ID             : 353e6739c0e85eee
otel-collector-6cdd7bdc8c-nvs9z otel-collector     Name           : /healthz
otel-collector-6cdd7bdc8c-nvs9z otel-collector     Kind           : SPAN_KIND_SERVER
otel-collector-6cdd7bdc8c-nvs9z otel-collector     Start time     : 2022-06-24 15:39:19.829595462 +0000 UTC
otel-collector-6cdd7bdc8c-nvs9z otel-collector     End time       : 2022-06-24 15:39:19.82962522 +0000 UTC
otel-collector-6cdd7bdc8c-nvs9z otel-collector     Status code    : STATUS_CODE_UNSET
otel-collector-6cdd7bdc8c-nvs9z otel-collector     Status message : 
otel-collector-6cdd7bdc8c-nvs9z otel-collector Attributes:
otel-collector-6cdd7bdc8c-nvs9z otel-collector      -> net.transport: STRING(ip_tcp)
otel-collector-6cdd7bdc8c-nvs9z otel-collector      -> net.peer.ip: STRING(127.0.0.6)
otel-collector-6cdd7bdc8c-nvs9z otel-collector      -> net.peer.port: INT(42073)
otel-collector-6cdd7bdc8c-nvs9z otel-collector      -> net.host.ip: STRING(10.20.9.185)
otel-collector-6cdd7bdc8c-nvs9z otel-collector      -> net.host.port: INT(8080)
otel-collector-6cdd7bdc8c-nvs9z otel-collector      -> http.method: STRING(GET)
otel-collector-6cdd7bdc8c-nvs9z otel-collector      -> http.target: STRING(/healthz)
otel-collector-6cdd7bdc8c-nvs9z otel-collector      -> http.server_name: STRING(hoff-api)
otel-collector-6cdd7bdc8c-nvs9z otel-collector      -> http.route: STRING(/healthz)
otel-collector-6cdd7bdc8c-nvs9z otel-collector      -> http.user_agent: STRING(kube-probe/1.20+)
otel-collector-6cdd7bdc8c-nvs9z otel-collector      -> http.scheme: STRING(http)
otel-collector-6cdd7bdc8c-nvs9z otel-collector      -> http.host: STRING(10.20.9.185:8080)
otel-collector-6cdd7bdc8c-nvs9z otel-collector      -> http.flavor: STRING(1.1)
otel-collector-6cdd7bdc8c-nvs9z otel-collector      -> http.status_code: INT(200)
otel-collector-6cdd7bdc8c-nvs9z otel-collector 
dmitryax commented 2 years ago

Hi @boostchicken, do you have a chance to take a look?

dmitryax commented 2 years ago

cc @pmm-sumo

boostchicken commented 2 years ago

Sure do, thanks for the heads up @dmitryax

boostchicken commented 2 years ago

Weird, all the unit tests are passing. This will take some digging; I will have to write some integration tests for this one.

boostchicken commented 2 years ago

Hey @gidesh, I don't see a reference to hoff-api.platform-apps in your logging output. Is it a tag (service.name)? Or where is it set on your span?

yk-47 commented 2 years ago

@boostchicken I'm working with @gidesh on this. Not sure why the service.name isn't there; maybe we didn't set it on our app.

We originally tried dropping the spans for all /healthz URLs, but that was also unsuccessful:

processors:
  filter:
    spans:
      exclude:
        match_type: regex
        span_names:
          - /healthz

yk-47 commented 2 years ago

We did set it with env var OTEL_SERVICE_NAME: "hoff-api.platform-apps"
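
For reference, it is wired into the workload roughly like this (a trimmed, hypothetical excerpt of the hoff-api Deployment pod spec; only the relevant env entry shown):

containers:
  - name: hoff-api                      # hypothetical container name for illustration
    env:
      - name: OTEL_SERVICE_NAME         # picked up by the OpenTelemetry SDK as the service.name resource attribute
        value: "hoff-api.platform-apps"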

boostchicken commented 2 years ago

@boostchicken I'm working with @gidesh on this. Not sure why the service.name isn't there; maybe we didn't set it on our app.

We originally tried dropping the spans for all /healthz URLs, but that was also unsuccessful:

processors:
  filter:
    spans:
      exclude:
        match_type: regex
        span_names:
          - /healthz

This won't work; it's not a regex. Please change it to strict.

boostchicken commented 2 years ago

We did set it with env var OTEL_SERVICE_NAME: "hoff-api.platform-apps"

Please make sure it is working; the code looks for a service.name attribute in order to drop the span. I don't see one here, and I can't drop it if that metadata is not in the span.
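
As a possible workaround (not something I've verified against your setup), you could also stamp the attribute in the collector itself with the contrib resource processor and put it ahead of filter in the traces pipeline. A minimal sketch, assuming the resource processor is included in your build:

processors:
  resource:
    attributes:
      - key: service.name
        value: hoff-api.platform-apps
        action: upsert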

yk-47 commented 2 years ago

This won't work; it's not a regex. Please change it to strict.

Changing it to the following made no difference.

processors:
  filter:
    spans:
      exclude:
        match_type: strict
        span_names:
          - /healthz

pmm-sumo commented 2 years ago

@gidesh your traces pipeline is missing the filter processor; you have:

processors: [memory_limiter, k8sattributes, batch]

While it should have been:

processors: [memory_limiter, k8sattributes, filter, batch]
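
For clarity, the traces pipeline with the filter processor wired in would look like this (everything else in your config unchanged):

service:
  pipelines:
    traces:
      receivers: [otlp, opencensus, jaeger]
      processors: [memory_limiter, k8sattributes, filter, batch]
      exporters: [logging, otlp/elastic, otlp/tempo, jaeger]
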
gidesh commented 2 years ago

Thanks @pmm-sumo, apologies we missed that part :man_facepalming:

gidesh commented 2 years ago

Happy to close the issue, as it was a mistake on our side.