open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

[exporter/sapm] Token pass-through not working #33126

Open strk1204 opened 3 months ago

strk1204 commented 3 months ago

Component(s)

exporter/sapm

What happened?

Description

When using the following setup, tokens provided by the downstream agents over OTLP do not get passed on to SOC (SignalFx), resulting in a 401 being returned at the gateway/collector level.

Example Setup:

Java Application -> Java (Splunk Dist) OTel Agent -> (Splunk Dist) OTel Collector -> SplunkOC

The Java OTel agent is configured with the collector as both the SignalFx (for metrics) and OTLP (for traces) endpoint. A valid token is provided, and in this configuration metrics are gathered from the application successfully. The collector is configured correctly to forward this data to SOC. The collector/exporter-level token is "NULL", with access_token_passthrough enabled. include_metadata: true has also been enabled for the OTLP receiver, to ensure that all values are retained before being passed to the SAPM exporter. The expectation is that, since there is a valid token in the payload, the exporter will use it instead of the exporter-level token, as other exporters do.

      sapm:
        access_token: "NULL"
        access_token_passthrough: true
        disable_compression: false # Force enable GZIP
        endpoint: "https://ingest.${SPLUNK_O11Y_REALM}.signalfx.com/v2/trace"
        sending_queue:
          num_consumers: 32

I can confirm that the token IS being passed to the collector via the "X-SF-TOKEN" header; however, it is not being used by the SAPM exporter.
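
For reference, the agent side is wired up roughly as in the sketch below (the environment variable names assume the Splunk OTel Java agent's standard settings, and the endpoint values are illustrative rather than my exact setup):

# Hypothetical compose fragment for the instrumented service (illustrative values only)
services:
  java-app:
    environment:
      - SPLUNK_ACCESS_TOKEN=<valid-org-token>              # attached to exports as the X-SF-TOKEN header
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://collector:4318  # traces over OTLP to the collector
      - SPLUNK_METRICS_ENDPOINT=http://collector:9943      # application metrics to the collector's SignalFx receiver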

As a workaround, I've added the following config, which has enabled this setup to work correctly: the attributes processor copies the x-sf-token value from the request metadata onto the span attributes, and the transform processor then promotes it to the com.splunk.signalfx.access_token resource attribute, which is what access_token_passthrough reads.

      attributes/retain-sfx-token:
        actions:
          - key: "com.splunk.signalfx.access_token"
            from_context: "x-sf-token"
            action: insert
      transform/retain-sfx-token:
        error_mode: silent
        trace_statements:
          - context: span
            statements:
              - set(resource.attributes["com.splunk.signalfx.access_token"], attributes["com.splunk.signalfx.access_token"])
              - delete_key(attributes, "com.splunk.signalfx.access_token")

I also can't use OTLP -> OTLPHTTP, due to headers not carrying over and the lack of the headers_setter extension in the Splunk dist of the collector; IDEA
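
For completeness, if headers_setter were available, my understanding is that the OTLP route would look roughly like the sketch below (based on the upstream contrib extension's documented config; the /v2/trace/otlp ingest path is an assumption on my part):

extensions:
  headers_setter:
    headers:
      # copy the incoming x-sf-token request metadata onto the outgoing export
      # (still needs include_metadata: true on the receiving OTLP receiver)
      - action: upsert
        key: X-SF-TOKEN
        from_context: x-sf-token

exporters:
  otlphttp:
    traces_endpoint: "https://ingest.${SPLUNK_O11Y_REALM}.signalfx.com/v2/trace/otlp"
    auth:
      authenticator: headers_setter
# headers_setter would also need to be listed under service.extensions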

I'm unsure whether here or the agent repository is the best place to raise this; however, given that the agents appear to be exporting the tokens as expected, in line with the Splunk-provided documentation, the fault seems to lie in the SAPM exporter's handling of the token within the headers.

Steps to Reproduce

  1. Deploy example application with OTel agent
  2. Export OTLP traces, with a valid token, to a collector configured as outlined above.
  3. Check logs
  4. Export fails with a 401.

Expected Result

Token passed through correctly to SOC, auth completed and traces ingested.

Actual Result

A 401 is returned on traces.

P.S. All references to an agent above refer to an application agent, NOT a collector deployed in what Splunk coins "agent mode".

Collector version

0.100.0

Environment information

Environment

Splunk Distribution - OTel Collector - 0.100.0 - Deployed via Helm to Kubernetes. Configuration as outlined.

OpenTelemetry Collector configuration

...
    receivers:
      otlp: # Metrics, Traces, Logs
        protocols:
          http:
            endpoint: "${SPLUNK_LISTEN_INTERFACE}:4318"
            include_metadata: true
      sapm: # Metrics & Traces
        endpoint: "${SPLUNK_LISTEN_INTERFACE}:7276"
        access_token_passthrough: true
...
    processors:
      attributes/retain-sfx-token:
        actions:
          - key: "com.splunk.signalfx.access_token"
            from_context: "x-sf-token"
            action: insert
      transform/retain-sfx-token:
        error_mode: silent
        trace_statements:
          - context: span
            statements:
              - set(resource.attributes["com.splunk.signalfx.access_token"], attributes["com.splunk.signalfx.access_token"])
              - delete_key(attributes, "com.splunk.signalfx.access_token")
...
    exporters:
      sapm:
        access_token: "NULL"
        access_token_passthrough: true
        disable_compression: false # Force enable GZIP
        endpoint: "https://ingest.${SPLUNK_O11Y_REALM}.signalfx.com/v2/trace"
        sending_queue:
          num_consumers: 32
...
    service:
      pipelines:
        traces:
          receivers: [sapm, otlp]
          processors:
            - memory_limiter
            # - batch # Disabled because it breaks this workaround
            ## The following are required to handle OTLP Tokens
            - attributes/retain-sfx-token
            - transform/retain-sfx-token
          exporters: [sapm]
...

Log output

No response

Additional context

No response

atoulme commented 3 months ago

I have a working example here that is a bit simpler since we use the resource processor. Here is the docker-compose.yml to generate this:

version: "3"
services:
  telemetrygen:
    image: ghcr.io/open-telemetry/opentelemetry-collector-contrib/telemetrygen:latest
    container_name: telemetrygen
    command:
      - "traces"
      - "--otlp-endpoint"
      - "otelcollector:4317"
      - "--otlp-insecure"
      - "--duration"
      - "10m"
      - "--rate"
      - "1000"
      - "--workers"
      - "3"
      - "--otlp-attributes"
      - 'deployment.environment="testpassthrough"'
      - "--otlp-header"
      - Content-Type="application/x-protobuf"
      - "--otlp-header"
      - X-SF-Token="<REPLACE WITH YOUR TOKEN>"
    depends_on:
      - otelcollector
  # OpenTelemetry Collector
  otelcollector:
    image:  quay.io/signalfx/splunk-otel-collector:latest
    container_name: otelcollector
    command: ["--config=/etc/otel-collector-config.yml", "--set=service.telemetry.logs.level=debug"]
    volumes:
      - ./otel-collector-config.yml:/etc/otel-collector-config.yml

Here is the collector configuration:

receivers:
    otlp:
      protocols:
        http:
          include_metadata: true
          endpoint: 0.0.0.0:4318
        grpc:
          include_metadata: true
          endpoint: 0.0.0.0:4317

exporters:
    sapm:
      access_token: "NULL"
      access_token_passthrough: true
      disable_compression: false # Force enable GZIP
      endpoint: "https://ingest.us0.signalfx.com/v2/trace"
      sending_queue:
        num_consumers: 32

processors:
    batch:
    resource:
      attributes:
        - action: insert
          from_context: X-SF-Token
          key: com.splunk.signalfx.access_token

extensions:
    health_check:
      endpoint: 0.0.0.0:13133
    pprof:
      endpoint: :1888
    zpages:
      endpoint: :55679

service:
    extensions: [pprof, zpages, health_check]
    pipelines:
      traces:
        receivers: [otlp]
        processors: [resource]
        exporters: [sapm]

From what I understand, you expect the collector to automatically extract the token from the request headers and set it on the outgoing request. That's not possible out of the box. You need to do the extraction and mapping to a resource attribute yourself.

strk1204 commented 3 months ago

Yes, that's correct; that was the underlying expectation here.

Is there any plan on adding this functionality to the SAPM exporter?

In light of posts relating to the possible deprecation of SAPM in favour of OTLP, and assuming we retain include_metadata, can we expect OTLP -> OTLP traces to retain X-SF-TOKEN, and for it to be accepted by the appropriate OTLP endpoint on the SignalFx platform? That would remove the need for this workaround, should the behaviour work as expected.

Thanks!

atoulme commented 1 month ago

No, the same approach is taken. We cannot pass around tokens implicitly or we might be disclosing secrets.