open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0
awsxray exporter can't handle certain database/sql attributes #16178

Closed felixscheinost closed 10 months ago

felixscheinost commented 1 year ago

Component(s)

exporter/awsxray

What happened?

Description

Some spans which have certain values for the db.* attributes don't get displayed on AWS.

Our setup is: AWS OTel Java agent -> AWS OTel collector -> AWS X-Ray

Steps to Reproduce

I exported the following attributes from a locally running Jaeger instance. This is a span which doesn't show up in production on AWS.

| attribute | value | note |
| --- | --- | --- |
| otel.library.name | io.opentelemetry.jdbc | |
| otel.library.version | 1.18.0-alpha | |
| thread.name | http-nio-auto-1-exec-6 | |
| db.name | offers?database_to_upper=false | this doesn't look right but should still show up on CloudWatch regardless if possible |
| db.sql.table | OFFER | |
| db.operation | SELECT | |
| db.statement | <omitted, quite long> | I can provide this if necessary but I don't think it is necessary to reproduce the bug, see my code example below |
| db.system | h2 | |
| db.connection_string | h2:mem: | |
| thread.id | 149 | |

I also constructed a minimal span that likewise doesn't show up on AWS.

    val tracer: Tracer = GlobalOpenTelemetry.getTracer("instrumentation-library-name", "1.0.0")
    val span = tracer.spanBuilder("SELECT offers?database_to_upper=false.OFFER").startSpan()
    span.setAttribute("db.name", "offers")
    // The following line makes the span completely disappear from AWS CloudWatch
    // span.setAttribute("db.name", "offers?")
    try {
      // makeCurrent() returns a Scope; close it so the context does not leak
      span.makeCurrent().use { }
    } finally {
      span.end()
    }

So it seems that just db.name containing a ? breaks something.

Expected Result

The span should show up on CloudWatch even if the name etc. isn't ideal.

Actual Result

The span is nowhere to be found on CloudWatch, not even when viewing the raw JSON of the trace.

Collector version

0.58.0, v0.21.0 of the AWS repacked docker image

Environment information

Environment

Running on Fargate.

Debugged locally on macOS. Could reproduce there (see code example above).

OpenTelemetry Collector configuration

exporters:
  awsxray: {}
extensions:
  awsproxy: {}
  health_check: {}
  memory_ballast:
    size_mib: 64
processors:
  batch/metrics:
    timeout: 60s
  batch/traces:
    send_batch_size: 50
    timeout: 1s
  memory_limiter:
    check_interval: 5s
    limit_mib: 400
    spike_limit_mib: 100
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
service:
  extensions:
  - health_check
  - memory_ballast
  - awsproxy
  pipelines:
    traces:
      exporters:
      - awsxray
      processors:
      - memory_limiter
      - batch/traces
      receivers:
      - otlp

Log output

No response

Additional context

I cannot say for certain that the problem is with the awsxray exporter. The problem could be on the AWS side as well, and I don't know how to verify that. Could I increase log verbosity?
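
On the verbosity question: the collector's own log level can typically be raised through the service telemetry settings. A minimal sketch, assuming this collector build supports the standard service.telemetry.logs section (merge it into the existing service block rather than adding a second one):

```yaml
service:
  telemetry:
    logs:
      # Assumption: "debug" is accepted as a level by this collector version
      level: debug
```

Components that emit debug-level messages would then appear in the collector's own log output.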

github-actions[bot] commented 1 year ago

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

felixscheinost commented 1 year ago

I activated debug logging in the collector and extracted the following from the logs:

The segment that is created for my test code above is:

{
  "name": "offers?database_to_upper=false",
  "id": "e45057d3922e4b21",
  "start_time": 1667973149.4018862,
  "trace_id": "1-636b401d-13dc58580b87fdf0f7a45dab",
  "end_time": 1667973149.4270935,
  "fault": false,
  "error": false,
  "throttle": false,
  "aws": {
    "xray": {
      "sdk": "opentelemetry for java",
      "sdk_version": "1.18.0",
      "auto_instrumentation": true
    }
  },
  "metadata": {
    "default": {
      "db.name": "offers?database_to_upper=false",
      "thread.id": 146,
      "thread.name": "http-nio-auto-1-exec-1"
    }
  },
  "parent_id": "32ad79d57d8868ff",
  "type": "subsegment"
}

The response from X-Ray then is:

    debug   awsxrayexporter@v0.58.0/awsxray.go:75   response: {
      UnprocessedTraceSegments: [{
          ErrorCode: "InvalidName",
          Id: "e45057d3922e4b21",
          Message: "Invalid subsegment. ErrorCode: InvalidName, Cause: null"
        }]
    }   {"kind": "exporter", "data_type": "traces", "name": "awsxray"}

Looking into the docs, this makes sense, because ? is not among the characters allowed in a name: https://docs.aws.amazon.com/xray/latest/devguide/xray-api-segmentdocuments.html#api-segmentdocuments-fields

name – The logical name of the service that handled the request, up to 200 characters. For example, your application's name or domain name. Names can contain Unicode letters, numbers, and whitespace, and the following symbols: _, ., :, /, %, &, #, =, +, \, -, @
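
To make the rule quoted above concrete, here is a small Kotlin sketch that reports which characters of a candidate name fall outside the documented set. The helper names (findInvalidNameChars, isValidSegmentName) are hypothetical, and the regex is my own reading of the quoted sentence, not something taken from X-Ray or the exporter.

```kotlin
// Allowed per the quoted docs: Unicode letters, numbers, whitespace,
// and the symbols _ . : / % & # = + \ - @
// Note: \s covers ASCII whitespace only in this sketch.
private val allowedNameChar = Regex("""[\p{L}\p{N}\s_.:/%&#=+\\\-@]""")

// Hypothetical helper: returns the distinct characters that are not allowed.
fun findInvalidNameChars(name: String): List<Char> =
    name.filter { !allowedNameChar.matches(it.toString()) }.toList().distinct()

// Hypothetical helper: applies the character rule and the 200-character limit.
fun isValidSegmentName(name: String): Boolean =
    name.length <= 200 && findInvalidNameChars(name).isEmpty()

fun main() {
    println(findInvalidNameChars("offers?database_to_upper=false")) // prints [?]
    println(isValidSegmentName("offers"))                           // prints true
}
```

Run against the name from the segment above, only the ? is flagged, which matches the InvalidName response.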

felixscheinost commented 1 year ago

So what's weird is this: why does the following code, without db.name but with a ? in the span name, work?

    val tracer: Tracer = GlobalOpenTelemetry.getTracer("instrumentation-library-name", "1.0.0")
    val span = tracer.spanBuilder("SELECT offers?database_to_upper=false.OFFER").startSpan()
    try {
      // makeCurrent() returns a Scope; close it so the context does not leak
      span.makeCurrent().use { }
    } finally {
      span.end()
    }

It seems like something is automatically removing illegal characters from the span name because the span is exported to X-Ray without the ?, successfully.

Would that make sense in this case as well?
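
The clean-up observed for span names could be approximated as below. This is only an illustration of the idea, not the exporter's actual code (the exporter is written in Go). sanitizeSegmentName is a hypothetical helper, and the allowed set is my reading of the X-Ray segment document reference. Until the exporter handles this case, the same helper might serve as an application-side workaround for a db.name value that the application sets itself.

```kotlin
// Anything outside the documented set (Unicode letters, numbers, whitespace,
// and _ . : / % & # = + \ - @) is treated as disallowed in this sketch.
private val disallowedNameChars = Regex("""[^\p{L}\p{N}\s_.:/%&#=+\\\-@]""")

// Hypothetical helper: drops disallowed characters and enforces the
// documented 200-character limit.
fun sanitizeSegmentName(raw: String, maxLength: Int = 200): String =
    raw.replace(disallowedNameChars, "").take(maxLength)

fun main() {
    println(sanitizeSegmentName("offers?database_to_upper=false"))
    // prints offersdatabase_to_upper=false
}
```

Dropping characters is one choice among several. Replacing them with a placeholder such as _ would keep the name length stable, at the cost of names that differ more visibly from the original.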

github-actions[bot] commented 1 year ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] commented 10 months ago

This issue has been closed as inactive because it has been stale for 120 days with no activity.