open-telemetry / opentelemetry-js

OpenTelemetry JavaScript Client
https://opentelemetry.io
Apache License 2.0

propagator-aws-xray broken with gRPC? #4830

Open oceddi opened 4 months ago

oceddi commented 4 months ago

What happened?

Steps to Reproduce

Send a gRPC message using AWS X-Ray propagation from a client to a server. You will see that the 'x-amzn-trace-id' header is not parsed correctly by the AWS X-Ray propagator during extraction. As a result, spans created on the server are not linked to (made children of) the originating caller's span in the gRPC handlers.
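
A minimal server-side check that makes the failure visible (illustrative, not from the report): when extraction works, the active span inside a gRPC handler shares the client's trace ID; with this bug, a fresh root trace is started instead.

import { trace } from '@opentelemetry/api';

function handler(call, callback) {
  const activeSpan = trace.getActiveSpan();
  // With working propagation this matches the client's trace ID;
  // with the broken extraction it is a newly generated root trace ID.
  console.log('server traceId:', activeSpan?.spanContext().traceId);
  callback(null, {});
}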

Expected Result

The gRPC handler spans on the server side should be linked to / parented by the caller's span.

Actual Result

No linkage is occurring.

Additional Details

I think this bug is caused by the following code in the AWS X-Ray Propagator: https://github.com/open-telemetry/opentelemetry-js/blob/main/packages/propagator-aws-xray/src/AWSXRayPropagator.ts#L102

The code fetches the 'x-amzn-trace-id' entry from the carrier (the request headers). The gRPC instrumentation provides these as key/value pairs where the value is always wrapped in an array (with the first element containing the value). The code in the AWS propagator doesn't account for that and rejects the header because it isn't of type string.
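
For example, @grpc/grpc-js always wraps metadata values in arrays (the trace header value below is the example from the AWS docs, used here purely for illustration):

import { Metadata } from '@grpc/grpc-js';

const metadata = new Metadata();
metadata.set('x-amzn-trace-id', 'Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1');
// Metadata.get() returns an array of values, so the propagator's getter sees an array:
console.log(metadata.get('x-amzn-trace-id'));
// => [ 'Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1' ]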

I looked at the other two propagators (B3 and Jaeger), and they handle this situation correctly by checking whether the value returned from getter.get is an array and, if so, taking element zero:

https://github.com/open-telemetry/opentelemetry-js/blob/main/packages/opentelemetry-propagator-b3/src/B3Propagator.ts#L67

https://github.com/open-telemetry/opentelemetry-js/blob/main/packages/opentelemetry-propagator-jaeger/src/JaegerPropagator.ts#L94

Someone needs to update the AWS X-Ray propagator to do the same check (see the sketch below).
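
A minimal sketch of that check, assuming it is applied where AWSXRayPropagator.extract() reads the header (the helper name here is illustrative, not from the codebase):

function readTraceHeader(carrier, getter) {
  const raw = getter.get(carrier, 'x-amzn-trace-id');
  // gRPC metadata supplies an array of values; plain HTTP carriers supply a string.
  const value = Array.isArray(raw) ? raw[0] : raw;
  return typeof value === 'string' ? value : undefined;
}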

OpenTelemetry Setup Code

import process from 'process';
import { Resource } from "@opentelemetry/resources";
import { SEMRESATTRS_SERVICE_NAME } from "@opentelemetry/semantic-conventions";
import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-grpc';
import { AWSXRayPropagator } from "@opentelemetry/propagator-aws-xray";
import { AWSXRayIdGenerator } from "@opentelemetry/id-generator-aws-xray";
import { GrpcInstrumentation } from '@opentelemetry/instrumentation-grpc';
import { ConsoleMetricExporter, PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { NodeSDK, api } from '@opentelemetry/sdk-node';
import { AsyncHooksContextManager } from '@opentelemetry/context-async-hooks';

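// Register the X-Ray propagator and an async-hooks context manager globally.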
const propagator = new AWSXRayPropagator();
const contextManager = new AsyncHooksContextManager();
contextManager.enable();
api.context.setGlobalContextManager(contextManager);
api.propagation.setGlobalPropagator(propagator);

const resource = Resource.default().merge(new Resource({
  [SEMRESATTRS_SERVICE_NAME]: "myservice",
}));

const traceExporter = new OTLPTraceExporter();
const spanProcessor = new BatchSpanProcessor(traceExporter);

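// Configure the SDK with X-Ray-compatible trace IDs, OTLP trace export over gRPC, and gRPC instrumentation.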
const sdk = new NodeSDK({
  autoDetectResources: true,
  resource,
  idGenerator: new AWSXRayIdGenerator(),
  textMapPropagator: propagator,
  traceExporter,
  spanProcessors: [spanProcessor],
  metricReader: new PeriodicExportingMetricReader({
    exporter: new ConsoleMetricExporter(),
  }),
  instrumentations: [
    new GrpcInstrumentation(),
  ],
});

sdk.start();

const shutdown = () => sdk.shutdown()
  .then(() => console.log('Tracing and Metrics terminated'))
  .catch((error) => console.log('Error terminating tracing and metrics', error))
  .finally(() => process.exit(0));

process.on('SIGTERM', shutdown);
process.on('SIGINT', shutdown);

package.json

"dependencies": {
    "@grpc/grpc-js": "^1.10.9",
    "@grpc/proto-loader": "^0.7.13",
    "@opentelemetry/api": "^1.9.0",
    "@opentelemetry/auto-instrumentations-node": "^0.47.1",
    "@opentelemetry/id-generator-aws-xray": "^1.2.2",
    "@opentelemetry/instrumentation-grpc": "^0.52.1",
    "@opentelemetry/instrumentation-ioredis": "^0.41.0",
    "@opentelemetry/propagator-aws-xray": "^1.25.1",
    "@opentelemetry/resources": "^1.25.1",
    "@opentelemetry/sdk-metrics": "^1.25.1",
    "@opentelemetry/sdk-node": "^0.52.1",
    "@opentelemetry/sdk-trace-node": "^1.25.1",
    "@opentelemetry/semantic-conventions": "^1.25.1",
    "dotenv": "^16.4.5",
    "ioredis": "^5.4.1"
  }

Relevant log output

No response

mbrevda commented 3 months ago

Seeing the same for GCP's x-cloud-trace-context header after upgrading to the latest versions.

mbrevda commented 3 months ago

Rolling back @opentelemetry/sdk-trace-node to v1.24.1 seems to resolve the issue (although that causes a dependency discrepancy with @opentelemetry/api, which should be rolled back to 1.8.0 as well).
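
For anyone pinning versions, that rollback looks roughly like this in package.json (illustrative; other @opentelemetry packages may need matching versions too):

"dependencies": {
    "@opentelemetry/api": "1.8.0",
    "@opentelemetry/sdk-trace-node": "1.24.1"
  }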

mbrevda commented 1 month ago

Any update on this? Not sure if it's 100% the same issue, but I'm also not seeing GCP's X-Cloud-Trace-Context header being propagated, which makes it impossible to upgrade the packages to their latest versions.

mbrevda commented 1 month ago

The issue I was having was caused by some weirdness in package-lock.json. Recreating the lock file resolved it.