open-telemetry / opentelemetry-js

OpenTelemetry JavaScript Client

OTLPExporterBase causes unhandled rejection on DNS resolution error #4946

Open · andsens opened 3 weeks ago

andsens commented 3 weeks ago

What happened?

Steps to Reproduce

Without an OpenTelemetry agent running, provide OTEL_EXPORTER_OTLP_ENDPOINT="http://opentelemetry-agent:4317" through the environment and register a tracer provider.

I believe you will need to trace something as well, though the stacktrace is not very heavy on the details, so that's a guess.
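For reference, a minimal sketch of what "trace something" could look like with the setup code further below (the tracer and span names here are just illustrative):

import { trace } from '@opentelemetry/api';

// Any ended span should do; it only needs to reach the BatchSpanProcessor queue.
const tracer = trace.getTracer('repro');
const span = tracer.startSpan('repro-span');
span.end();
// The span is exported on the next scheduled batch export or on shutdown()/forceFlush(),
// at which point the gRPC exporter fails to resolve opentelemetry-agent.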

Expected Result

The instrumentation logs an error about DNS resolution failing.

Actual Result

The instrumentation fails with an unhandled Promise rejection about DNS resolution failing.

Additional Details

OpenTelemetry Setup Code

// imports inferred from the aliases used below (package names as in package.json)
import * as otelResources from '@opentelemetry/resources';
import * as otelSemanticConventions from '@opentelemetry/semantic-conventions';
import * as otelSDKTraceNode from '@opentelemetry/sdk-trace-node';
import * as otelSDKTraceBase from '@opentelemetry/sdk-trace-base';
import * as otelExportTraceOTLPGRPC from '@opentelemetry/exporter-trace-otlp-grpc';

const resource = otelResources.Resource.default().merge(
    new otelResources.Resource({
        [otelSemanticConventions.SEMRESATTRS_SERVICE_NAME]: serviceName,
        [otelSemanticConventions.SEMRESATTRS_SERVICE_VERSION]: serviceVersion,
        ...options.defaultAttributes,
    }),
);
const tracerProvider = new otelSDKTraceNode.NodeTracerProvider({ resource });
tracerProvider.addSpanProcessor(
    new otelSDKTraceBase.BatchSpanProcessor(new otelExportTraceOTLPGRPC.OTLPTraceExporter()),
);
tracerProvider.register();

package.json

{
  "dependencies": {
    "@rushstack/eslint-patch": "1.7.2",
    "@opentelemetry/api": "1.9.0",
    "@opentelemetry/exporter-trace-otlp-grpc": "0.52.1",
    "@opentelemetry/instrumentation": "0.52.1",
    "@opentelemetry/resources": "1.25.1",
    "@opentelemetry/sdk-trace-node": "1.25.1",
    "@opentelemetry/semantic-conventions": "1.25.1"
  }
}

Relevant log output

Error: 14 UNAVAILABLE: Name resolution failed for target dns:opentelemetry-agent:4317
    at callErrorFromStatus (/service/node_modules/@grpc/grpc-js/src/call.ts:82:17)
    at Object.onReceiveStatus (/service/node_modules/@grpc/grpc-js/src/client.ts:360:55)
    at Object.onReceiveStatus (/service/node_modules/@grpc/grpc-js/src/client-interceptors.ts:458:34)
    at Object.onReceiveStatus (/service/node_modules/@grpc/grpc-js/src/client-interceptors.ts:419:48)
    at /service/node_modules/@grpc/grpc-js/src/resolving-call.ts:132:24
    at processTicksAndRejections (node:internal/process/task_queues:77:11)
for call at
    at ServiceClientImpl.makeUnaryRequest (/service/node_modules/@grpc/grpc-js/src/client.ts:325:42)
    at ServiceClientImpl.export (/service/node_modules/@grpc/grpc-js/src/make-client.ts:189:15)
    at /service/node_modules/@opentelemetry/otlp-grpc-exporter-base/src/grpc-exporter-transport.ts:160:26
    at new Promise (<anonymous>)
    at GrpcExporterTransport.send (/service/node_modules/@opentelemetry/otlp-grpc-exporter-base/src/grpc-exporter-transport.ts:145:12)
    at OTLPTraceExporter.send (/service/node_modules/@opentelemetry/otlp-grpc-exporter-base/src/OTLPGRPCExporterNodeBase.ts:129:37)
    at /service/node_modules/@opentelemetry/otlp-exporter-base/src/OTLPExporterBase.ts:105:14
    at new Promise (<anonymous>)
    at OTLPTraceExporter._export (/service/node_modules/@opentelemetry/otlp-exporter-base/src/OTLPExporterBase.ts:102:12)
    at OTLPTraceExporter.export (/service/node_modules/@opentelemetry/otlp-exporter-base/src/OTLPExporterBase.ts:92:10)
pichlermarc commented 3 weeks ago

Hi @andsens, thanks for reaching out.

Is this happening when shutting down or force-flushing the TracerProvider, or while the app is running? :thinking:

I tested this locally and was only able to reproduce this when I called TracerProvider#shutdown() without considering rejections. The stack trace was the same as the one you received, and it also did not mention anything about shutdown() or forceFlush(). During normal operation (scheduled exports), the rejection was properly handled.
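
Roughly what I mean, as a sketch rather than the exact code I used:

// Rejection dropped: shutdown() returns a promise, and if the final flush fails
// (e.g. DNS resolution), the rejection surfaces as an unhandled rejection.
tracerProvider.shutdown();

// Rejection considered: attaching a catch (or awaiting inside try/catch) keeps
// the same error from going unhandled.
tracerProvider.shutdown().catch((err) => {
    console.error('tracer provider shutdown failed', err);
});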

andsens commented 3 weeks ago

Oh yeah, that might be it. I have a shutdown procedure with a function that looks like this:

export async function closeTracer(context: RootLogContext) {
    if (isTracerContext(context)) {
        const { rootLog, tracerProvider } = context;
        rootLog.verbose('shutdown: close tracer connection');
        tracerProvider.shutdown();
        const [timeout, cancelTimout] = throwAfter(500, 'shutdown: Timed out waiting for tracer to close.');
        await Promise.race([timeout, tracerProvider.shutdown]);
        cancelTimout();
    }
}

The function call itself is wrapped in a try/catch, so the error should simply be logged. The error is from a short-running script, so the tracerProvider.shutdown() call likely ends up responsible for flushing everything, since no export timers are likely to have fired.
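
For comparison, a sketch of the same function that captures the promise returned by shutdown() and races that (rather than the tracerProvider.shutdown function reference), so a rejection would land in the surrounding try/catch; it assumes the same throwAfter() helper and context types as above:

export async function closeTracer(context: RootLogContext) {
    if (isTracerContext(context)) {
        const { rootLog, tracerProvider } = context;
        rootLog.verbose('shutdown: close tracer connection');
        // Capture the promise instead of discarding it, so its rejection is observed.
        const shutdownPromise = tracerProvider.shutdown();
        const [timeout, cancelTimeout] = throwAfter(500, 'shutdown: Timed out waiting for tracer to close.');
        try {
            await Promise.race([timeout, shutdownPromise]);
        } finally {
            cancelTimeout();
        }
    }
}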