open-telemetry / opentelemetry-js

OpenTelemetry JavaScript Client
https://opentelemetry.io
Apache License 2.0
2.69k stars 780 forks source link

Add suport to flush via API #3310

Open Flarna opened 3 years ago

Flarna commented 3 years ago

The AWS Lambda Plugin as the requirement to flush data at the time the invocation of the function is done. See https://github.com/open-telemetry/opentelemetry-js-contrib/blob/b128dae70cf5723162ef72ed0ed83924f3edf4dc/plugins/node/opentelemetry-instrumentation-aws-lambda/src/aws-lambda.ts#L199-L206

As once can see this creates a binding from instrumentation to a specific SDK. Usually an instrumentation should have no need to require SDK components.

Currently API offers no way to get the TracerProvider used (it returns a ProxyTracerProvider) nor does it offer a way to force flush the trace data.

Should we enhance the API with some functionality regarding this?

dyladan commented 3 years ago

This is a recurring spec question. AFAIK it has been shot down every time. Maybe a member of the @open-telemetry/technical-committee can weigh in?

tigrannajaryan commented 3 years ago

@dyladan I think it is a reasonable request to have a Flush API. What were the arguments against? Perhaps time to submit a proposal to the spec? That will be a backward-compatible addition to the API. Exporters are already supposed to have a Flush API, so it is a matter exposing it in the user-facing API.

yurishkuro commented 3 years ago

I am not clear how this is not addressed by the ForceFlush that is already in the spec? https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#forceflush

Flarna commented 3 years ago

@yurishkuro It's specified as part of the SDK but an instrumentation should usually have no need to access the SDK, they should only need the API. As of now having the requirement to access the SDK even results in a direct binding to a specific SDK implementation. So one of the OTel goals to allow custom SDKs is also broken.

I see following possibilities to solve this (thinking only of JS here as we are in JS repo) on API level (but maybe there are more)

  1. exposing TracerProvider#forcedFlush on API
  2. adapt API that api.trace.getTracerProvider() returns the concrete TracerProvider instance installed instead a ProxyTracerProvider. Users can then do a instanceof or typeof(provider.getSpanProcessor) === 'function'check (would always fail on ProxyTracerProvider.

(1) is quite generic but will most likely result in followups to add more and more APIs which is not my intention here (2) sounds quite reasonable as I would in general expect that api.trace.getTracerProvider() returns what I passed to api.tracer.setGlobalTracerProvider() (assuming success here)

Another option is clearly to ignore this and let an instrumentation use SDK APIs as needed and just document hat they are not fully compatible with 3rd party SDKs.

yurishkuro commented 3 years ago

How is the SDK installed in the first place? The default impl in the API is no-op, so there needs to be some code that is aware of the SDK, and it's usually that code that's responsible for flushing.

dyladan commented 3 years ago

The instrumentation code currently only accesses the API directly and does not use any sdk-specific methods. This allows a third party SDK implementation to use the instrumentations we provide without having to fully copy all of our internal SDK implementation details.

Flarna commented 3 years ago

Javascript is dynamically typed and there is nothing like a build step to link together to an app. It's quite easy to create an app having an SDK and some instrumentations as direct dependency and one of these instrumentions may decide to add parts of the default SDK as transitive dependency (maybe the same as the direct dependency, maybe a different one, maybe just version differs).

If API offers all needed functionality for instrumentations the API needs to match. This is by far easier/stable done.

secustor commented 2 years ago

Is there any movement on this topic? I have ran into this too as I'm loosing spans if using the batchprocessor.

The stop-gap solution for me has been this snippet, but it is far from ideal:

  const traceProvider = api.trace.getTracerProvider();
  if (traceProvider instanceof NodeTracerProvider) {
    await traceProvider.shutdown();
  } else if (traceProvider instanceof ProxyTracerProvider) {
    const delegateProvider = traceProvider.getDelegate()
    if (delegateProvider instanceof NodeTracerProvider) {
      await delegateProvider.shutdown()
    }
  }
pksunkara commented 1 week ago

Shouldn't this API be added to the SDK itself? Right now, when we shut down the SDK, I am losing traces, logs, and metrics.

process.on('SIGTERM', () => {
  sdk
    .shutdown()
    .then(
      () => console.log('Telemetry shut down successfully'),
      () => console.error('Telemetry shut down failed'),
    )
    .finally(() => process.exit(0));
});

Shouldn't there be an sdk.forceFlush() that we should be calling before that?