open-telemetry / opentelemetry-operator

Kubernetes Operator for OpenTelemetry Collector
Apache License 2.0

Node.js resource detectors can't be disabled with `OTEL_NODE_RESOURCE_DETECTORS` #2626

illrill opened this issue 9 months ago

illrill commented 9 months ago

Component(s)

instrumentation

What happened?

Description

The Node.js auto-instrumentation includes a series of cloud provider resource detectors that cannot be turned off. I'm running on AKS, so the instrumentation keeps trying to retrieve metadata for other cloud providers (for example, by calling the API server at https://kubernetes.default.svc/api/v1/namespaces/kube-system/configmaps/aws-auth), which fails because that metadata doesn't exist.

The auto-instrumentations-node package exposes OTEL_NODE_RESOURCE_DETECTORS so users can choose which resource detectors to use. In the Operator, however, all resource detectors are hardcoded when the SDK is instantiated, so this variable has no effect.

https://github.com/open-telemetry/opentelemetry-operator/blob/c10fe8aff3017d41f82999e354e6a705a3b2dfe7/autoinstrumentation/nodejs/src/autoinstrumentation.ts#L49-L55
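The gap between the two behaviours can be shown with a small, dependency-free TypeScript sketch. It assumes detectors are identified by short names and models them as plain objects; `NamedDetector`, `ALL_DETECTORS`, `hardcodedDetectors`, and `detectorsFromEnv` are hypothetical names, not the operator's or the SDK's API.

```typescript
// Hypothetical stand-in for a resource detector. Real detectors come from
// @opentelemetry/resources and the @opentelemetry/resource-detector-* packages.
interface NamedDetector {
  name: string;
}

// Hypothetical registry modelled on the operator's hardcoded list.
const ALL_DETECTORS: NamedDetector[] = [
  { name: "container" },
  { name: "env" },
  { name: "host" },
  { name: "os" },
  { name: "process" },
  { name: "alibaba" },
  { name: "aws" },
  { name: "gcp" },
];

// Current behaviour: the environment is never consulted.
function hardcodedDetectors(): NamedDetector[] {
  return ALL_DETECTORS;
}

// Desired behaviour: keep only the detectors named in the comma-separated
// OTEL_NODE_RESOURCE_DETECTORS value, falling back to all of them when unset.
function detectorsFromEnv(env: Record<string, string | undefined>): NamedDetector[] {
  const raw = env["OTEL_NODE_RESOURCE_DETECTORS"];
  if (raw === undefined || raw.trim() === "") {
    return ALL_DETECTORS;
  }
  const wanted = new Set(raw.split(",").map((s) => s.trim()));
  return ALL_DETECTORS.filter((d) => wanted.has(d.name));
}

const env = { OTEL_NODE_RESOURCE_DETECTORS: "env,host,os,process,container" };
console.log(hardcodedDetectors().map((d) => d.name).join(",")); // container,env,host,os,process,alibaba,aws,gcp
console.log(detectorsFromEnv(env).map((d) => d.name).join(",")); // container,env,host,os,process
```

In a real entry point the argument would be `process.env`; passing the environment in explicitly only keeps the sketch testable.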

Steps to Reproduce

  1. Run the operator on AKS
  2. Auto-instrument a Node.js application with the annotation instrumentation.opentelemetry.io/inject-nodejs: "my-instrument"
  3. Enable debug mode on the application with OTEL_LOG_LEVEL="debug"
  4. Set OTEL_NODE_RESOURCE_DETECTORS="env,host,os,process,container" on the application to try to exclude the cloud-specific resource detectors
  5. The cloud-specific resource detectors still run

Expected Result

OTEL_NODE_RESOURCE_DETECTORS should work as per https://github.com/open-telemetry/opentelemetry-js-contrib/blob/main/metapackages/auto-instrumentations-node/README.md#usage-auto-instrumentation

Actual Result

OTEL_NODE_RESOURCE_DETECTORS has no effect

Kubernetes Version

1.26.6

Operator version

0.92.1

Collector version

0.92.0

Environment information

Environment

AKS

Log output

a resource's async attributes promise rejected: Error: ECS metadata api request timed out.
    at Timeout._onTimeout (/otel-auto-instrumentation-nodejs/node_modules/@opentelemetry/resource-detector-alibaba-cloud/build/src/detectors/AlibabaCloudEcsDetector.js:87:29)
    at listOnTimeout (internal/timers.js:555:17)
    at processTimers (internal/timers.js:498:7)
AlibabaCloudEcsDetector found resource. Resource {
  _attributes: {},
  asyncAttributesPending: false,
  _syncAttributes: {},
  _asyncAttributesPromise: Promise {
    {},
    [Symbol(async_id_symbol)]: 75029,
    [Symbol(trigger_async_id_symbol)]: 0
  }
}

a resource's async attributes promise rejected: Error: EC2 metadata api request timed out.
    at Timeout._onTimeout (/otel-auto-instrumentation-nodejs/node_modules/@opentelemetry/resource-detector-aws/build/src/detectors/AwsEc2Detector.js:113:24)
    at listOnTimeout (internal/timers.js:555:17)
    at processTimers (internal/timers.js:498:7)
AwsEc2Detector found resource. Resource {
  _attributes: {},
  asyncAttributesPending: false,
  _syncAttributes: {},
  _asyncAttributesPromise: Promise {
    {},
    [Symbol(async_id_symbol)]: 75034,
    [Symbol(trigger_async_id_symbol)]: 0
  }
}

error reading machine id: Error: ENOENT: no such file or directory, open '/etc/machine-id'
GcpDetector failed: GCP Metadata unavailable.
GcpDetector found resource. Resource {
  _attributes: {},
  asyncAttributesPending: false,
  _syncAttributes: {},
  _asyncAttributesPromise: Promise {
    {},
    [Symbol(async_id_symbol)]: 75063,
    [Symbol(trigger_async_id_symbol)]: 0
  }
}
error reading machine id: Error: ENOENT: no such file or directory, open '/var/lib/dbus/machine-id'

@opentelemetry/instrumentation-http outgoingRequest on response()
@opentelemetry/instrumentation-http outgoingRequest on end()
Process is not running on K8S Error: Failed to load page, status code: 403
    at IncomingMessage.<anonymous> (/otel-auto-instrumentation-nodejs/node_modules/@opentelemetry/resource-detector-aws/build/src/detectors/AwsEksDetector.js:192:32)
    at /otel-auto-instrumentation-nodejs/node_modules/@opentelemetry/context-async-hooks/build/src/AbstractAsyncHooksContextManager.js:50:55
    at AsyncLocalStorage.run (async_hooks.js:305:14)
    at AsyncLocalStorageContextManager.with (/otel-auto-instrumentation-nodejs/node_modules/@opentelemetry/context-async-hooks/build/src/AsyncLocalStorageContextManager.js:33:40)
    at IncomingMessage.contextWrapper (/otel-auto-instrumentation-nodejs/node_modules/@opentelemetry/context-async-hooks/build/src/AbstractAsyncHooksContextManager.js:50:32)
    at IncomingMessage.emit (events.js:388:22)
    at endReadableNT (internal/streams/readable.js:1336:12)
    at processTicksAndRejections (internal/process/task_queues.js:82:21)
AwsEksDetector found resource. Resource {
  _attributes: {},
  asyncAttributesPending: false,
  _syncAttributes: {},
  _asyncAttributesPromise: Promise {
    {},
    [Symbol(async_id_symbol)]: 75589,
    [Symbol(trigger_async_id_symbol)]: 0
  }
}

Additional context

Originally posted in https://github.com/open-telemetry/opentelemetry-js-contrib/issues/1780

Starefossen commented 8 months ago

You should set it on the Instrumentation resource instead, like this, because the value set on the pod/container will be overwritten:

apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: my-instrumentation
spec:
  exporter:
    endpoint: http://otel-collector:4317
  ...
  nodejs:
    env:
      - name: OTEL_NODE_RESOURCE_DETECTORS
        value: env,host,os,process,container

illrill commented 8 months ago

I've tried that too; it still doesn't take effect. See https://github.com/open-telemetry/opentelemetry-js-contrib/issues/1780#issuecomment-1943723681.

Starefossen commented 8 months ago

I did this in my cluster and it worked. Are you using the latest version of the operator?

aelmekeev commented 4 months ago

Stumbled upon this one today and can confirm that this is a real issue.

Setting OTEL_NODE_RESOURCE_DETECTORS can lead to errors like these:

Invalid resource detector "container" specified in the environment variable OTEL_NODE_RESOURCE_DETECTORS
Invalid resource detector "aws" specified in the environment variable OTEL_NODE_RESOURCE_DETECTORS

This is because the SDK checks OTEL_NODE_RESOURCE_DETECTORS, but the operator does not, as the reporter mentioned.
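A minimal sketch of how such messages can arise, assuming a resolver that only recognises a fixed subset of detector names. The `SUPPORTED` set and `resolveDetectorNames` function are hypothetical; which names the real SDK and metapackage accept depends on their versions.

```typescript
// Hypothetical: the names this resolver knows about. The real set varies by
// package and version; this one is illustrative only.
const SUPPORTED = new Set(["env", "host", "os", "process"]);

// Splits a comma-separated detector list and reports every name the resolver
// does not recognise, using the same wording as the log lines in this comment.
function resolveDetectorNames(
  raw: string,
  supported: Set<string>,
  warn: (msg: string) => void,
): string[] {
  const accepted: string[] = [];
  const names = raw.split(",").map((s) => s.trim()).filter((s) => s !== "");
  for (const name of names) {
    if (supported.has(name)) {
      accepted.push(name);
    } else {
      warn(
        `Invalid resource detector "${name}" specified in the environment variable OTEL_NODE_RESOURCE_DETECTORS`,
      );
    }
  }
  return accepted;
}

const warnings: string[] = [];
const names = resolveDetectorNames("env,host,container,aws", SUPPORTED, (m) => warnings.push(m));
console.log(names.join(",")); // env,host
console.log(warnings.length); // 2
```

The same value can therefore be valid for one resolver and rejected by another, depending on which names each one registers.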

hien-prio commented 2 weeks ago

Is this as simple as changing the affected lines to use the env util?

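import { getNodeAutoInstrumentations, getResourceDetectorsFromEnv } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-grpc';
import { NodeSDK } from '@opentelemetry/sdk-node';

// getMetricReader() is the existing helper in the operator's autoinstrumentation.ts.
const sdk = new NodeSDK({
    autoDetectResources: true,
    instrumentations: [getNodeAutoInstrumentations()],
    traceExporter: new OTLPTraceExporter(),
    metricReader: getMetricReader(),
    // Read the detector list from OTEL_NODE_RESOURCE_DETECTORS instead of hardcoding it.
    resourceDetectors: getResourceDetectorsFromEnv()
});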
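If the entry point switches to an env-driven list, one design question is how names map to detectors. The sketch below assumes, hypothetically, that a single name can expand to several detectors (for example `aws` covering the EKS and EC2 detectors seen in the log output above) and that `all` and `none` act as keywords; `DETECTORS_BY_NAME` and `selectDetectors` are illustrative and are not the metapackage's code.

```typescript
// Hypothetical detector identifiers. In a real entry point these would be the
// detector objects imported from the @opentelemetry/resource-detector-* packages.
type DetectorId = string;

// Hypothetical name-to-detectors map: one name may stand for several detectors.
const DETECTORS_BY_NAME = new Map<string, DetectorId[]>([
  ["container", ["containerDetector"]],
  ["env", ["envDetector"]],
  ["host", ["hostDetector"]],
  ["os", ["osDetector"]],
  ["process", ["processDetector"]],
  ["alibaba", ["alibabaCloudEcsDetector"]],
  ["aws", ["awsEksDetector", "awsEc2Detector"]],
  ["gcp", ["gcpDetector"]],
]);

// Resolves a comma-separated list. An unset value is treated as "all";
// the assumed "all" and "none" keywords short-circuit; unknown names are skipped.
function selectDetectors(raw: string | undefined): DetectorId[] {
  const names = (raw ?? "all").split(",").map((s) => s.trim());
  const selected: DetectorId[] = [];
  if (names.includes("all")) {
    DETECTORS_BY_NAME.forEach((ids) => selected.push(...ids));
    return selected;
  }
  if (names.includes("none")) {
    return selected;
  }
  for (const name of names) {
    selected.push(...(DETECTORS_BY_NAME.get(name) ?? []));
  }
  return selected;
}

console.log(selectDetectors("env,host,os,process,container").join(","));
// envDetector,hostDetector,osDetector,processDetector,containerDetector
console.log(selectDetectors("none").length); // 0
console.log(selectDetectors(undefined).length); // 9
```

Treating an unset variable as `all` keeps today's behaviour for users who never set OTEL_NODE_RESOURCE_DETECTORS, while users on AKS could opt out of the cloud detectors they don't need.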