open-telemetry / opentelemetry-operator

Kubernetes Operator for OpenTelemetry Collector
Apache License 2.0

[auto-instrumentation-python]Having issue with auto-instrumentation python of OpenTelemetry operator #2409

Open xwgao opened 9 months ago

xwgao commented 9 months ago

Component(s)

instrumentation

What happened?

Description

I installed the Community OpenTelemetry Operator 0.89.0 in my OpenShift 4.12.22 cluster and created an OpenTelemetry Instrumentation and Collector in my namespace. I then added the annotation instrumentation.opentelemetry.io/inject-python: "instrumentation" to my Python deployment in the same namespace. After the pod restarted, I found the error messages below in the pod log.

    from sqlite3.dbapi2 import *
  File "/usr/local/lib/python3.9/sqlite3/dbapi2.py", line 27, in <module>
    from _sqlite3 import *
ModuleNotFoundError: No module named '_sqlite3'
Failed to auto initialize opentelemetry
Traceback (most recent call last):
  File "/otel-auto-instrumentation-python/opentelemetry/instrumentation/auto_instrumentation/sitecustomize.py", line 39, in initialize
    _load_instrumentors(distro)
  File "/otel-auto-instrumentation-python/opentelemetry/instrumentation/auto_instrumentation/_load.py", line 91, in _load_instrumentors
    raise exc
  File "/otel-auto-instrumentation-python/opentelemetry/instrumentation/auto_instrumentation/_load.py", line 87, in _load_instrumentors
    distro.load_instrumentor(entry_point, skip_dep_check=True)
  File "/otel-auto-instrumentation-python/opentelemetry/instrumentation/distro.py", line 62, in load_instrumentor
    instrumentor: BaseInstrumentor = entry_point.load()
  File "/otel-auto-instrumentation-python/pkg_resources/__init__.py", line 2518, in load
    return self.resolve()
  File "/otel-auto-instrumentation-python/pkg_resources/__init__.py", line 2524, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/otel-auto-instrumentation-python/opentelemetry/instrumentation/sqlite3/__init__.py", line 42, in <module>
    import sqlite3
  File "/usr/local/lib/python3.9/sqlite3/__init__.py", line 57, in <module>
    from sqlite3.dbapi2 import *
  File "/usr/local/lib/python3.9/sqlite3/dbapi2.py", line 27, in <module>
    from _sqlite3 import *
ModuleNotFoundError: No module named '_sqlite3'
...
Failed to export batch code: 404, reason: 404 page not found
...

Steps to Reproduce

  1. Install Community OpenTelemetry Operator 0.89.0 in OpenShift 4.12.22 cluster.
  2. In my namespace, create an OpenTelemetry instrumentation as below.
    apiVersion: opentelemetry.io/v1alpha1
    kind: Instrumentation
    metadata:
      annotations:
        instrumentation.opentelemetry.io/default-auto-instrumentation-apache-httpd-image: >-
          ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-apache-httpd:1.0.3
        instrumentation.opentelemetry.io/default-auto-instrumentation-dotnet-image: >-
          ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-dotnet:1.1.0
        instrumentation.opentelemetry.io/default-auto-instrumentation-go-image: >-
          ghcr.io/open-telemetry/opentelemetry-go-instrumentation/autoinstrumentation-go:v0.8.0-alpha
        instrumentation.opentelemetry.io/default-auto-instrumentation-java-image: >-
          ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:1.31.0
        instrumentation.opentelemetry.io/default-auto-instrumentation-nginx-image: >-
          ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-apache-httpd:1.0.3
        instrumentation.opentelemetry.io/default-auto-instrumentation-nodejs-image: >-
          ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:0.44.0
        instrumentation.opentelemetry.io/default-auto-instrumentation-python-image: >-
          ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:0.41b0
      name: instrumentation
      namespace: my-namespace
      labels:
        app.kubernetes.io/managed-by: opentelemetry-operator
    spec:
      exporter:
        endpoint: 'http://otel-collector-headless:4317'
      java:
        env:
          - name: OTEL_INSTRUMENTATION_LIBERTY_ENABLED
            value: 'true'
          - name: OTEL_METRICS_EXPORTER
            value: none
        image: >-
          ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:1.31.0
        resources:
          limits:
            cpu: 500m
            memory: 64Mi
          requests:
            cpu: 50m
            memory: 64Mi
      sampler:
        argument: '1'
        type: parentbased_traceidratio
      go:
        image: >-
          ghcr.io/open-telemetry/opentelemetry-go-instrumentation/autoinstrumentation-go:v0.8.0-alpha
        resourceRequirements:
          limits:
            cpu: 500m
            memory: 32Mi
          requests:
            cpu: 50m
            memory: 32Mi
      nodejs:
        image: >-
          ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:0.44.0
        resourceRequirements:
          limits:
            cpu: 500m
            memory: 128Mi
          requests:
            cpu: 50m
            memory: 128Mi
      resource: {}
      apacheHttpd:
        configPath: /usr/local/apache2/conf
        image: >-
          ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-apache-httpd:1.0.3
        resourceRequirements:
          limits:
            cpu: 500m
            memory: 128Mi
          requests:
            cpu: 1m
            memory: 128Mi
        version: '2.4'
      propagators:
        - tracecontext
        - baggage
        - b3
      dotnet:
        image: >-
          ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-dotnet:1.1.0
        resourceRequirements:
          limits:
            cpu: 500m
            memory: 128Mi
          requests:
            cpu: 50m
            memory: 128Mi
      nginx:
        configFile: /etc/nginx/nginx.conf
        image: >-
          ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-apache-httpd:1.0.3
        resourceRequirements:
          limits:
            cpu: 500m
            memory: 128Mi
          requests:
            cpu: 1m
            memory: 128Mi
      python:
        env:
          - name: OTEL_EXPORTER_OTLP_ENDPOINT
            value: 'http://otel-collector-headless:4318'
        image: >-
          ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:0.41b0
        resourceRequirements:
          limits:
            cpu: 500m
            memory: 32Mi
          requests:
            cpu: 50m
            memory: 32Mi
  3. Create an OpenTelemetry collector as below.

    apiVersion: opentelemetry.io/v1alpha1
    kind: OpenTelemetryCollector
    metadata:
      labels:
        app.kubernetes.io/managed-by: opentelemetry-operator
      name: otel
      namespace: my-namespace
    spec:
      observability:
        metrics: {}
      config: |
        receivers:
          otlp:
            protocols:
              grpc:
              http:

        processors:
          batch:
            timeout: 10s
            send_batch_size: 10000
          metricstransform:
            transforms:
              - include: my-test.duration
                match_type: regexp
                action: update
                operations:
                  - action: update_label
                    label: http.url
                    new_label: url
                  - action: update_label
                    label: http.method
                    new_label: method
                  - action: update_label
                    label: http.status_code
                    new_label: code

        exporters:
          logging:
            verbosity: detailed
          prometheus:
            endpoint: "0.0.0.0:8889"
            send_timestamps: true
            metric_expiration: 1440m

        connectors:
          spanmetrics:
            namespace: my-test
            histogram:
              unit: s
              explicit:
                buckets: [10ms, 100ms, 200ms, 400ms, 800ms, 1s, 1200ms, 1400ms, 1600ms, 1800ms, 2s, 4s, 6s, 8s, 10s]
            dimensions:
              - name: http.method
              - name: http.status_code
              - name: http.url
              - name: http.route
              - name: http.host

        service:
          pipelines:
            traces:
              receivers: [otlp]
              processors: [batch]
              exporters: [spanmetrics, logging]
            metrics:
              receivers: [spanmetrics]
              processors: [batch, metricstransform]
              exporters: [prometheus, logging]
      mode: statefulset
      resources: {}
      managementState: managed
      upgradeStrategy: automatic
      ingress:
        route: {}
      targetAllocator:
        prometheusCR:
          scrapeInterval: 30s
        resources: {}
      image: >-
        ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-contrib:0.89.0
      replicas: 1
      updateStrategy: {}
      podDisruptionBudget:
        maxUnavailable: 1
  4. Add the annotation below to my Python deployment in the same namespace and save the change, which restarts the pod.
        instrumentation.opentelemetry.io/inject-python: instrumentation
  5. After the pod restarted, I found the error messages below in the pod log.
        from sqlite3.dbapi2 import *
      File "/usr/local/lib/python3.9/sqlite3/dbapi2.py", line 27, in <module>
        from _sqlite3 import *
    ModuleNotFoundError: No module named '_sqlite3'
    Failed to auto initialize opentelemetry
    Traceback (most recent call last):
      File "/otel-auto-instrumentation-python/opentelemetry/instrumentation/auto_instrumentation/sitecustomize.py", line 39, in initialize
        _load_instrumentors(distro)
      File "/otel-auto-instrumentation-python/opentelemetry/instrumentation/auto_instrumentation/_load.py", line 91, in _load_instrumentors
        raise exc
      File "/otel-auto-instrumentation-python/opentelemetry/instrumentation/auto_instrumentation/_load.py", line 87, in _load_instrumentors
        distro.load_instrumentor(entry_point, skip_dep_check=True)
      File "/otel-auto-instrumentation-python/opentelemetry/instrumentation/distro.py", line 62, in load_instrumentor
        instrumentor: BaseInstrumentor = entry_point.load()
      File "/otel-auto-instrumentation-python/pkg_resources/__init__.py", line 2518, in load
        return self.resolve()
      File "/otel-auto-instrumentation-python/pkg_resources/__init__.py", line 2524, in resolve
        module = __import__(self.module_name, fromlist=['__name__'], level=0)
      File "/otel-auto-instrumentation-python/opentelemetry/instrumentation/sqlite3/__init__.py", line 42, in <module>
        import sqlite3
      File "/usr/local/lib/python3.9/sqlite3/__init__.py", line 57, in <module>
        from sqlite3.dbapi2 import *
      File "/usr/local/lib/python3.9/sqlite3/dbapi2.py", line 27, in <module>
        from _sqlite3 import *
    ModuleNotFoundError: No module named '_sqlite3'
    ...
    Failed to export batch code: 404, reason: 404 page not found
    ...
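
For step 4, the operator's injection works on pods, so the annotation belongs on the Deployment's pod template (spec.template.metadata.annotations) rather than on the Deployment's own metadata. A minimal sketch of building such a merge patch is shown here; the helper name build_inject_patch is hypothetical, and the value "instrumentation" is the name of the Instrumentation resource from step 2.

```python
import json

ANNOTATION_KEY = "instrumentation.opentelemetry.io/inject-python"


def build_inject_patch(instrumentation_name: str) -> dict:
    """Build a merge patch that annotates the pod template of a Deployment."""
    return {
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {ANNOTATION_KEY: instrumentation_name}
                }
            }
        }
    }


if __name__ == "__main__":
    # Print the patch as JSON so it can be reviewed before it is applied.
    print(json.dumps(build_inject_patch("instrumentation"), indent=2))
```

The printed JSON could be applied with `kubectl patch deployment <name> -n my-namespace --type merge -p '<json>'`; the deployment name stays a placeholder because the issue does not state it.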

Expected Result

The Python auto-instrumentation works well in my pod container.

Actual Result

The Python auto-instrumentation fails to initialize because the '_sqlite3' module cannot be found.
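
The traceback shows the standard-library sqlite3 package failing while importing its C extension, _sqlite3. One way to confirm this from inside the application container is to ask the import system whether it can locate that module at all. The following is a minimal sketch; the helper name has_module is hypothetical and not part of OpenTelemetry.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if the import system can locate a top-level module."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for broken or half-initialized modules.
        return False


if __name__ == "__main__":
    for module_name in ("sqlite3", "_sqlite3"):
        status = "available" if has_module(module_name) else "missing"
        print(f"{module_name}: {status}")
```

If sqlite3 is reported as available while _sqlite3 is missing, the interpreter in the image was most likely built without SQLite support, which would be independent of the operator.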

Kubernetes Version

v1.25.10+8c21020

Operator version

0.89.0

Collector version

0.89.0

Environment information

Environment

Platform: OpenShift 4.12.22 cluster
Python3 version: Python 3.9.16

Log output

Instrumenting of sqlite3 failed
Traceback (most recent call last):
  File "/otel-auto-instrumentation-python/opentelemetry/instrumentation/auto_instrumentation/_load.py", line 87, in _load_instrumentors
    distro.load_instrumentor(entry_point, skip_dep_check=True)
  File "/otel-auto-instrumentation-python/opentelemetry/instrumentation/distro.py", line 62, in load_instrumentor
    instrumentor: BaseInstrumentor = entry_point.load()
  File "/otel-auto-instrumentation-python/pkg_resources/__init__.py", line 2518, in load
    return self.resolve()
  File "/otel-auto-instrumentation-python/pkg_resources/__init__.py", line 2524, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/otel-auto-instrumentation-python/opentelemetry/instrumentation/sqlite3/__init__.py", line 42, in <module>
    import sqlite3
  File "/usr/local/lib/python3.9/sqlite3/__init__.py", line 57, in <module>
    from sqlite3.dbapi2 import *
  File "/usr/local/lib/python3.9/sqlite3/dbapi2.py", line 27, in <module>
    from _sqlite3 import *
ModuleNotFoundError: No module named '_sqlite3'
Failed to auto initialize opentelemetry
Traceback (most recent call last):
  File "/otel-auto-instrumentation-python/opentelemetry/instrumentation/auto_instrumentation/sitecustomize.py", line 39, in initialize
    _load_instrumentors(distro)
  File "/otel-auto-instrumentation-python/opentelemetry/instrumentation/auto_instrumentation/_load.py", line 91, in _load_instrumentors
    raise exc
  File "/otel-auto-instrumentation-python/opentelemetry/instrumentation/auto_instrumentation/_load.py", line 87, in _load_instrumentors
    distro.load_instrumentor(entry_point, skip_dep_check=True)
  File "/otel-auto-instrumentation-python/opentelemetry/instrumentation/distro.py", line 62, in load_instrumentor
    instrumentor: BaseInstrumentor = entry_point.load()
  File "/otel-auto-instrumentation-python/pkg_resources/__init__.py", line 2518, in load
    return self.resolve()
  File "/otel-auto-instrumentation-python/pkg_resources/__init__.py", line 2524, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/otel-auto-instrumentation-python/opentelemetry/instrumentation/sqlite3/__init__.py", line 42, in <module>
    import sqlite3
  File "/usr/local/lib/python3.9/sqlite3/__init__.py", line 57, in <module>
    from sqlite3.dbapi2 import *
  File "/usr/local/lib/python3.9/sqlite3/dbapi2.py", line 27, in <module>
    from _sqlite3 import *
ModuleNotFoundError: No module named '_sqlite3'
2023-12-01 07:27:34,096 INFO Included extra file "/etc/supervisor/conf.d/coreidp-login.conf" during parsing
2023-12-01 07:27:34,096 INFO Included extra file "/etc/supervisor/conf.d/filebeat.conf" during parsing
2023-12-01 07:27:34,100 INFO RPC interface 'supervisor' initialized
2023-12-01 07:27:34,100 CRIT Server 'inet_http_server' running without any HTTP authentication checking
2023-12-01 07:27:34,100 INFO supervisord started with pid 1
2023-12-01 07:27:35,104 INFO spawned: 'coreidp-login' with pid 15
2023-12-01 07:27:35,107 INFO spawned: 'filebeat' with pid 18
{"log.level":"warn","@timestamp":"2023-12-01T07:27:35.546Z","log.origin":{"file.name":"beater/filebeat.go","file.line":175},"message":"Filebeat is unable to load the ingest pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the ingest pipelines or are using Logstash pipelines, you can ignore this warning.","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2023-12-01T07:27:35.547Z","log.origin":{"file.name":"beater/filebeat.go","file.line":307},"message":"Filebeat is unable to load the ingest pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the ingest pipelines or are using Logstash pipelines, you can ignore this warning.","service.name":"filebeat","ecs.version":"1.6.0"}
yarn run v1.22.19
warning Skipping preferred cache folder "/.cache/yarn" because it is not writable.
warning Selected the next writable cache folder in the list, will be "/tmp/.yarn-cache-1000910000".
$ cross-env NODE_ENV=production ROOT_PATH=$npm_package_config_root_path nodemon ./app.js -w server -w config
warning Cannot find a suitable global folder. Tried these: "/usr/local, /.yarn"
[nodemon] 2.0.22
[nodemon] to restart at any time, enter `rs`
[nodemon] watching path(s): server/**/* config
[nodemon] watching extensions: js,mjs,json
[nodemon] starting `node ./app.js`
Express server listening on port http://localhost:3003
Express server listening on secured port https://localhost:9443
2023-12-01 07:27:45,572 INFO success: filebeat entered RUNNING state, process has stayed up for > than 10 seconds (startsecs)
Failed to export batch code: 404, reason: 404 page not found

2023-12-01 07:28:45,648 INFO success: coreidp-login entered RUNNING state, process has stayed up for > than 70 seconds (startsecs)
Failed to export batch code: 404, reason: 404 page not found

Failed to export batch code: 404, reason: 404 page not found

Failed to export batch code: 404, reason: 404 page not found

Failed to export batch code: 404, reason: 404 page not found

(node:63) Warning: Setting the NODE_TLS_REJECT_UNAUTHORIZED environment variable to '0' makes TLS connections and HTTPS requests insecure by disabling certificate verification.
(Use `node --trace-warnings ...` to show where the warning was created)
Failed to export batch code: 404, reason: 404 page not found

Additional context

No response

TylerHelmuth commented 9 months ago

These types of issues with Python auto-instrumentation and the operator are almost always caused by your app's packages not being compatible with Python's auto-instrumentation.

Some things to try:

  1. Upgrade your Python packages or Python version.
  2. Instead of using the operator to do the injection, add the Python auto-instrumentation to the app yourself. This will confirm whether the problem lies with Python rather than the operator.

xwgao commented 9 months ago

@TylerHelmuth Is there any way to ignore or bypass this error using the operator? Thanks.

xwgao commented 9 months ago

@TylerHelmuth I added the env var OTEL_PYTHON_DISABLED_INSTRUMENTATIONS (value: sqlite3) to my deployment, and after the pod restarted the error was gone. But I still cannot find any traces (spans) collected for my Python service. Any idea about this? Thanks.
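
The workaround relies on OTEL_PYTHON_DISABLED_INSTRUMENTATIONS holding a comma-separated list of instrumentation names to skip. The sketch below only illustrates that filtering idea; it is not the operator's or the SDK's actual loader code, and the function name and sample instrumentor names are assumptions made for illustration.

```python
import os
from typing import Iterable, List, Mapping

ENV_VAR = "OTEL_PYTHON_DISABLED_INSTRUMENTATIONS"


def enabled_instrumentations(
    candidates: Iterable[str], env: Mapping[str, str] = os.environ
) -> List[str]:
    """Return the candidate names that are not listed in the env var."""
    raw = env.get(ENV_VAR, "")
    # Split on commas and ignore surrounding whitespace and empty entries.
    disabled = {name.strip() for name in raw.split(",") if name.strip()}
    return [name for name in candidates if name not in disabled]


if __name__ == "__main__":
    sample = ["flask", "requests", "sqlite3"]  # hypothetical instrumentor names
    print(enabled_instrumentations(sample, {ENV_VAR: "sqlite3"}))
```

Disabling one instrumentation this way only skips that instrumentor; it says nothing about whether the remaining ones export data successfully.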

pavolloffay commented 9 months ago

I added the env var OTEL_PYTHON_DISABLED_INSTRUMENTATIONS (value: sqlite3) into my deployment,

Note that the env var can also be added to the env field of the Instrumentation CR.

I am not sure why your app is not producing any data.
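
One thing worth checking when the app produces no data and the log shows 404 responses is the URL the OTLP/HTTP exporter ends up posting to. Under the OTLP exporter specification, a base OTEL_EXPORTER_OTLP_ENDPOINT gets the signal path /v1/traces appended, while OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is used as given. The sketch below mirrors that rule for traces only; resolve_traces_url is a hypothetical helper, not an SDK function.

```python
import os
from typing import Mapping

DEFAULT_BASE_ENDPOINT = "http://localhost:4318"  # spec default for OTLP/HTTP


def resolve_traces_url(env: Mapping[str, str] = os.environ) -> str:
    """Compute the trace export URL from OTLP endpoint environment variables."""
    specific = env.get("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT")
    if specific:
        # A signal-specific endpoint is used exactly as provided.
        return specific
    base = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", DEFAULT_BASE_ENDPOINT)
    # A base endpoint gets the per-signal path appended.
    return base.rstrip("/") + "/v1/traces"


if __name__ == "__main__":
    print(resolve_traces_url(
        {"OTEL_EXPORTER_OTLP_ENDPOINT": "http://otel-collector-headless:4318"}
    ))
```

Comparing the resolved URL with the receiver paths the collector actually serves can narrow down whether a 404 comes from a wrong path, a wrong port, or a different service answering on that address.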

xwgao commented 9 months ago

I resolved the sqlite3 error and opened another GitHub issue: https://github.com/open-telemetry/opentelemetry-python/issues/3573. Can anyone help with this? Thanks a lot.

surabhi28 commented 4 months ago

I resolved the sqlite3 error and opened another github issue open-telemetry/opentelemetry-python#3573. Can any one help on this? Thanks a lot.

How did you resolve this issue? For us, disabling the sqlite3 instrumentation only removed the error; the log instrumentation still doesn't work. Here is my Instrumentation CRD spec:

spec:
  apacheHttpd:
    configPath: /usr/local/apache2/conf
    version: '2.4'
  dotnet:
    image: >-
      ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-dotnet:0.5.0
  env:
    - name: OTEL_EXPORTER_OTLP_TIMEOUT
      value: '200'
    - name: OTEL_LOGS_EXPORTER
      value: otlp_proto_http
    - name: OTEL_EXPORTER_OTLP_HTTP_LOGS_ENDPOINT
      value: >-
        http://obs-gateway-collector.test.svc.cluster.local:4318/v1/logs
  exporter:
    endpoint: >-
      http://obs-python-collector.test.svc.cluster.local:4318
  java:
    image: >-
      ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:1.31.0
  nodejs:
    image: >-
      ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:0.34.0
  python:
    env:
      - name: OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED
        value: 'true'
      - name: OTEL_PYTHON_LOG_LEVEL
        value: debug
    image: >-
      ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:0.44b0
  resource: {}
  sampler:
    type: always_on
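
When a spec like this produces no logs, one cheap check is whether each endpoint variable name is one that the OTLP exporter specification defines. The sketch below compares names against the four endpoint variables from that specification; the helper name is hypothetical, and a name being absent from the list only means it is not spec-defined, not that every SDK ignores it.

```python
from typing import Iterable, List

# Endpoint variables defined by the OpenTelemetry OTLP exporter specification.
SPEC_ENDPOINT_VARS = frozenset({
    "OTEL_EXPORTER_OTLP_ENDPOINT",
    "OTEL_EXPORTER_OTLP_TRACES_ENDPOINT",
    "OTEL_EXPORTER_OTLP_METRICS_ENDPOINT",
    "OTEL_EXPORTER_OTLP_LOGS_ENDPOINT",
})


def unrecognized_endpoint_vars(names: Iterable[str]) -> List[str]:
    """Return names shaped like OTLP endpoint variables that are not spec-defined."""
    return [
        name for name in names
        if name.startswith("OTEL_EXPORTER_OTLP_")
        and name.endswith("_ENDPOINT")
        and name not in SPEC_ENDPOINT_VARS
    ]


if __name__ == "__main__":
    # Variable names taken from the env section of the spec above.
    names = [
        "OTEL_EXPORTER_OTLP_TIMEOUT",
        "OTEL_LOGS_EXPORTER",
        "OTEL_EXPORTER_OTLP_HTTP_LOGS_ENDPOINT",
    ]
    print(unrecognized_endpoint_vars(names))
```

Any name this flags is a candidate to compare against the SDK documentation before assuming the exporter reads it.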