mitodl / ol-infrastructure

Infrastructure automation code for use by MIT Open Learning
BSD 3-Clause "New" or "Revised" License

Productionalize OTEL for edx installations #1948

Open Ardiea opened 7 months ago

Ardiea commented 7 months ago

Description/Context

As a DevOps engineer, I would like to provide tracing data to engineers to show, among other things, how users interact with our application.

Plan/Design

  1. Create a build and publish pipeline for the new plugin that Shahbaz has created: https://github.com/mitodl/open-edx-plugins/pull/213
  2. Update the edxapp deployment config to install this plugin and its prerequisites:
    opentelemetry-api
    opentelemetry-sdk
    opentelemetry-instrumentation-django
    opentelemetry-exporter-richconsole
    opentelemetry-exporter-otlp-proto-http
  3. Include the new env var IN THE .env FILE!
    ENV PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION='python'
  4. Update configurations for LMS and CMS. Something similar to this:
    OTEL_CONFIGS:
      OTEL_ENABLED: True
      OTEL_TRACES_ENABLED: True
      OTEL_METRICS_ENABLED: True
      METRICS_EXPORTER: otlphttp
      TRACES_EXPORTER: otlphttp
      OTEL_EXPORTER_OTLP_METRICS_PROTOCOL: "http/protobuf"
      OTEL_EXPORTER_OTLP_METRICS_ENDPOINT: "https://otlp-gateway-prod-us-central-0.grafana.net/otlp/v1/metrics"
      OTEL_EXPORTER_OTLP_METRICS_HEADERS: {"Authorization": "Basic supersecretb64data"}
      OTEL_EXPORTER_OTLP_TRACES_PROTOCOL: "http/protobuf"
      OTEL_EXPORTER_OTLP_TRACES_ENDPOINT: "https://otlp-gateway-prod-us-central-0.grafana.net/otlp/v1/traces"
      OTEL_EXPORTER_OTLP_TRACES_HEADERS: {"Authorization": "Basic supersecretb64data"}
      OTEL_TRACES_RESOURCE_ATTRIBUTE: {'service.name': 'mitx-staging-ci'}
      OTEL_PYTHON_DJANGO_EXCLUDED_URLS: "healthcheck"
      OTEL_PYTHON_DJANGO_TRACED_REQUEST_ATTRS: "path_info,content_type"
      OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_REQUEST: ".*"
      OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SERVER_RESPONSE: ".*"
      OTEL_INSTRUMENTATION_HTTP_CAPTURE_HEADERS_SANITIZE_FIELDS: ".*session.*,set-cookie"
  5. We may need to add this to the Earthfile after all other pip steps, but this still needs to be confirmed.
    RUN pip uninstall -y protobuf
    RUN pip install --no-warn-script-location --user --no-cache-dir --no-binary protobuf protobuf
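
The install and env var steps above could be checked with a small preflight script before OTEL is enabled. This is only a minimal sketch and not part of the plugin: the function names are hypothetical, and it uses nothing beyond the Python standard library to confirm that the distributions from step 2 are installed and that the variable from step 3 is visible to the process.

```python
import os
from importlib import metadata

# Distribution names copied from step 2 of the plan.
REQUIRED_DISTRIBUTIONS = [
    "opentelemetry-api",
    "opentelemetry-sdk",
    "opentelemetry-instrumentation-django",
    "opentelemetry-exporter-richconsole",
    "opentelemetry-exporter-otlp-proto-http",
]


def missing_distributions(names):
    """Return the names that are not installed in the current environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


def protobuf_env_ok(environ=None):
    """Check that the step 3 variable selects the pure-Python protobuf implementation."""
    environ = os.environ if environ is None else environ
    return environ.get("PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION") == "python"


if __name__ == "__main__":
    for name in missing_distributions(REQUIRED_DISTRIBUTIONS):
        print(f"missing: {name}")
    if not protobuf_env_ok():
        print("PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION is not set to 'python'")
```

Running this inside the built edxapp image (rather than on the build host) would be the more meaningful check, since that is the environment LMS and CMS actually start in.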

Appendix: The authorization token has the form ":". It needs permissions to write metrics, logs, and traces.
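
The two values on either side of the ":" are not given above. As a sketch only, assuming they are an identifier and an access token combined in the standard HTTP Basic scheme (which the "Basic supersecretb64data" placeholder in step 4 suggests), the header value could be built like this. The function name and the example values are hypothetical.

```python
import base64


def basic_auth_header(identifier: str, token: str) -> dict:
    """Build an HTTP Basic Authorization header from two colon-joined parts."""
    raw = f"{identifier}:{token}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


# Placeholder values only; the real pair must come from the Grafana Cloud account.
headers = basic_auth_header("example-id", "example-token")
print(headers)
```

The resulting dict has the same shape as the OTEL_EXPORTER_OTLP_METRICS_HEADERS and OTEL_EXPORTER_OTLP_TRACES_HEADERS values in step 4.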

Documentation: https://grafana.com/docs/grafana-cloud/send-data/otlp/send-data-otlp/

Notes from POC and validation: https://github.com/mitodl/open-edx-plugins/pull/213#issuecomment-1817102010

Ardiea commented 6 months ago

There appears to be an issue with timestamps, possibly coming out of the plugin, that causes CMS to crash after some period of time. We need to figure out what is going on there (or disable OTEL for CMS, which is less desirable).