kube-rs / controller-rs

A kubernetes reference controller
Apache License 2.0
270 stars 28 forks source link

automate tracing tests #46

Open clux opened 1 year ago

clux commented 1 year ago

Currently we have a single manual test for checking that our trace setup produces valid trace ids (TraceId != TraceId::INVALID). See #45

To test this we need a valid tracer that (afaikt) needs to talk to something.

The easiest way to currently test this is to setup a tempo instance using something like:

replicas: 1
tempo:
  retention: 24h
  authEnabled: false
  server:
    http_listen_port: 3100
  storage:
    trace:
      backend: local
      local:
        path: /var/tempo/traces
      wal:
        path: /var/tempo/wal
  receivers:
    otlp:
      protocols:
        http:
          endpoint: "0.0.0.0:55681"
        grpc:
          endpoint: "0.0.0.0:4317"
    jaeger:
      protocols:
        grpc:
          endpoint: "0.0.0.0:14250"
persistence:
  enabled: false

as values for grafana/tempo chart in (tested against 1.0.1 last).

then run just forward-tempo to redirect traffic to that when doing cargo test locally.

Once tempo is being forwarded to we can run just test-telemetry.

We should do one of:

Option 1: Find a way to get valid traceIds without requiring a valid collector

I couldn't figure out how. The mock NoopSamplers returns invalid traces: https://github.com/kube-rs/controller-rs/pull/45#issuecomment-1455623664

Option 2: Install tempo into the integration test cluster

via something like

      - uses: azure/setup-helm@v3
      - run: helm install grafana/tempo --wait -f values.yaml
      - run: just forward-tempo
      - run: just test-telemetry &

then run the integration tests there

Option 3: Run tests manually (current)

Run these tests manually when upgrading opentelemetry/tracing ecosystem. Not ideal, but these tests also doesn't give you the biggest signal, and there's an argument to be made for not having more complicated integration tests setups for a test controller.

Option 4: Find alternate tests

provided we have consistent versions of opentelemetry this should really never break. maybe we can replace this integration test with some cargo tree based duplicate version check instead. then on the other hand, it's nice to check that this all works, so probably not.