Closed tobert closed 2 years ago
Hey @ScottGarman, @splaspood, would either of you be available, by chance, to check this out?
I'm :+1: on the overall gist of this, though I do wonder if it would be simpler to instrument one specific component first (like osie-runner), then roll it out more widely. Other than that, I'd just request the typical commit cleanup/squashes and a rebase onto the main branch. I can then assist with deploying an osie test build so we can run this in production at a limited scale.
@ScottGarman I have been thinking about splitting this PR and might just do that. Getting the otel-cli bits to work right will need more work to pass around OTEL_EXPORTER_OTLP_ENDPOINT or some other configuration pointing at a server, so I think we can defer the otel-cli bits a while and see how things look after we light all this up.
I know y'all don't like to leave these hanging. Gonna close this and will open new PRs soon with the same code split up.
Description
Implements OpenTelemetry tracing in OSIE. This includes some code, otel-cli spans in shell scripts, and W3C traceparent propagation in HTTP headers where we can.
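For readers unfamiliar with W3C traceparent propagation, here is a minimal sketch of what gets carried in the HTTP header. It is not the code from this PR: the helper names (`make_traceparent`, `parse_traceparent`, `inject`) are hypothetical, it uses only the Python standard library, and it handles only version `00` of the header format (`version-traceid-parentid-flags`, all lowercase hex).

```python
import re
import secrets

# Version 00 of the W3C Trace Context header:
#   00-<32 hex trace id>-<16 hex parent/span id>-<2 hex flags>
_TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def make_traceparent(trace_id=None, span_id=None, sampled=True):
    """Build a traceparent value; random IDs are generated when none are given."""
    trace_id = trace_id or secrets.token_hex(16)  # 16 bytes -> 32 hex chars
    span_id = span_id or secrets.token_hex(8)     # 8 bytes -> 16 hex chars
    flags = "01" if sampled else "00"             # bit 0 is the "sampled" flag
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(header):
    """Return the header's fields as a dict, or None if it is not valid."""
    match = _TRACEPARENT_RE.fullmatch(header.strip())
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero IDs are explicitly invalid in the specification.
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,
        "span_id": span_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


def inject(headers, traceparent):
    """Return a copy of an outgoing header dict with traceparent added."""
    out = dict(headers)
    out["traceparent"] = traceparent
    return out
```

A service receiving a request would parse the incoming `traceparent`, start its own span as a child of that `span_id` within the same `trace_id`, and inject a new header value on any outgoing requests so the trace stays connected across components.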
Why is this needed
When everything is in place, we will get detailed traces of operations happening in OSIE, making it observable.
How are existing users impacted? What migration steps/scripts do we need?
Understanding what is happening in complex provisioning systems is difficult. We mostly get the job done with logs and metrics these days. This is an attempt to add OpenTelemetry events so that users of OSIE and other Tinkerbell components can emit traces to their favorite OpenTelemetry-compatible provider (or translate through opentelemetry-collector).
Checklist:
My TODO:
Related work:
tink (WIP)
otel-cli span background --tp-carrier
write out a file that can be sourced
otel-cli span background --wait
I have: