tinkerbell / osie

An in-memory installation environment for bare metal.
https://tinkerbell.org
Apache License 2.0

add opentelemetry #266

Closed: tobert closed this 2 years ago

tobert commented 3 years ago

Description

Implements OpenTelemetry tracing in OSIE. This includes code changes, otel-cli spans in shell scripts, and W3C traceparent propagation in HTTP headers where we can.
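For readers unfamiliar with W3C traceparent propagation, here is a minimal Python sketch of the header format (`version-traceid-parentid-flags`, version `00`) and of attaching it to an outgoing HTTP request. It is not code from this PR: the function names are hypothetical, it uses only the standard library, and a real integration would more likely rely on an OpenTelemetry SDK propagator.

```python
import re
import secrets
import urllib.request

# W3C Trace Context, version 00: 32 hex trace-id, 16 hex parent-id, 2 hex flags.
_TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def new_traceparent(sampled=True):
    """Start a new trace: random trace-id and span-id (hypothetical helper)."""
    flags = "01" if sampled else "00"
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-{flags}"


def parse_traceparent(value):
    """Return the parsed fields, or None if the header is malformed."""
    match = _TRACEPARENT_RE.fullmatch(value.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero ids are explicitly invalid in the specification.
    if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


def child_traceparent(parent):
    """Keep the incoming trace-id, mint a new span-id for this hop."""
    ctx = parse_traceparent(parent)
    if ctx is None:
        # Unusable parent context: start a fresh trace instead.
        return new_traceparent()
    flags = "01" if ctx["sampled"] else "00"
    return f"00-{ctx['trace_id']}-{secrets.token_hex(8)}-{flags}"


def traced_request(url, traceparent):
    """Build (but do not send) a request carrying the traceparent header.

    urllib capitalizes the stored header name; HTTP header names are
    case-insensitive, so receivers still see it as traceparent.
    """
    return urllib.request.Request(url, headers={"traceparent": traceparent})
```

Each hop keeps the trace-id and replaces the parent-id with its own span-id, which is what lets a tracing backend stitch spans from separate processes into one trace.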

Why is this needed

When everything is in place, we will get detailed traces of operations happening in OSIE, making it observable.

How are existing users impacted? What migration steps/scripts do we need?

Understanding what is happening in complex provisioning systems is difficult. We mostly get the job done with logs and metrics these days. This is an attempt to add OpenTelemetry events so that users of OSIE and other Tinkerbell components can emit traces to their favorite OpenTelemetry-compatible provider (or translate through opentelemetry-collector).

Checklist:

My TODO:

Related work:

I have:

jacobweinstock commented 2 years ago

Hey @ScottGarman, @splaspood, would either of you be available, by chance, to check this out?

ScottGarman commented 2 years ago

I'm :+1: on the overall gist of this, though I do wonder if it would be simpler to instrument a specific component (like osie-runner) first, then roll it out more widely. Other than that, I'd just request the typical commit cleanup/squashes and a rebase onto the main branch. I can then assist with deploying an osie test build so we can run this in production at a limited scale.

tobert commented 2 years ago

@ScottGarman I have been thinking about splitting this PR and might just do that. Getting the otel-cli bits right will take more work, since we need to pass around OTEL_EXPORTER_OTLP_ENDPOINT or some other configuration pointing at a server. I think we can defer the otel-cli bits for a while and see how things look after we light all this up.
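To illustrate the configuration problem, here is a rough Python sketch of handing `OTEL_EXPORTER_OTLP_ENDPOINT` down to child processes. It is an illustration only, not code from this PR. The helper name `tracing_env` is hypothetical, and carrying the trace context in a `TRACEPARENT` environment variable is assumed here as a convention, not something this thread confirms.

```python
import os

ENDPOINT_VAR = "OTEL_EXPORTER_OTLP_ENDPOINT"
# Assumption: trace context travels between processes in this variable.
TRACEPARENT_VAR = "TRACEPARENT"


def tracing_env(base_env, traceparent=None):
    """Return (env, enabled) for a child process (hypothetical helper).

    Tracing counts as enabled only when an endpoint is configured, so
    callers can skip span creation entirely when no server is available.
    """
    env = dict(base_env)  # copy, so the caller's mapping is not mutated
    endpoint = env.get(ENDPOINT_VAR, "").strip()
    if not endpoint:
        # No server configured: drop an empty value, report disabled.
        env.pop(ENDPOINT_VAR, None)
        return env, False
    env[ENDPOINT_VAR] = endpoint
    if traceparent:
        env[TRACEPARENT_VAR] = traceparent
    return env, True


if __name__ == "__main__":
    env, enabled = tracing_env(os.environ)
    print("tracing enabled:", enabled)
```

The returned mapping can be passed as `env=` to `subprocess.run`, so every script launched from one place sees the same endpoint without each having to discover it.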

tobert commented 2 years ago

I know y'all don't like to leave these hanging. Gonna close this and will open new PRs soon with the same code split up.