Vlaaaaaaad opened this issue 3 years ago.
Hi @Vlaaaaaaad,
Thanks for the very detailed proposal! More observability around the actions Waypoint takes is an important goal. You call out doing it in OpenTelemetry format; is there a particular reason you feel it needs to be in that format? I ask because we're in the phase of gathering information about this as a larger feature.
Thanks!
Hey @evankerrigan!
I mentioned OpenTelemetry because it's the main open standard. I think the alternatives would be worse. Let's go through the options:
- `/metrics` endpoint appears. I don't think this will provide enough value. As a user, I'd get "average time for builds" but no other visibility.
- `waypoint`. This would improve the UI but would be pretty limiting. As a user, I couldn't get those traces into my favorite tool or export them to get any insights. How would I store and archive those traces? How would I see if CI/CD deploy times have been steadily increasing by 1% every week for the last year? This option would add a lot of complexity without any extra value.
- `trace_id` in logs was discussed in the first post.

Having Waypoint use OpenTelemetry would ensure people can consume the data however they want, in whatever tool they want, without adding extra load on the Waypoint team! I am sure some PMs will use the data to make a case for investing in the CI/CD pipeline. Using OpenTelemetry also comes with the advantage of its SDKs, which enable trace context and baggage propagation to Waypoint plugins too! Waypoint will be leading an ecosystem of observable CI/CD ❤️
@evanphx We can also separate general application telemetry from Waypoint internal telemetry. I think the latter is much easier and is definitely something we should try to support. We use OpenTelemetry in HCP (actually, I think its precursor, because OpenTelemetry wasn't ready when we started), so it would probably make sense to keep following that for our internal stuff...
As an extra data point, Jenkins also released an OpenTelemetry plugin that looks great!
Is your feature request related to a problem? Please describe.
I'd love more visibility into CI/CD pipelines, and traces are the perfect way to visualize that!
Having a trace not only adds a different visualization but also enables insights into the CI/CD process. With `waypoint` exporting traces for each run, I could easily answer questions such as "did the build time increase for our application over the last 6 months?", "where could I best optimize my delivery pipeline?", or "do we have a higher failure rate for deploys when we also have to run the database migration step?".
Describe the solution you'd like
Ideally, I'd love to have `waypoint` export an OpenTelemetry trace for each `waypoint up` and all the underlying operations. The trace could be exported to a local cache, a file, or a remote endpoint (say Datadog or any other OpenTelemetry-compatible vendor).
Each span would have relevant details like `provider_name`, `provider_version`, `values`, `return_status`, and so on. Logs for each span would either be inline, or the field would have a link to the relevant section in the Waypoint UI.
Describe alternatives you've considered
Using logs is possible, but it is a worse user experience, as pipelines have to be built to convert those logs into relevant metrics or traces. Having structured logs with a unique `job_id` is pretty close to having an actual trace with spans, but loses all the advantages (visualization, ingest, and reuse of data).
Explain any additional use-cases
Adding tracing comes with the extra advantage of easier development and debugging of Waypoint itself! Both users and developers want to know whether, say, the `deploy` step using Docker failed because the Docker API timed out, or whether the `aws` plugin failed on its 25th call to the AWS API due to a rate limit.
Additional context
This feature request was already considered for Terraform, but the decision was made to wait until `opentelemetry-go` adds support for logs. Waypoint is different in two ways: it's early-stage, and it has its own UI. Because it is such an early project, without many external plugins and hooks, the implementation impact is lower. Having a UI makes traces even more valuable: users could see them right in the UI!
As inspiration, we can look at honeycombio/buildevents, which does exactly that! It's a binary that can be used in CI/CD pipelines (with support for Travis, Circle, GitHub Actions, and more) to wrap commands. During the pipeline run, data is sent to Honeycomb, leading to a final trace looking like this:
Other trace examples can be seen on Twitter here and here.
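The command-wrapping approach that buildevents takes, combined with the per-span details the feature request lists (`provider_name`, `provider_version`, `return_status`), can be sketched in stdlib-only Go. This is a hypothetical illustration, not buildevents' or Waypoint's implementation. The `Span` type and `wrapCommand` helper are made up, the JSON-lines output on stdout stands in for a real OpenTelemetry exporter, and `go version` is only a placeholder command used when none is given.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"os/exec"
	"time"
)

// Span is a hypothetical, simplified record of one wrapped pipeline step.
type Span struct {
	TraceID    string            `json:"trace_id"`
	Name       string            `json:"name"`
	Start      time.Time         `json:"start"`
	DurationMS int64             `json:"duration_ms"`
	Attributes map[string]string `json:"attributes"`
}

// wrapCommand runs one command, times it, and records its outcome as a Span.
// The wrapped command's output is forwarded to stderr so stdout carries only spans.
func wrapCommand(traceID, step string, attrs map[string]string, name string, args ...string) (Span, error) {
	start := time.Now()
	cmd := exec.Command(name, args...)
	cmd.Stdout, cmd.Stderr = os.Stderr, os.Stderr
	err := cmd.Run()

	status := "0"
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		status = fmt.Sprint(exitErr.ExitCode()) // command ran but exited non-zero
	} else if err != nil {
		status = "error: " + err.Error() // e.g. the binary was not found
	}

	attributes := map[string]string{}
	for k, v := range attrs {
		attributes[k] = v
	}
	attributes["return_status"] = status // set last so callers cannot override it

	return Span{
		TraceID:    traceID,
		Name:       step,
		Start:      start,
		DurationMS: time.Since(start).Milliseconds(),
		Attributes: attributes,
	}, err
}

func main() {
	argv := os.Args[1:]
	if len(argv) == 0 {
		argv = []string{"go", "version"} // placeholder for a real build or deploy command
	}

	id := make([]byte, 16)
	if _, err := rand.Read(id); err != nil {
		panic(err)
	}

	// Attribute names come from the issue's list; the values are made up.
	span, err := wrapCommand(hex.EncodeToString(id), "build", map[string]string{
		"provider_name":    "docker",
		"provider_version": "0.0.0-example",
	}, argv[0], argv[1:]...)

	// Stand-in for an OpenTelemetry exporter: one JSON object per span on stdout.
	if encErr := json.NewEncoder(os.Stdout).Encode(span); encErr != nil {
		fmt.Fprintln(os.Stderr, "export failed:", encErr)
	}
	if err != nil {
		fmt.Fprintln(os.Stderr, "wrapped command failed:", err)
	}
}
```

For example, `go run . docker build .` would forward Docker's output to stderr and print one span to stdout. A real CI wrapper would also exit with the wrapped command's status so the pipeline still fails, and would record span and parent IDs so each step nests under the run's trace.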