uber-go / cadence-client

Framework for authoring workflows and activities running on top of the Cadence orchestration engine.
https://cadenceworkflow.io
MIT License
344 stars 130 forks source link

Add support for OpenTelemetry via the Otel bridge #1166

Closed tfcace closed 11 months ago

tfcace commented 2 years ago

Related issue: https://github.com/uber-go/cadence-client/issues/1156

What changed? Support for Otel was added in the form of the opentracing Bridge.

This will allow a user of the SDK to set an Otel tracer + an Ot bridge. The bridge tracer will conform with the opentracing.Tracer interface with minimal impact on the Cadence SDK, while passing all tracing responsibility to the Otel tracer.

Probably the effort to replace the Ot interface with the one for Otel is more substantial, and it was just a matter of effort / capacity.

Why? Want to be able to propagate traces to multiple Otel backends, not just Jaeger.

How did you test it? Ran unit tests and integration tests. Also verified manually with Jaeger and Google Cloud tracing

Potential risks Nothing I can think of.

CLAassistant commented 2 years ago

CLA assistant check
All committers have signed the CLA.

tfcace commented 2 years ago

Hi @Groxx, any clue if the tests passed?

Groxx commented 2 years ago

Sorry about the delays :| Got busy and then moved across the country, and kinda only now catching up again.

I'll re-trigger the test runs, though I suspect tests will pass... which doesn't quite cover it for this, as there's a bunch of room for subtler changes like different data emitted in prod and possibly-different semantics :\

I'm attempting to find more specific docs for the terms (thanks for the links btw), and I think we'll try running this in our Jaeger stack internally to make sure it all works / still looks sane. An exact 1:1 isn't really necessary, it just has to be reasonable.

Groxx commented 2 years ago

I've got a fix for one piece of the test failures currently, in #1175 (in the meantime, just make copyright), and I'm not sure if it was working before or not, but it needs a go mod tidy. Then unit tests pass at least.

tfcace commented 2 years ago

Hi @Groxx, thanks. You're last comment was applied. Crossing fingers :)

Groxx commented 2 years ago

Tests pass, I kinda have to figure out a way to make sure the output data still works though :| I'm not entirely sure how to do that off-hand. We have some pre-production environments I could just deploy it to as a fallback, but I kinda think it'd be worth making a docker-compose setup in the server repo with jaeger 🤔 . It'd be useful for this, and for anyone making future tracing changes (I definitely want to do so!), and for users looking to do their own tracing stuff locally.

If you want to give building that a try, feel free, otherwise unfortunately I've got to focus on some other stuff in the short term (this week?). But I'll try to hack something up ASAP, and we can leave the "get a dev-friendly jaeger-including docker-compose PR we can merge to master" for some other day.

tfcace commented 2 years ago

Hi @Groxx, that's good news. Can't say I totally understand what is the desired outcome (assuming your internal outputs that you expect to see).

For what it's worth, there's a tracing sample I cooked up with @longquanzheng a while back in https://github.com/uber-common/cadence-samples/tree/master/cmd/samples/recipes/tracing

It should work with the Jaeger all-in-one docker.

I'll be AFK for a few days starting tomorrow. But I may be able to take a stab at modifying the docker compose. Can you add some details? Does it go in the main Cadence repo as new PR? Assuming that the compose also spins the Jaeger all-in-one, how does the Cadence team expect to make sure there's no regression?

Anyways, since I'm very interested in incorporating Otel into our Cadence stack, but don't want to rely on a fork, I want to do my best to help push this forward.

Cheers.

Groxx commented 2 years ago

Hi @Groxx, that's good news. Can't say I totally understand what is the desired outcome (assuming your internal outputs that you expect to see).

tbh me neither :) gonna just look at some traces before and after and see if it looks borked.

For what it's worth, there's a tracing sample I cooked up with @longquanzheng a while back in https://github.com/uber-common/cadence-samples/tree/master/cmd/samples/recipes/tracing

It should work with the Jaeger all-in-one docker.

Excellent, I'll give that a look, thanks!

I'll be AFK for a few days starting tomorrow. But I may be able to take a stab at modifying the docker compose. Can you add some details? Does it go in the main Cadence repo as new PR?

Eventually at least, yes. For verifying this I think it doesn't really matter, but a PR is at least an easy way to share code.

Assuming that the compose also spins the Jaeger all-in-one, how does the Cadence team expect to make sure there's no regression?

I'm planning on making it a non-default compose file, so preventing regression probably doesn't matter all that much. E.g. we can just call it an unstable dev setup and be done with it.

Actually committing to stability will probably require some E2E tests to prevent accidents, and I have no idea what that'd look like at the moment. Jaeger has APIs of course, so something is definitely possible.

tfcace commented 2 years ago

hi @Groxx, have you had a chance to play with this for abit?

tfcace commented 11 months ago

Using Otel can be achieved by using a Workflow Interceptor instead of modifying the SDK. Closing this PR.