temporalio / sdk-go

Temporal Go SDK
https://docs.temporal.io/application-development?lang=go
MIT License
544 stars 215 forks source link

Add TaskQueue to TracerStartSpanOptions #1330

Open caramelomartins opened 11 months ago

caramelomartins commented 11 months ago

Hey! I'll stick to template.

Is your feature request related to a problem? Please describe.

OpenTelemetry allows specifying "a remote service" to which a connection is made through a peer service name. This can be helpful to determine interactions between services - namely workers - even if Temporal's architecture is asynchronous.

Currently, none of the available options in TracerStartSpanOptions allows us to infer if a worker is making a request that will be fulfilled by another worker. This makes it impossible to add peer service information in outbound calls (e.g. StartActivity or StartWorkflow).

Describe the solution you'd like

One potential approach could be to share the TaskQueue as a TracerStartSpanOption, similar to what is already done with the Name and Operation being executed. This would allow developers to externally infer if a cross-service request is being made based on whether the TaskQueue of the outbound request is different than the TaskQueue where a worker is executing now.

I'm sure this isn't an appropriate solution for all situations, but that could be left to the developer to decide - depending on the architecture of their services. Just providing the TaskQueue there could potentially open space for attempting to infer peer services.

Describe alternatives you've considered

I considered using the Name as the peer service but that value changes to often to be able to provide a meaningful signal for cross-service interactions. Nonetheless, I'm very open to other suggestions about how to go about doing this.

I'm happy to open a PR and work on this, but wouldn't want to do that before opening a feature request and discussing it.

Thank you!

cretz commented 11 months ago

It's not just task queue, there are a lot of things people may want as tags or customize the span.

I wonder if there is a way to provide the context to TracerStartSpanOptions (that is probably interface{ Value(any) any }) so that you can extract anything out of the context. For a workflow context this would be easy enough to get the info, but for a client context, you may be expected to provide something to the context before starting.

Another approach may be to have a special context key for storing tags on the context, and you can have an outer interceptor that sets this task queue that is then read by the span starter.

caramelomartins commented 11 months ago

@cretz I mentioned TaskQueue in attempt to not overload the feature request with a lot of things. Your suggestion is sensible though, I feel users will increasingly want to have more and more data - likely.

Adding more generic approaches, though, could easily complicate matters a bit, no? Adding the TaskQueue seems quite simple, if we do add context then there's already a divergence between context.Context and workflow.Context depending on the span that we are about to start.

I wonder if it makes sense to add the TaskQueue as an intermediate step, while discussing more generic approaches keeps happening?

cretz commented 11 months ago

@Quinn-With-Two-Ns do you have an opinion here? Maybe we want ActivityContext and WorkflowContext fields? Maybe can just provide the interface{ Value(any) any } and ask outer interceptors to set what they may want? Or maybe we should just encourage users that need advanced tracer implementations to write their own interceptors instead of using the existing ones? Or maybe task queue is ok for now?

Quinn-With-Two-Ns commented 11 months ago

Hm I don't think just attaching ActivityContext and WorkflowContext fields is sufficient since you may also want to have additional information on the client trace calls as well. I guess interface{ Value(any) any } cannot be auto set based on what is in the calling correct?

cretz commented 11 months ago

I guess interface{ Value(any) any } cannot be auto set based on what is in the calling correct?

It would be set by our primary tracing interceptor impl. To propagate info, you'd have to pass information through the context via an outer interceptor probably. Another approach maybe is a certain context key that contains span tags that is then extracted at span creation time. Of course this also requires an outer interceptor to take something like the task queue and put it in there.

So two questions: 1) Do we want to support generic span tag customization or just add task queue to span options? and 2) If we do want to support generic span tag customization, how do we do it reasonably?

Quinn-With-Two-Ns commented 11 months ago

I would support generic span tag customization over adding just task queue. I am not sure any approach will really fufill the ask of this issue

allows us to infer if a worker is making a request that will be fulfilled by another worker

Since task queue or any information added on the interceptor side is not enough to tell if a call will be fulfilled by the callers worker or a remote worker in general

cretz commented 11 months ago

I suppose it will be enough if they guarantee it's a different task queue, but for the same task queue you are correct there is no guarantee.

caramelomartins commented 10 months ago

I would support generic span tag customization over adding just task queue. I am not sure any approach will really fufill the ask of this issue

allows us to infer if a worker is making a request that will be fulfilled by another worker

Since task queue or any information added on the interceptor side is not enough to tell if a call will be fulfilled by the callers worker or a remote worker in general

I suppose it will be enough if they guarantee it's a different task queue, but for the same task queue you are correct there is no guarantee.

This is currently the architecture I'm working with, a single queue for each worker, which is why I feel having the queue in there would work out. It might not be a generic solution for all users though, as described, I understand that.