Open alonwengierko opened 9 months ago
Is this a duplicate of #394?
Is this a duplicate of #394?
@cretz, it appears there's a distinction between the issues. The bug highlighted in this particular matter concerns the omission of certain spans when a workflow is executed solely by Schedule. Notably, spans such as StartActivity, StartWorkflow, RunWorkflow, StartChildWorkflows, among others, are not being captured, with only the RunActivity span being recorded. On the other hand, #394 suggests that the Schedule creation should indeed generate a tracing span.
I see. There is some overlap because the overarching issue is that the workflows need an outer span from the client side to have a parent (whether that's schedule create or workflow start). The question becomes, how do you do that? You don't have an interceptor to apply per client-workflow-start for a schedule, it's started server side, and one large outer span for a schedule would get huge and live forever.
Simply put, spans are applied SDK side, and there is no SDK side interception for starting workflows from a schedule. The server does not start spans for a user. Open to suggestions here.
I see. There is some overlap because the overarching issue is that the workflows need an outer span from the client side to have a parent (whether that's schedule create or workflow start). The question becomes, how do you do that? You don't have an interceptor to apply per client-workflow-start for a schedule, it's started server side, and one large outer span for a schedule would get huge and live forever.
Simply put, spans are applied SDK side, and there is no SDK side interception for starting workflows from a schedule. The server does not start spans for a user. Open to suggestions here.
@cretz I believe the outer span should represent the workflow start. I completely agree with your point that having the schedule creation as the outer span, with all workflows starting inside, would be not be the best. While I might not have all the intricate details, I firmly believe it should function similarly to when 'start_workflow' is invoked. In my perspective, when the schedule triggers a workflow execution, it should initiate a trace beginning from the workflow start. I'm uncertain why, currently, only 'RunActivity' spans are being generated.
The problem is that for schedules the server invokes start workflow and not your client. Therefore the server can't possibly know what OTel server you've setup with your SDK to send spans to (or that you're not using OpenTracing or DataDog or whatever other tracer). But we are considering options on possibly creating an outer span when the workflow is first run (knowing that it may have duplicates in rare initial-task-failed situations). We will leave this open while we figure out a solution here.
We fixed that problem using our implementation of TracingWorkflowInboundInterceptor. We generate a trace_id based on WorkflowRunId and set this context globally. Here is an implementation of our trace generator:
class IdempotentIdGenerator(IdGenerator):
"""TID generator for TracerProvider which repeatedly generates IDs based on idempotency_key."""
def __init__(self, idempotency_key: str) -> None:
self._rand = random.Random(idempotency_key) # noqa: S311
def generate_span_id(self) -> int:
span_id = self._rand.getrandbits(64)
while span_id == trace.INVALID_SPAN_ID:
span_id = self._rand.getrandbits(64)
return span_id
def generate_trace_id(self) -> int:
trace_id = self._rand.getrandbits(128)
while trace_id == trace.INVALID_TRACE_ID:
trace_id = self._rand.getrandbits(128)
return trace_id
What are you really trying to do?
I'm aiming to establish comprehensive tracing observability for temporal workflows initiated via scheduling.
Describe the bug
When the worfklow is triggered using create_schedule, only RunActivity spans are captured. However when it is triggered manually using start_workflow or execute_workflow, all spans are captured.
Tracing capture using jaeger for both manual and scheduled workflow executions. It shows workflow triggered manually with 5 spans, and the workflows triggered using schedule with 1 span.
The screenshot below show that the only span capture for the scheduled workflows is RunActivity
However for the workflows triggered manually using starter.py, we can see 5 spans captured. It is seen 5 spans: StartWorkflow, RunWorkflow, StartActivity, RunActivity and CompleteWorkflow
Minimal Reproduction
Environment/Versions
Additional context
scheduler.py.zip