spandex-project / spandex

A platform agnostic tracing library
MIT License
335 stars 53 forks source link

Help with async process? #139

Closed pablo-meier closed 2 years ago

pablo-meier commented 2 years ago

Hi! Wrote it up over at ElixirForum, but maybe I could get more specialized πŸ‘€ here? Appreciate any help if you have any πŸ™‚


My company is using Datadog APM to profile traces for our Phoenix app, and the Spandex family has been great for sending traces πŸ˜„

The trouble is that we lose spans on our traces when we use Task.async and Task.await for multiple parallel DB calls. In code, it initially looked something like this:

def function_making_db_calls() do
  synchronous_function_calling_db()
  task1 = Task.async(fn -> my_function_calling_the_db() end)
  task2 = Task.async(fn -> other_function_calling_the_db() end)

  # other logic

  %MyStruct{
    data1: Task.await(task1),
    data2: Task.await(task2)
  }
end

Here, the first DB call (outside the Task.async/await) traces correctly, and we can see how long it takes in the context of the toplevel request. But the ones in the other processes don’t trace correctly.

Now! I saw this tip from the Spandex README:

Asynchronous Processes

The current trace_id and span_id can be retrieved and later used (for example, from another process) as follows:

 trace_id = Tracer.current_trace_id()
 span_id = Tracer.current_span_id()
 span_context = %SpanContext{trace_id: trace_id, parent_id: span_id}
 Tracer.continue_trace("new_trace", span_context)

New spans can then be logged from there and sent in a separate batch.

So I did something like this:

defp trace_aware_async(span_label, func) do
  trace_id = Tracer.current_trace_id()
  span_id = Tracer.current_span_id()
  span_context = %SpanContext{trace_id: trace_id, parent_id: span_id}
  Task.async(fn ->
    Tracer.continue_trace(span_label, span_context)
    func.()
  end)
end

def function_making_db_calls() do
  synchronous_function_calling_db()
  task1 = trace_aware_async("task1_span", fn -> my_function_calling_the_db() end)
  task2 = trace_aware_async("task2_span", fn -> other_function_calling_the_db() end)

  # other logic

  %MyStruct{
    data1: Task.await(task1),
    data2: Task.await(task2)
  }
end

But still no dice :weary:

I started digging into the sources and may continue to do this, and this person seems to have printed logs to find/correct discrepancies in the trace_id, but I’d rather not trouble my teammates with multiple PR reviews and monitor so many deploys. Do you have an idea what I’m doing wrong?

Thanks so much! :raised_hands:

san650 commented 2 years ago

We do something similar on our code base that's working. Here's some pseudo code. Hope that helps

defp trace_aware_async(span_label, func) do
  current_context = Tracer.current_context()

  Task.async(fn ->
    case current_context do
      {:ok, context} ->
        Tracer.continue_trace(span_label, context)

        try do
          func.()
        after
          Tracer.finish_trace()
        end
      {:error, _reason} ->
        func.()
    end
  end)
end
pablo-meier commented 2 years ago

I super appreciate this, the issue was forgetting to put Tracer.finish_trace(). Thanks so much for pasting your example πŸ˜„ @san650