kagkarlsson / db-scheduler

Persistent cluster-friendly scheduler for Java
Apache License 2.0
1.23k stars 188 forks source link

Micrometer tracing #451

Open adamalexandru4 opened 9 months ago

adamalexandru4 commented 9 months ago

Expected Behavior

Micrometer tracing available on each task execution.

Would be great if the ExecutePicked would be wrapped in a tracing context: One traceId, and different spans for complete or failure or even the same spanId.

Current Behavior

No tracing available

adamalexandru4 commented 8 months ago

@kagkarlsson If you agree with this idea, I can contribute the required changes.

kagkarlsson commented 8 months ago

I think that makes sense yes, as long as the code is acceptable. Not sure if we have all we need for it, I was thinking it could be implemented with what is now called StatsRegistry (might be promoted to some sort of Listener interface)

adamalexandru4 commented 4 months ago

The starting idea would be to play around with Tracer interface from micrometer-tracing inside the methods of ExecutePicked class. By default the Scheduler, would have a Tracer.NOOP implementation but in Spring Boot module we could try to detect the bean from ApplicationContext and inject it.

private void executePickedExecution(Execution execution, CurrentlyExecuting currentlyExecuting) { 
    .....
    Span taskExecutionSpan = tracer.nextSpan().name("task-execution")
      .tag("task-name", execution.taskInstance.getTaskName());
    try (Tracer.SpanInScope spanInScope = tracer.withSpan(taskExecutionSpan)) {
      ....
    } catch (..) {
      .....
   }  finally {
      taskExecutionSpan.end();
    }
 }

private void complete(CompletionHandler completion, Execution execution, Instant executionStarted) {
    ....
    Span taskCompleteSpan = tracer.nextSpan().name("task-complete")
      .tag("task-name", execution.taskInstance.getTaskName());
    try (Tracer.SpanInScope spanInScope = tracer.withSpan(taskCompleteSpan)){
        .... 
    } catch (..) {
      .....
   }  finally {
      taskCompleteSpan.end();
    }

private void failure(
      Task task,
      Execution execution,
      Throwable cause,
      Instant executionStarted,
      String errorMessagePrefix) {
    ....
    Span taskFailureSpan = tracer.nextSpan().name("task-failure")
      .tag("task-name", execution.taskInstance.getTaskName());
    try (Tracer.SpanInScope spanInScope = tracer.withSpan(taskCompleteSpan)){
        .... 
    } catch (..) {
      .....
   }  finally {
      taskFailureSpan.end();
    }
NicklasWallgren commented 3 months ago

We would also benefit from this, maybe OpenTelemetry could be a viable option instead of Micrometer Tracing?

I don't think it's possible to implement tracing using the StatsRegistry in its current form. It would require additional events in ExecutionStatsEvent (i.e. STARTED), and passing Execution to StatsRegistry#register so that we can add span attributes such as taskName and id.

kagkarlsson commented 3 months ago

A loose plan is to deprecate StatsRegistry and replace it with one or many Listener interfaces, also covering started and completed events. If anyone would like to take a stab at that (adding some form of adapter for StatsRegistry), feel free to try

NicklasWallgren commented 3 months ago

I've created a basic proof of concept that divides ExecutePicked into two versions: DefaultExecutePicked and ObservableExecutePicked. You can review the changes in the commit linked below: https://github.com/NicklasWallgren/db-scheduler/commit/5e2d23dd71a0c3043ab59a2da1b282579a05f1cf

However, a more efficient approach might be to transform StatsRegistry into a comprehensive listener interface, which would prevent multiple implementations of ExecutePicked

kagkarlsson commented 2 months ago

How does #504 work for this? Specifically:

  void onExecutionStart(CurrentlyExecuting currentlyExecuting);
  void onExecutionComplete(ExecutionComplete executionComplete);
kagkarlsson commented 2 months ago

@NicklasWallgren Any chance you could try reimplementing your PoC using the new SchedulerListener?

NicklasWallgren commented 2 months ago

@NicklasWallgren Any chance you could try reimplementing your PoC using the new SchedulerListener?

Great! It should possible to implement tracing with help of the new SchedulerListener.

I've implemented a temporary ExecutionHandler-wrapper, which bootstraps the trace/transaction before kicking of the delegated/actual execution handler.

public class ElasticApmExecutionHandler<T extends ObservableTaskContext> implements ExecutionHandler<T> {
    private static final String TRANSACTION_TYPE = "queue.task";

    private final ExecutionHandler<T> executionHandler;

    private ElasticApmExecutionHandler(final ExecutionHandler<T> executionHandler) {
        this.executionHandler = executionHandler;
    }

    @Override
    public CompletionHandler<T> execute(final TaskInstance<T> taskInstance, final ExecutionContext executionContext) {
        final Execution execution = executionContext.getExecution();

        final T data = taskInstance.getData();

        final Transaction transaction = ElasticApm
            .startTransactionWithRemoteParent((it) -> TraceIdHeaderValueFactory.create(data.traceId(), data.spanId()));

        try (final Scope scope = transaction.activate()) {
            transaction.setName("QueueTask#" + execution.getTaskName());
            transaction.setType(TRANSACTION_TYPE);

            transaction.setLabel("queue.task.id", execution.getId());
            transaction.setLabel("queue.task.name", execution.getTaskName());

            return executionHandler.execute(taskInstance, executionContext);
        } catch (final Exception e) {
            transaction.captureException(e);

            throw e;
        } finally {
            transaction.end();
        }
    }

    public static <T extends ObservableTaskContext> ElasticApmExecutionHandler<T> wrap(final ExecutionHandler<T> executionHandler) {
        return new ElasticApmExecutionHandler<>(executionHandler);
    }

    private static class TraceIdHeaderValueFactory {
        /**
         * Builds a trace parent which follows the W3C TraceParent standard.
         * <p>
         * Format: trace-id "-" span-id "-" trace-flags
         *
         * @return the trace parent
         */
        public static String create(final String traceId, final String spanId) {
            return String.format("00-%s-%s-01", traceId, spanId);
        }
    }
}
kagkarlsson commented 2 months ago

After some consideration I added an interceptor-concept as well. See PR #504