open-telemetry / opentelemetry-dotnet-contrib

This repository contains set of components extending functionality of the OpenTelemetry .NET SDK. Instrumentation libraries, exporters, and other components can find their home here.
https://opentelemetry.io
Apache License 2.0
474 stars 282 forks source link

[question] How to instrument SQS lambdas properly? #2300

Open Dreamescaper opened 1 week ago

Dreamescaper commented 1 week ago

Component

OpenTelemetry.Instrumentation.AWSLambda

Question details

SQSEvent might contains multiple messages. How to handle them properly?

I can see that ExtractParentContext supports SQSEvent.SQSMessage input. So does it make sense to wrap each message processing, like this?

foreach (var message in sqsEvent.Records)
{
    await AWSLambdaWrapper.TraceAsync(tracerProvider, (message, context) => HandleAsync(...), message, lambdaContext);
}

Or maybe wrap both SQSEvent and SQSEvent.SQSMessage?

await AWSLambdaWrapper.TraceAsync(tracerProvider, async (sqsEvent, lambdaContext) =>
{
    foreach (var message in sqsEvent.Records)
    {
        await AWSLambdaWrapper.TraceAsync(tracerProvider, (message, context) => HandleAsync(...), message, lambdaContext);
    }
}, sqsEvent, lambdaContext);
github-actions[bot] commented 1 week ago

Tagging component owner(s).

@srprash @ppittle @muhammad-othman @rypdal @Oberon00

Oberon00 commented 1 week ago

In the docs for Dynatrace we recommend calling TraceAsync only with the actual event object, not the messages: https://docs.dynatrace.com/docs/shortlink/aws-lambda-otel-dotnet#sqs-sns-in

After all, there is only one Lambda invocation, so calling TraceAsync multiple times seems semantically wrong. I think what would be needed would be an additional API that creates the right spans to represent that we are now processing a message but not invoking the Lambda again. Related new issue seems to be #2244 (though the SDK defines a different message type than Lambda libraries). Otherwise I'd recommend following the OTel messaging semantic conventions yourself. https://github.com/open-telemetry/semantic-conventions/blob/main/docs/messaging/README.md

xactlance commented 1 week ago

@Oberon00 But isn't each record it's own message with it's own MessageAttributes and thus it's own traceparent and tracestate? I personally do the second approach:

foreach (var message in sqsEvent.Records)
{
     await AWSLambdaWrapper.TraceAsync(tracerProvider, (message, context) => HandleAsync(...), message, lambdaContext);
}

I would love to know if this is wrong.

Oberon00 commented 1 week ago

I'm not 100% sure but won't it get the faas.trigger, etc. activity tags? That way, it would mean it was a separate Lambda invocation which it wasn't. So the faas attributes would need to be skipped at least