open-telemetry / opentelemetry-js-contrib

OpenTelemetry instrumentation for JavaScript modules
https://opentelemetry.io
Apache License 2.0

Discussion(instrumentation-aws-sdk): SQS receive according to semantic conventions #707

Open blumamir opened 3 years ago

blumamir commented 3 years ago

This issue documents the implementation of the AWS SQS receive operation according to the current semantic conventions for messaging systems (Oct 2021). There is an active SIG working on the messaging systems specification, which will probably change the specification and how to handle situations where it's not possible to extract the context accurately.

receiveMessage

Processing Spans

According to the OpenTelemetry specification (and to reasonable expectations for trace structure), a user of this instrumentation library would expect to see one span for the operation of receiving a batch of messages from SQS, and then, for each message, a span with its own sub-tree for the processing of that specific message.

For example, if a receiveMessage call returned 2 messages:

This would result in a DB span that is the child of the msg1 process span, and an HTTP span that is the child of the msg2 process span (as opposed to mixing all those operations under the single receive span, or starting a new trace for each of them).

Unfortunately, this is not so easy to implement in JS:

  1. The SDK calls a single callback for the whole message batch, and it's not straightforward to tell when the processing of each individual message starts and ends (and to set the context correctly for cascading spans).
  2. If async/await is used, context can be lost when returning data from async functions, for example:
async function asyncRecv() {
  const data = await sqs.receiveMessage(recvParams).promise();
  // context of receiveMessage is set here
  return data;
}

async function poll() {
  const result = await asyncRecv();
  // context is lost when asyncRecv returns. following spans are created with root context.
  await Promise.all(
    result.Messages.map((message) => this.processMessage(message))
  );
}
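
The same effect can be reproduced without the AWS SDK. The following is a minimal sketch, assuming Node's built-in `AsyncLocalStorage` as a stand-in for the tracing context; `fakeReceive` and the `seen` object are hypothetical names used only for illustration, and this is not how the instrumentation itself sets the context.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks');

// Stand-in for the tracing context (assumption: a real setup would use the
// OpenTelemetry context API rather than a bare AsyncLocalStorage).
const activeSpan = new AsyncLocalStorage();

// Hypothetical receive call that resolves on a later tick.
function fakeReceive() {
  return new Promise((resolve) =>
    setImmediate(() => resolve({ Messages: [{ MessageId: 'msg1' }] }))
  );
}

async function asyncRecv(seen) {
  // The "receive" context is active only inside this callback.
  return activeSpan.run('receive span', async () => {
    const data = await fakeReceive();
    seen.inside = activeSpan.getStore(); // still the receive context
    return data;
  });
}

async function poll() {
  const seen = {};
  const result = await asyncRecv(seen);
  // Back in the caller, the context set inside asyncRecv is not visible.
  seen.afterReturn = activeSpan.getStore();
  seen.messageCount = result.Messages.length;
  return seen;
}

poll().then((seen) => console.log(seen));
```

In this sketch `seen.inside` holds the receive context while `seen.afterReturn` is `undefined`: the context follows the async call chain downwards, but does not travel back to the caller together with the returned data.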

The current implementation partially solves this issue by patching the map / forEach / filter functions on the Messages array of the receiveMessage result. This handles cases like the one above, but not situations where the processing follows other patterns (multiple map/forEach calls, index access to the array, other array operations, etc.). This is currently an open issue in the instrumentation.
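
The patching idea and its main limitation can be sketched as follows. This is an illustration only, assuming `AsyncLocalStorage` as a stand-in for the tracing context; `patchMessages` and `handle` are hypothetical helpers and do not reproduce the instrumentation's actual code.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks');

// Stand-in for the active tracing context (assumption, not the OTel API).
const activeSpan = new AsyncLocalStorage();

// Hypothetical helper: wrap map/forEach on this one array instance so that
// each callback runs with a per-message "process" context active.
function patchMessages(messages) {
  for (const name of ['map', 'forEach']) {
    const original = messages[name];
    Object.defineProperty(messages, name, {
      enumerable: false,
      value: function (callback, thisArg) {
        return original.call(this, (message, index, array) =>
          activeSpan.run(`process ${message.MessageId}`, () =>
            callback.call(thisArg, message, index, array)
          )
        );
      },
    });
  }
  return messages;
}

// Simulated message handler that does some async work first.
async function handle(message) {
  await new Promise((resolve) => setImmediate(resolve));
  return `${message.MessageId} -> ${activeSpan.getStore()}`;
}

async function demo() {
  const messages = patchMessages([{ MessageId: 'msg1' }, { MessageId: 'msg2' }]);
  // Covered pattern: each callback sees its own process context.
  const viaMap = await Promise.all(messages.map((m) => handle(m)));
  // Uncovered pattern: index access bypasses the patched functions.
  const viaIndex = await handle(messages[0]);
  return { viaMap, viaIndex };
}

demo().then((result) => console.log(result));
```

Here the `map` path reports `msg1 -> process msg1` and `msg2 -> process msg2`, while the index-access path reports `msg1 -> undefined`, which mirrors the gap described above.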

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days.

github-actions[bot] commented 2 years ago

This issue was closed because it has been stale for 14 days with no activity.

aadharsh-rengarajan commented 2 years ago

Hi @dyladan @blumamir does this issue still occur if we repeatedly poll using setTimeout?

I have an issue with the instrumentation where there are a lot of nested spans. I am using forEach and, inside it, a setTimeout. It's causing the spans to be nested, with one root span having 10000+ spans for receiving messages.

Based on the code above, I wrote similar code:

async function asyncRecv() {
  const data = await sqs.receiveMessage(recvParams).promise();
  return data;
}

async function poll() {
  const result = await asyncRecv();
  result.Messages.forEach((message, i) => {
    this.processMessage(message);
    setTimeout(() => {
      this._poll();
    }, this.pollInterval * i);
  });
}
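
One plausible mechanism for this nesting is that a timer scheduled inside a forEach callback inherits the per-message context that is active there, so the next receive becomes a child of the previous process span. The sketch below models that under stated assumptions: `AsyncLocalStorage` stands in for the tracing context, and `simulatePolls` and its `detach` flag are hypothetical. With `@opentelemetry/api`, the analogous way to detach would presumably be `context.with(ROOT_CONTEXT, ...)`, but that has not been verified against this instrumentation.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks');

// Stand-in for the active span: the store is the list of ancestor span names.
const activeSpan = new AsyncLocalStorage();

// Hypothetical poll loop. `detach` chooses whether the next poll is scheduled
// inside the per-message context or with that context cleared.
function simulatePolls(rounds, detach) {
  return new Promise((resolve) => {
    const receivePaths = [];
    function poll(remaining) {
      const parents = activeSpan.getStore() ?? [];
      const receive = [...parents, 'receive'];
      receivePaths.push(receive.join(' > '));
      if (remaining === 0) return resolve(receivePaths);
      // Roughly what a patched forEach does for one message: run the
      // callback with a "process" span as the active context.
      activeSpan.run([...receive, 'process'], () => {
        const schedule = () => setTimeout(() => poll(remaining - 1), 0);
        // Running with an undefined store clears the inherited context.
        if (detach) {
          activeSpan.run(undefined, schedule);
        } else {
          schedule();
        }
      });
    }
    poll(rounds);
  });
}

simulatePolls(2, false).then((paths) => console.log(paths));
simulatePolls(2, true).then((paths) => console.log(paths));
```

Without detaching, each receive ends up one level deeper than the last (`receive`, then `receive > process > receive`, and so on); with detaching, every receive starts at the top level again.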

trentm commented 9 months ago

This comment https://github.com/open-telemetry/opentelemetry-js-contrib/issues/1477#issuecomment-1836903586 has a section "Discussion: do we want to support this?" which argues for dropping some of the special handling for "SQS ReceiveMessage" requests, specifically the attempts to automatically create "processing" spans when iterating over received messages. IIUC, the semantic conventions have since changed to no longer mention "processing" spans.

seemk commented 8 months ago

I'm for dropping the process spans. In this case should the instrumentation just start a new receive span for every unique producer span context?