AkselAllas closed this issue 1 year ago
This issue is pretty much https://github.com/open-telemetry/opentelemetry-js/issues/1739
I guess I have to figure out what spans are being attempted to export.
This is the span it's trying to export.
{
traceId: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
parentId: 'xxxxxxxxxxxxxxxxxxxx',
name: 'grpc.google.pubsub.v1.Publisher/Publish',
id: '72cc057f5cac5a4a',
kind: 2,
timestamp: 1676030779770028,
duration: 74799611,
attributes: {
'rpc.system': 'grpc',
'rpc.method': 'Publish',
'rpc.service': 'google.pubsub.v1.Publisher',
'rpc.grpc.status_code': '4',
'grpc.error_name': 'Error',
'grpc.error_message': '4 DEADLINE_EXCEEDED: Deadline exceeded'
},
status: { code: 2 },
events: [],
links: []
}
Related error and probably related issue
Exception from a finished function: Error: Total timeout of API google.pubsub.v1.Publisher exceeded 60000 milliseconds before any response was received.
So looks like it's pub-sub deadline exceeding, then GRPC instrumentation wanting to export it as a span 🤔 Working as intended? 🤔
Is there a way of suppressing these errors? 🤔
I'm trying to have a GCP Cloud Function flush telemetry at the end of its code.
Are you using Cloud Functions gen 1 or 2? Cloud Functions gen 2 is built on Cloud Run, which supports graceful shutdown by capturing SIGTERM (example code). I'd recommend using a SIGTERM handler to call TracerProvider.shutdown() and otherwise not flush telemetry at the end of your code.
try { functionCodeHere() } finally { await spanProcessor.forceFlush() }
If you take the approach of calling TracerProvider.shutdown() once, all new traces will be dropped after that point (code). I believe that would fix your issue of spans being sent later than you intend?
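A minimal sketch of the SIGTERM approach suggested above, assuming only that the provider object exposes a shutdown() method returning a promise, as TracerProvider.shutdown() does in the OpenTelemetry JS SDK. The helper names makeShutdownOnce and registerSigtermShutdown are hypothetical, and the proc parameter exists only so the sketch can be exercised without sending a real signal.

```javascript
// Hypothetical helper: wrap a provider so that shutdown() runs at most once.
// Repeated calls return the same promise instead of shutting down twice.
function makeShutdownOnce(provider) {
  let pending = null;
  return function shutdownOnce() {
    if (pending === null) {
      // Promise.resolve() tolerates a shutdown() that returns a plain value.
      pending = Promise.resolve(provider.shutdown());
    }
    return pending;
  };
}

// Hypothetical helper: shut the provider down when the platform sends SIGTERM.
// `proc` defaults to the real Node.js process object.
function registerSigtermShutdown(provider, proc = process) {
  const shutdownOnce = makeShutdownOnce(provider);
  proc.once('SIGTERM', () => {
    shutdownOnce()
      .catch((err) => console.error('telemetry shutdown failed', err))
      .finally(() => proc.exit(0));
  });
  return shutdownOnce;
}
```

In a real function this would be called once at startup with the tracer provider created during setup, for example registerSigtermShutdown(tracerProvider). Whether SIGTERM is delivered at all depends on the platform generation, as discussed in this thread.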
We get spans, but also:
Service request { resourceSpans: [ { resource: [Object], scopeSpans: [Array], schemaUrl: undefined } ] }
and
{"stack":"Error: 4 DEADLINE_EXCEEDED: Deadline exceeded\n at callErrorFromStatus (/layers/google.nodejs.yarn/yarn_modules/node_modules/@grpc/grpc-js/src/call.ts:81:17) ...
I believe what's happening here is
So looks like it's pub-sub deadline exceeding, then GRPC instrumentation wanting to export it as a span 🤔 Working as intended? 🤔
Yes, I think everything is working as intended (WAI). To summarize, I'd recommend calling TracerProvider.shutdown() on SIGTERM and also tuning your Pub/Sub batching depending on what your goal is here. You may also be able to flush Pub/Sub messages in the SIGTERM handler or at the end of your Cloud Function.
Is there a way of suppressing these errors? 🤔
I'm not sure what you mean by suppressing. Are these just logs or runtime thrown exceptions? You can configure logs emitted from OpenTelemetry with the diag API https://opentelemetry.io/docs/instrumentation/js/getting-started/nodejs/#setup.
Changing this from bug to question since it doesn't seem like we're directly causing this with a bug but it might be a usage issue.
@AkselAllas are we OK to close this issue now?
Looks like. I will still try to confirm that I can remove it via pubsub config changes.
Happened on CF v1.
Stumbled on this issue again:
The root cause of these errors was that both the spanProcessor and the metricExporter had periodic export windows of 5 or 15 seconds. The problem arises when a Cloud Function (or a Cloud Run instance with detachable CPU) detaches the network and the export function still runs afterwards.
To fix this, we detect whether our code is running in a possibly detachable container and, if it is, we raise the export periods from 5/15 sec to 2147483.647 sec (2147483647 ms, the largest 32-bit signed integer).
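The fix above could be sketched roughly as follows. The thread does not say how the detection is done, so checking the K_SERVICE, FUNCTION_TARGET, and FUNCTION_NAME environment variables is an assumption here, and the function names are hypothetical. Note that K_SERVICE is also set on Cloud Run services whose CPU is always allocated, so a real check may need to be stricter. The returned values are meant to be passed as scheduledDelayMillis to BatchSpanProcessor and as exportIntervalMillis to PeriodicExportingMetricReader.

```javascript
// Largest delay a Node.js timer accepts (2^31 - 1 ms); larger values
// are not honored by setTimeout/setInterval.
const MAX_TIMER_DELAY_MS = 2147483647;

// Assumption: these variables indicate Cloud Functions or Cloud Run.
// The original poster's actual detection logic is not shown in the thread.
function isDetachableRuntime(env) {
  return Boolean(env.K_SERVICE || env.FUNCTION_TARGET || env.FUNCTION_NAME);
}

// Pick export periods: effectively "never" on detachable runtimes,
// otherwise the 5 s / 15 s windows mentioned in the thread.
function chooseExportIntervals(env) {
  if (isDetachableRuntime(env)) {
    return {
      spanScheduledDelayMillis: MAX_TIMER_DELAY_MS,
      metricExportIntervalMillis: MAX_TIMER_DELAY_MS,
    };
  }
  return {
    spanScheduledDelayMillis: 5000,
    metricExportIntervalMillis: 15000,
  };
}
```

With periodic export pushed out this far, telemetry only leaves the process when it is flushed explicitly, which matches the flush at the end of the function code described earlier in the thread.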
@dyladan @aabmass If possible, it would be ideal to add this information to the relevant documentation and examples.
What happened?
I'm trying to have a GCP Cloud Function flush telemetry at the end of its code. I think some gRPC call is still being attempted, but GCP removes the network and we get DEADLINE_EXCEEDED
Steps to Reproduce
Expected Result
We just get our spans
Actual Result
We get spans, but also:
and
Additional Details
OpenTelemetry Setup Code
Pretty much this https://github.com/AkselAllas/repro-otel-bug But also add
and instrument it
package.json
No response
Relevant log output