Open alexandros-megas opened 1 month ago
One small note regarding the logs: some lines might be out of order, since they are collected from our cluster and aggregated in our k8s OpenSearch operator for searching, etc.
> but I don't see anywhere that it gets re-enabled at the end. Shouldn't it be re-enabling tracing so that other spans work correctly?

The suppressTracing call only impacts the async task for the batched OTLP export request. It doesn't have a global impact on the whole process. This is very likely not the cause of the issue you are hitting.
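To illustrate the scoping (a minimal sketch of the mechanism, not the SDK's actual export code): suppressTracing only derives a new context with a suppression flag set, and only work run under that derived context is affected.

```js
const { context } = require('@opentelemetry/api');
const { suppressTracing, isTracingSuppressed } = require('@opentelemetry/core');

// Run a callback (e.g. the exporter's own HTTP request) with tracing suppressed.
context.with(suppressTracing(context.active()), () => {
  console.log(isTracingSuppressed(context.active())); // true -- only inside this scope
});

// Outside the callback the active context is unchanged, so spans record normally.
console.log(isTracingSuppressed(context.active())); // false
```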
> items to be sent [ Span { attributes: { 'http.url': 'http://10.1.42.90:3002/explore',
That "items to be sent ..." log line is from OTLPExporterBase at least attempting to send some tracing data.
Can you confirm you are getting some spans sent to your backend?
And then if so, is the issue that you don't get any spans sent after that initial send?
I'm not sure what that "Bad Request" log line is: whether it is from the OTel libs or your own code. It could be fine, I'm just not sure.
> In one of my page's getServerSideProps function, I'm wrapping the data fetching logic in a span like the following:
> return getTracer()
On a hunch, could you console.log that tracer? E.g. use something like:

```js
const tracer = getTracer()
console.log('XXX the tracer', tracer);
return tracer.startActiveSpan(...
```
That might help clarify if there is an actual tracer there, or a NoopTracer because of some surprise.
Next.js often bundles up the code being executed. OpenTelemetry support for bundled code ranges from extremely limited to totally broken. For example, the core node http module will be instrumented, but not any other packages (like a database client or whatever). That might not matter to you.
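A quick sketch of why (instrumentation-pg here is just an example of a non-core package; whether your app uses it is an assumption): instrumentations patch modules at require time, and core modules like http are never inlined into the bundle, while regular packages are.

```js
const { registerInstrumentations } = require('@opentelemetry/instrumentation');
const { HttpInstrumentation } = require('@opentelemetry/instrumentation-http');
const { PgInstrumentation } = require('@opentelemetry/instrumentation-pg');

registerInstrumentations({
  instrumentations: [
    // Patches the core http/https modules, which bundlers leave alone,
    // so this keeps working even when Next.js bundles the server code.
    new HttpInstrumentation(),
    // Patches the 'pg' package at require time; if the bundler inlines 'pg',
    // there is nothing left to intercept and this effectively becomes a no-op.
    new PgInstrumentation(),
  ],
});
```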
Next.js has native OpenTelemetry support: https://nextjs.org/docs/app/building-your-application/optimizing/open-telemetry
Are you enabling any of that? Their registerOTel() will register a global tracing provider that would potentially conflict with a separate one that you have installed.
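For reference, the Next.js-native setup from their docs looks roughly like this (the service name is a placeholder); if something like it is present alongside your own NodeTracerProvider, the two can fight over which provider is registered globally.

```js
// instrumentation.js (or .ts) at the project root -- Next.js calls register() at startup.
import { registerOTel } from '@vercel/otel';

export function register() {
  registerOTel({ serviceName: 'clarifai-web' });
}
```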
Hi Trent, thanks for your response. To answer some of your questions/points:
- We're not using @vercel/otel, because in the self-hosting section of their documentation they seem to recommend creating a custom configuration module anyway.
- I'll try logging the tracer returned by our getTracer() utility, and post back my results.

Thanks!
When I log the tracer, I see the following output. It does look like it's a real tracer that isn't a Noop tracer.
Acquired tracer: <ref *1> Tracer {
_tracerProvider: NodeTracerProvider {
_registeredSpanProcessors: [ [BatchSpanProcessor] ],
_tracers: Map(2) {
'@opentelemetry/instrumentation-http@0.52.1:' => [Tracer],
'clarifai-web@0.1.0:' => [Circular *1]
},
resource: Resource {
_attributes: [Object],
asyncAttributesPending: false,
_syncAttributes: [Object],
_asyncAttributesPromise: [Promise]
},
_config: {
sampler: [ParentBasedSampler],
forceFlushTimeoutMillis: 30000,
generalLimits: [Object],
spanLimits: [Object],
serviceName: 'clarifai-web',
autoDetectResources: true,
resource: [Resource],
traceExporter: [OTLPTraceExporter]
},
activeSpanProcessor: MultiSpanProcessor {
_spanProcessors: [Array]
}
},
_sampler: ParentBasedSampler {
_root: AlwaysOnSampler {},
_remoteParentSampled: AlwaysOnSampler {},
_remoteParentNotSampled: AlwaysOffSampler {},
_localParentSampled: AlwaysOnSampler {},
_localParentNotSampled: AlwaysOffSampler {}
},
_generalLimits: {
attributeValueLengthLimit: Infinity,
attributeCountLimit: 128
},
_spanLimits: {
attributeValueLengthLimit: Infinity,
attributeCountLimit: 128,
linkCountLimit: 128,
eventCountLimit: 128,
attributePerEventCountLimit: 128,
attributePerLinkCountLimit: 128
},
_idGenerator: RandomIdGenerator {
generateTraceId: [Function: generateId],
generateSpanId: [Function: generateId]
},
resource: Resource {
_attributes: {
'service.name': 'clarifai-web',
'telemetry.sdk.language': 'nodejs',
'telemetry.sdk.name': 'opentelemetry',
'telemetry.sdk.version': '1.25.1',
'service.version': '0.1.0',
'process.pid': 29,
'process.executable.name': '/usr/local/bin/node',
'process.executable.path': '/usr/local/bin/node',
'process.command_args': [Array],
'process.runtime.version': '20.16.0',
'process.runtime.name': 'nodejs',
'process.runtime.description': 'Node.js',
'process.command': '/app/node_modules/.bin/next',
'process.owner': 'root',
'host.name': 'clarifai-web-2952-6747bd448-fvtbv',
'host.arch': 'amd64'
},
asyncAttributesPending: false,
_syncAttributes: {
'service.name': 'clarifai-web',
'telemetry.sdk.language': 'nodejs',
'telemetry.sdk.name': 'opentelemetry',
'telemetry.sdk.version': '1.25.1',
'service.version': '0.1.0'
},
_asyncAttributesPromise: Promise { [Object] }
},
instrumentationLibrary: {
name: 'clarifai-web',
version: '0.1.0',
schemaUrl: undefined
}
}
I did my best to manually format it, because it was all printed on a single line.
Side note: I also tried swapping out the BatchSpanProcessor for a SimpleSpanProcessor, with the same result, so I guess that's +1 on the BSP not being the root cause of the issue.
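For reference, the swap in question looks roughly like this (the exporter/provider construction is an assumption about the setup module; only the processor line differs):

```js
const { NodeTracerProvider } = require('@opentelemetry/sdk-trace-node');
const { SimpleSpanProcessor, BatchSpanProcessor } = require('@opentelemetry/sdk-trace-base');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');

const exporter = new OTLPTraceExporter({ url: process.env.OTEL_TRACE_EXPORTER_ENDPOINT });
const provider = new NodeTracerProvider();

// SimpleSpanProcessor exports each span as soon as it ends; BatchSpanProcessor
// queues spans and flushes on an interval. Swapping them changes *when* export
// happens, not whether the exporter/endpoint combination is valid.
provider.addSpanProcessor(new SimpleSpanProcessor(exporter));
// provider.addSpanProcessor(new BatchSpanProcessor(exporter));

provider.register();
```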
I just noticed this in your original log:
OTLPTraceExporter created with url http://otel-collector:14268/api/traces?format=jaeger.thrift
Is that right? The jaeger and thrift references seem wrong. Do you possibly have the OTEL_TRACE_EXPORTER_ENDPOINT envvar pointing to a collector that expects something other than OTLP?
Port 14268 is a typical Jaeger port. My guess is you are sending OTLP/HTTP+JSON to Jaeger and it doesn't like it. That might explain the "Bad Request" log output -- though it is unfortunate that whatever code is logging that does not give some context.
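If the target is meant to be an OpenTelemetry Collector, the exporter would normally point at its OTLP/HTTP receiver instead, something like the following (host and port are assumptions about your collector config):

```js
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');

// Default OTLP/HTTP receiver on an OpenTelemetry Collector: port 4318, path /v1/traces.
// Port 14268 with ?format=jaeger.thrift is Jaeger's Thrift-over-HTTP endpoint,
// which will reject OTLP JSON payloads.
const exporter = new OTLPTraceExporter({
  url: 'http://otel-collector:4318/v1/traces',
});
```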
What happened?
Steps to Reproduce
I'm instrumenting a Next.js node app, and in it I'm using the BatchSpanProcessor, since the docs seemed to recommend it for performance reasons.
We're using node's require hook environment variable to ensure that this module is executed first, before anything else in Next.js runs. Our helm chart includes the following in the env section:
In one of my page's getServerSideProps function, I'm wrapping the data fetching logic in a span like the following:
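A rough sketch of that shape (the span name and fetchExploreData are placeholders, and getTracer() is the app's own helper from the setup module):

```js
export async function getServerSideProps(ctx) {
  const tracer = getTracer(); // app's own tracer helper (assumed import)
  return tracer.startActiveSpan('explore.getServerSideProps', async (span) => {
    try {
      const data = await fetchExploreData(ctx); // placeholder for the real fetch logic
      return { props: { data } };
    } finally {
      span.end();
    }
  });
}
```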
Expected Result
I would expect the span processor to batch up some spans, and then send them to the exporter.
Actual Result
Seems like nothing is happening. No spans are sent to our collector endpoint.
Additional Details
I set the OTEL diag level to ALL so that I can see everything happening, and I noticed this line:

After doing some investigation, I noticed that the BatchSpanProcessor suppresses tracing (presumably so that its own requests don't emit additional spans), but I don't see anywhere that it gets re-enabled at the end. Shouldn't it be re-enabling tracing so that other spans work correctly?
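For reference, the usual API-level way to turn the diag level up to ALL is something like the following (whether the setup module does it exactly this way is an assumption):

```js
const { diag, DiagConsoleLogger, DiagLogLevel } = require('@opentelemetry/api');

// Log every OpenTelemetry diagnostic message to the console.
diag.setLogger(new DiagConsoleLogger(), DiagLogLevel.ALL);
```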
OpenTelemetry Setup Code
package.json
Relevant log output