Closed: yurishkuro closed this issue 2 weeks ago
To prevent the infinite loop of trace generation caused by the tracer in plugin/storage/grpc/factory.go in Jaeger v2, you need to ensure that the tracer used for gRPC communication does not generate traces for its own operations. This can be achieved by using a `TracerProvider` configured to ignore certain operations, or by using a `SpanProcessor` that filters out these traces.

Here is an example of how you can modify the `newRemoteStorage` function to achieve this:
```go
func (f *Factory) newRemoteStorage(telset component.TelemetrySettings, newClient newClientFn) (*ClientPluginServices, error) {
	c := f.configV2
	// Create a TracerProvider that never samples, so the gRPC client's
	// own operations do not generate traces.
	res := resource.NewWithAttributes(
		semconv.SchemaURL,
		semconv.ServiceNameKey.String("grpc-client"),
	)
	tracerProvider := sdktrace.NewTracerProvider(
		sdktrace.WithResource(res),
		sdktrace.WithSampler(sdktrace.ParentBased(sdktrace.NeverSample())),
	)
	opts := []grpc.DialOption{
		grpc.WithStatsHandler(otelgrpc.NewClientHandler(otelgrpc.WithTracerProvider(tracerProvider))),
	}
	if c.Auth != nil {
		return nil, fmt.Errorf("authenticator is not supported")
	}
	tenancyMgr := tenancy.NewManager(&c.Tenancy)
	if tenancyMgr.Enabled {
		opts = append(opts, grpc.WithUnaryInterceptor(tenancy.NewClientUnaryInterceptor(tenancyMgr)))
		opts = append(opts, grpc.WithStreamInterceptor(tenancy.NewClientStreamInterceptor(tenancyMgr)))
	}
	remoteConn, err := newClient(opts...)
	if err != nil {
		return nil, fmt.Errorf("error creating remote storage client: %w", err)
	}
	f.remoteConn = remoteConn
	grpcClient := shared.NewGRPCClient(remoteConn)
	return &ClientPluginServices{
		PluginServices: shared.PluginServices{
			Store:               grpcClient,
			ArchiveStore:        grpcClient,
			StreamingSpanWriter: grpcClient,
		},
		Capabilities: grpcClient,
	}, nil
}
```
In this example, a new `TracerProvider` is created with a `ParentBased(NeverSample())` sampler, which effectively disables tracing for the gRPC client operations and prevents the infinite loop of trace generation [1].
@yurishkuro is this blocked on the upstream issue? or is there anything we can do right now to fix this?
@mahadzaryab1 there is an unfinished PR 5979. It's a bit of a hacky solution.
I would've preferred for this to be fixed upstream, but that ticket did not see much traction. We could propose a new hook in OTEL, usable from binaries like Jaeger, that would allow us to customize the tracer somehow, e.g. by combining it with the Jaeger Remote Sampler SDK extension, which would let us configure undesired endpoints with 0 probability.
Another option we could try is to inject a noop span into the context when execution reaches places that we do not want traced. This would at least prevent downstream tracing of storage writes, but will not help with, say, healthcheck endpoint tracing (tbh that one could be fixed as a configuration option in the extension itself, to say "disable tracing", assuming it's an issue today).
The factory is passed a tracer that it then passes to OTEL's ToClientConn() function, which automatically adds tracing instrumentation to the gRPC connection. That means we may start generating new traces while trying to save spans to storage, causing an infinite loop of trace generation. The write path must always be devoid of tracing (or use a very low sampling rate) to avoid this.
Related upstream issue https://github.com/open-telemetry/opentelemetry-collector/issues/10663