open-telemetry / opentelemetry-java

OpenTelemetry Java SDK
https://opentelemetry.io
Apache License 2.0
1.97k stars 817 forks source link

OpenJDK CRaC support #6756

Open filipetrovic opened 1 day ago

filipetrovic commented 1 day ago

Is your feature request related to a problem? Please describe.

I have a spring boot app which uses opentelemetry-javaagent. I wanted to utilize CRaC but javaagent is preventing CRaC from capturing the endpoint becuse it keeps files and connections open.

I have posted the issue on javaagent repository, and hoped for the resolution.

Then I started to work on a workaround, that is, removing javaagent completely and utilizing spring-boot-starter along with some intstrumentation libraries. I have used SDK-s autoconfiguration.

I had some progress, but in the end, CRaC still couldn't work with that setup, because the way Opentelemetry SDK is designed it doesn't allow reinitialization of OpenTelemetry bean, and more importantly the exporter which is used can only be shutdown, and not relinitialized again. I even tried reinitializing the exporter, but SdkTracerProvider has no method for replacing current exporter.

Then I abandoned autoconfiguration and I tried to go up the heirarchy of classes and replace every bean in Opentelemetry Configuration, where i eventually wanted to do this:

OpenTelemetrySdk newOpenTelemetrySdk = OpenTelemetrySdk.builder()
            .setTracerProvider(newTracerProvider)
            .buildAndRegisterGlobal();

but this failed since it's not allowed to call .buiildAndRegisterGlobal() more than once in app.

I tried all of this just to close the exporter connection to otel collector, before capturing CRaC checkpoint, and open the connection after the checkpoint is restored. More info about that here.

After all of this the only option was to manually do everything including exposing my own OpenTelemetry bean and reinitializing it once the checkpoint is restored, but I figured out that in order to do that, and to keep exporting traces i had to reinitialize all the beans which use Opentelemetry which I'm not keen on doing.

For example I'm using mongo instrumentation library which works like charm, but it works by injecting MongoTelemetry bean into MongoClientSettings and then creating MongoClient with those settings. And note that MongoTelemetry depends on OpenTelemetry bean, so if I wanted to change OpenTelemetry bean I had to reinitialize every mongodb connector. The point is, changing the whole OpenTelemetry bean along with exporters requires reloading all beans that depend on it which is not really convenient.

So the closest to a solution I have gotten is that I have a spring boot app which uses spring boot starter and mongodb instrumentation library with custom configuration where I could shutdown the exporter and capture CRaC checkpoint, but when the checkpoint is restored, the exporter was in 'shutdown' state and did not export any traces.

Describe the solution you'd like What I'd like is to have a support for CRaC in this SDK. It would be perfect if you could expose Opentelemetry as a CRaC resource and we register it in our app and checkpointing and restoring works fine.

I understand I might be asking too much, so the other approach which would be sufficient is to just allow the SpanExporters to have open() method so they can open the connection to collectors just like they have shutdown() method for shutting them down.

With that much control I could, hopefully, close the connection to exporter, and reopen it once the checkpoint is restored.

Additional context As you can imagine, here I am asking for a new feature, but I'm also looking for a solution to my problem. I don't know if this is the best place to post all of this, but I hope you can help me out. Thanks

trask commented 1 day ago

allow the SpanExporters to have open() method so they can open the connection to collectors

@jeanbisutti @brunobat if something like this was available, would it also benefit GraalVM by allowing the SDK to be initialized at build time instead of initialized at run time?

brunobat commented 17 hours ago

It would make things simpler but not mandatory. Quarkus does build time initialisation of some parts of the SDK. Our biggest problem is the chicken and egg problem. All instrumentation needs the SDK and some instrumentation is started very early, before the SDK is available. In these cases we have to provide a "proxy" to OpenTelemetry for lazy initialisation. In the context of CRaC, an important issue is what to do with the configurations that are runtime only. Restoring the exporter needs almost for sure a config reload, not just reconnection.

filipetrovic commented 16 hours ago

In the context of CRaC, an important issue is what to do with the configurations that are runtime only. Restoring the exporter needs almost for sure a config reload, not just reconnection.

Thats right, It will be useful to have a config reload also. Our use case for CRaC is to capture the checkpoint and use that checkpoint to spawn multiple instances of that app on different environments, which have different configuration.

So the one approach could be to provide setters for some basic config like service name, exporter endpoint etc. That way when we want to restore the checkpoint we could set the config parameters and 'modify' the Opentelemetry bean that's being used for instrumentation.

jeanbisutti commented 15 hours ago

would it also benefit GraalVM by allowing the SDK to be initialized at build time instead of initialized at run time?

It may help but not so much.

instrumentation needs the SDK and some instrumentation is started very early, before the SDK is available.

Same for the instrumentation of the OTel starter (logging).

So the one approach could be to provide setters for some basic config like service name, exporter endpoint etc. That way when we want to restore the checkpoint we could set the config parameters and 'modify' the Opentelemetry bean that's being used for instrumentation.

It may be significant work in the OTel Java SDK code?

other approach which would be sufficient is to just allow the SpanExporters to have open() method so they can open the connection to collectors just like they have shutdown() method for shutting them down.

It may be the simplest modification to do in the OTel Java SDK to start having something that works with CRaC.

brunobat commented 14 hours ago

other approach which would be sufficient is to just allow the SpanExporters to have open() method so they can open the connection to collectors just like they have shutdown() method for shutting them down.

It may be the simplest modification to do in the OTel Java SDK to start having something that works with CRaC.

I agree. Pretty much any class that needs a shutdown should also have an Open as well. I image the most common case are classes that do work in the background and need their own threads.