quarkusio / quarkus

Quarkus: Supersonic Subatomic Java.
https://quarkus.io
Apache License 2.0
13.78k stars 2.68k forks source link

Bean with StartupEvent observer starts before extensions #32923

Closed jbonofre closed 1 year ago

jbonofre commented 1 year ago

Describe the bug

I have a bean using opentelemetry (injecting the Tracer) and with observes on StartupEvent:

@ApplicationScoped
public class Greeting {

    @Inject
    Tracer tracer;

    private final Logger logger = LoggerFactory.getLogger(Greeting.class);

    public void startup(@Observes StartupEvent startupEvent) {
        Span span = tracer.spanBuilder("My custom span")
                        .setAttribute("my.attr", "attr")
                                .setParent(Context.current().with(Span.current()))
                                        .setSpanKind(SpanKind.INTERNAL)
                                                .startSpan();
        span.addEvent("Starting ...");
        logger.info("Hello world !");
        span.addEvent("I'm done");
        span.end();
    }

    @WithSpan
    public void shutdown(@Observes ShutdownEvent shutdownEvent) {
        logger.info("Bye bye world !");
    }

}

When I start my application, I can see:

""09:34:37 WARN  traceId=, parentId=, spanId=, sampled= [io.qu.op.ru.ex.ot.LateBoundBatchSpanProcessor] (main) No BatchSpanProcessor delegate specified, no action taken.
""09:34:38 INFO  traceId=, parentId=, spanId=, sampled= [co.dr.cl.co.ex.fi.Greeting] (main) Hello world !
""09:34:38 INFO  traceId=, parentId=, spanId=, sampled= [io.quarkus] (main) first-service 1.0.0-SNAPSHOT on JVM (powered by Quarkus 3.0.1.Final) started in 0.803s. Listening on: http://0.0.0.0:8080
""09:34:38 INFO  traceId=, parentId=, spanId=, sampled= [io.quarkus] (main) Profile prod activated. 
""09:34:38 INFO  traceId=, parentId=, spanId=, sampled= [io.quarkus] (main) Installed features: [cdi, kubernetes-client, opentelemetry, rest-client, resteasy-reactive, smallrye-context-propagation, vertx]
...

So it seems that my bean is starting before the opentelemetry extension, even if I inject the Tracer. Is it a bug ?

quarkus-bot[bot] commented 1 year ago

/cc @Sgitario (kubernetes), @geoand (kubernetes), @iocanel (kubernetes)

geoand commented 1 year ago

cc @brunobat

geoand commented 1 year ago

This is not a bug - this is how @Observes StartupEvent works.

What exactly are you trying to do?

jbonofre commented 1 year ago

Basically I have two questions:

Thoughts ?

geoand commented 1 year ago

StartupEvent is triggered after all extensions are started, correct ?

That depends on what your definitetion of started is. This is why I want to know exactly what your end goal is

brunobat commented 1 year ago

OpenTelemetry is now created at runtime. Later than before. Looks like you are trying to get access to the OpenTelemetry object too early... Don't you get an error in the logs?

jbonofre commented 1 year ago

@brunobat yup that's basically my question: can I use @Observe on StartupEvent to guarantee my bean method is called after all extensions are loaded and startup (opentelemetry here in my example) ? Maybe we can define "stage" in the startup (like before or after extension) or startupOrder to guarantee that (like we do in Apache Karaf for instance).

Back to your question, the only log message I have just before my bean method is called is:

""09:18:09 WARN  traceId=, parentId=, spanId=, sampled= [io.qu.op.ru.ex.ot.LateBoundBatchSpanProcessor] (main) No BatchSpanProcessor delegate specified, no action taken.
""09:18:09 INFO  traceId=, parentId=, spanId=, sampled= [co.dr.cl.co.ex.fi.Greeting] (main) Hello world !

So, probably related (BatchSpanProcessor).

brunobat commented 1 year ago

Can you please setup a reproducer? I'll check what's going on.

jbonofre commented 1 year ago

Sure, let me prepare a simple test case and push on github. Thanks !

jbonofre commented 1 year ago

@brunobat sorry I was travelling last week. Let me create a test case for you.

geoand commented 1 year ago

@jbonofre any luck reproducing the issue?

jbonofre commented 1 year ago

I'm resuming the tests. Sorry again for the delay.

geoand commented 1 year ago

No worries, just checking :)

ghilainm commented 1 year ago

Any news on this? I get the same warning 'No BatchSpanProcessor delegate specified, no action taken.' but I don't have any error in logs.

jbonofre commented 1 year ago

I have a reproducer almost ready, I gonna push on a repo and share here.

jbonofre commented 1 year ago

I created a very simple test case here: https://github.com/jbonofre/quarkus-test

Classic build with mvn clean install.

Then, I run the app with java -jar target/quarkus-app/quarkus-run.jar, I see my bean started:

__  ____  __  _____   ___  __ ____  ______ 
 --/ __ \/ / / / _ | / _ \/ //_/ / / / __/ 
 -/ /_/ / /_/ / __ |/ , _/ ,< / /_/ /\ \   
--\___\_\____/_/ |_/_/|_/_/|_|\____/___/   
17:05:18 WARN  traceId=, parentId=, spanId=, sampled= [io.qu.op.ru.ex.ot.LateBoundBatchSpanProcessor] (main) No BatchSpanProcessor delegate specified, no action taken.
17:05:18 INFO  traceId=, parentId=, spanId=, sampled= [ne.na.te.qu.op.Launcher] (main) Hello world !
17:05:18 INFO  traceId=, parentId=, spanId=, sampled= [io.quarkus] (main) quarkus-opentelemetry 1.0.0-SNAPSHOT on JVM (powered by Quarkus 3.1.0.Final) started in 0.501s. 
17:05:18 INFO  traceId=, parentId=, spanId=, sampled= [io.quarkus] (main) Profile prod activated. 
17:05:18 INFO  traceId=, parentId=, spanId=, sampled= [io.quarkus] (main) Installed features: [cdi, opentelemetry, smallrye-context-propagation, vertx]

My point is that I don't see the span from the start() method (AFAIR we have the console exporter by default).

Maybe a race condition between when the opentelemetry feature is started and the startup event triggered.

Do I do something wrong ?

gsmet commented 1 year ago

@mkouba maybe you could have a look at this one?

mkouba commented 1 year ago

So the LateBoundBatchSpanProcessor logs a warning because the LateBoundBatchSpanProcessor.delegate field is not set when the StartupEvent is fired. This field is set in the OtlpRecorder#installBatchSpanProcessorForOtlp() method which is called by the OtlpExporterProcessor#installBatchSpanProcessorForOtlp() build step. And this build step does not produce anything and the consumed BeanContainerBuildItem only ensures that the CDI container is started (which is redundant because the container is started durin STATIC_INIT).

If we need to delay the StartupEvent until the BatchSpanProcessor is initialized then we have to replace the BeanContainerBuildItem with BuildProducer<ServiceStartBuildItem>. We should also make the LateBoundBatchSpanProcessor.delegate field volatile.

@brunobat I can send a PR if you believe that it's ok from the OTel point of view...

mkouba commented 1 year ago

@brunobat This patch https://github.com/mkouba/quarkus/commit/a1ec517630ed7a94cd03c56b3bed496b6b3bef8f removes the warning but I can still see 10:28:05 INFO traceId=, parentId=, spanId=, sampled= [ne.na.te.qu.op.Launcher] (Quarkus Main Thread) Hello world ! in the console, i.e. the spandId etc. are empty :shrug:

jbonofre commented 1 year ago

@mkouba yeah, agree. IMHO it would be more "logical" to delay StartupEvent using BuildProducer<ServiceStartBuildItem>. We also have to double check that the tracing details are populated (traceId, spanId, ...). I guess that just delaying will work for the processor but not for the actual tracing populate.

I can also work on a PR about that.

Please let me know what you prefer 😄

mkouba commented 1 year ago

I can also work on a PR about that.

Sure. Feel free to take my commit linked in https://github.com/quarkusio/quarkus/issues/32923#issuecomment-1580202385.

brunobat commented 1 year ago

@mkouba Thanks! I agree with what you say and the proposed commit. Sorry for the delay, though.
@jbonofre if you could work on the PR, that would be great.

geoand commented 1 year ago

@jbonofre just checking, are you still interested in doing the PR? Asking because this is something we likely want in 3.2 :)

jbonofre commented 1 year ago

@geoand yes, I was busy last week with other stuff and I didn't have time to move forward on my Quarkus PRs (including this one). I booked time this week :).

Sorry for the delay.

geoand commented 1 year ago

Cool, thanks a lot!

geoand commented 1 year ago

@jbonofre can you perhaps test if this still happens in 3.3?

Asking because we made various improvements to the OpenTelemetry integration that could potentially have fixed this.

Thanks

geoand commented 1 year ago

I am going to close this for the lack of feedback. If this is still a problem, feel free to comment and we can reopen.