Open goldmann opened 5 months ago
/cc @brunobat (opentelemetry,tracing), @pedroigor (oidc), @radcortez (opentelemetry,tracing), @sberyozkin (oidc)
CC @michalvavrik
Do the traces get sent correctly when OIDC is not part of the picture?
Also, can you try Quarkus 3.8 and 3.9 please?
Please mind this is just a startup issue as discussed here: https://github.com/quarkusio/quarkus/issues/37718#issuecomment-2014570612
OIDC seems to be connecting out before Quarkus startup has finished.
Hi @brunobat, is it an OIDC-specific issue, or can OTel, for example, capture events related to the Kafka client or Hibernate establishing their connections? OIDC initialization has to start when it is given an initialized Vert.x instance; from the OIDC perspective, Quarkus is ready at this point.
I don't think OIDC can be specifically aware of OTel; it has to work with or without OTel. So the question is how OIDC, or indeed other extensions creating connections, can know all is ready, with and without OTel. Should OTel queue the events until it is ready?
As discussed earlier in the other PR, the OTel extension needs Vert.x as well and can only be instantiated after Vert.x is up. Any communication that happens before OTel is ready will not generate spans, and in the case of Vert.x there is a log message about these "lost" spans. The idea of caching unsent spans until OTel is ready was floated; however, creating spans without the SDK is unlikely to work, and artificially feeding them in later will mess up their timestamps anyway, which is arguably even worse. I imagine a lot of things start after Vert.x is up, not just OTel, so I wonder if it's wise to issue requests before the end of the boot process, probably even before the health check is up...
I guess this is a broader question, not just for OIDC.
So IIUC, the problem here is that OIDC is making a request once Vert.x is set up but before OTel is?
If so, I think this situation is perfectly legal and we should just make sure that it does not result in bogus warnings.
I didn't plan to comment, because without a reproducer I'm not sure I can debug this, but the last comment surprised me.
> So IIUC, the problem here is that OIDC is making a request once Vert.x is set up but before OTel is?
> If so, I think this situation is perfectly legal and we should just make sure that it does not result in bogus warnings.
If so, why can't we just enforce order between OIDC and OTel like we did between OTel and Agroal with io.quarkus.agroal.spi.OpenTelemetryInitBuildItem? Or maybe that's what you meant, sorry, I am not sure.
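The ordering idea can be pictured with a small model. This is a minimal sketch in plain Java, not Quarkus code: the class `StartupOrder`, the `Step` record, and the item names (`vertx-ready`, `otel-ready`) are all hypothetical. It only illustrates how a step that consumes a marker item ends up after the step that produces it, which is roughly the kind of ordering a marker build item is meant to express.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of produce/consume ordering; this is NOT the Quarkus build-step API. */
final class StartupOrder {

    /** A startup step that produces some marker items and consumes others (hypothetical type). */
    record Step(String name, Set<String> produces, Set<String> consumes) {}

    /** Returns step names ordered so that every consumed item is produced first. */
    static List<String> resolve(List<Step> steps) {
        // Map each marker item to the step that produces it.
        Map<String, Step> producerOf = new LinkedHashMap<>();
        for (Step s : steps) {
            for (String item : s.produces()) {
                producerOf.put(item, s);
            }
        }
        List<String> ordered = new ArrayList<>();
        Set<String> done = new LinkedHashSet<>();
        Set<String> inProgress = new LinkedHashSet<>();
        for (Step s : steps) {
            visit(s, producerOf, done, inProgress, ordered);
        }
        return ordered;
    }

    // Depth-first visit: schedule the producers of everything this step consumes, then the step itself.
    private static void visit(Step step, Map<String, Step> producerOf,
            Set<String> done, Set<String> inProgress, List<String> ordered) {
        if (done.contains(step.name())) {
            return;
        }
        if (!inProgress.add(step.name())) {
            throw new IllegalStateException("Cycle at step " + step.name());
        }
        for (String item : step.consumes()) {
            Step producer = producerOf.get(item);
            if (producer == null) {
                // Modelling choice of this sketch: a consumed item must have a producer.
                throw new IllegalStateException("No producer for " + item);
            }
            visit(producer, producerOf, done, inProgress, ordered);
        }
        inProgress.remove(step.name());
        done.add(step.name());
        ordered.add(step.name());
    }

    public static void main(String[] args) {
        // OIDC is declared first but consumes "otel-ready", so it is scheduled last.
        List<Step> steps = List.of(
                new Step("oidc-init", Set.of(), Set.of("otel-ready")),
                new Step("otel-init", Set.of("otel-ready"), Set.of("vertx-ready")),
                new Step("vertx-init", Set.of("vertx-ready"), Set.of()));
        System.out.println(resolve(steps)); // [vertx-init, otel-init, oidc-init]
    }
}
```

In this model the declaration order does not matter; only the produced and consumed items do. Whether OIDC should actually consume such an item, and which extension would own it, is exactly the open question in this thread.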
I think that can work, yes
If we really want OIDC to start after OTel (I can't comment on whether that's a valid goal), then of course that is doable (I assume fairly easily)
> If we really want OIDC to start after OTel (I can't comment on whether that's a valid goal), then of course that is doable (I assume fairly easily)
Ok, thanks for the explanation. I don't see why not. What is the disadvantage, as long as both happen before the application starts to receive requests (e.g. before io.quarkus.deployment.builditem.ApplicationStartBuildItem)? But maybe Sergey will know. I can't comment on that either.
I'm not sure of the side effects. Does it make sense for all Quarkus extensions that may want to report something at startup?
Yes, it makes sense to delay their instantiation, because otherwise we will not be able to track those requests reliably.
I am also not sure which calls should be traced... What I definitely don't think makes sense is to have all outgoing connections be made only after OTel starts. To me it makes sense to decide on a case-by-case basis.
@geoand Right, there is no specific use case related to the OIDC startup, compared to any other extension that attempts to set up a connection to a remote server which is temporarily unavailable.
What Michal did was to report the connection failure event if the OIDC server was not ready, and then report that the connection was re-established at runtime during the first request. This is certainly a useful kind of info to report. But it is also a general type of report that can be of interest not only for OIDC (for HashiCorp, for any extension which needs to set up a connection for its work).
Maybe we can introduce a build-time property which, if enabled, would require OIDC, or any other extension that would like to do a similar kind of report, to wait until OTel is available, as opposed to introducing extension-specific build items one by one and doing it non-optionally?
I think the configuration is not necessary. If an error can happen on that call and it's relevant, it seems to me that it must be traced. If there is no impediment to delaying the startup of OIDC until after OTel, if OTel is present, this should be done.
@brunobat Hey, I agree it should be done. I'm wondering, though, whether it can be treated in a more generic way, as opposed to delaying startup per extension. As I said, a connection failure at startup can happen not only in the case of OIDC. I'm also not 100% sure that delaying the OIDC startup so that it can report events which may never happen (connection failure, as opposed to, for example, authentication success/failure) should be done by default.
> @brunobat Hey, I agree it should be done. I'm wondering, though, whether it can be treated in a more generic way, as opposed to delaying startup per extension.
FWIW (please correct me where I am wrong): if the OpenTelemetry extension is not ready to attach data or to receive the request data, it can't just keep this data until it becomes ready, because the data are not finite. They need to be accessible and adjustable, and we can't just replay the steps when OTel becomes ready, because what happens depends on state stored in the context.
> As I said, a connection failure at startup can happen not only in the case of OIDC. I'm also not 100% sure that delaying the OIDC startup so that it can report events which may never happen (connection failure, as opposed to, for example, authentication success/failure) should be done by default.
OpenTelemetry as well as OIDC will always be ready for authentication success/failure, because the application only receives incoming requests when all the extensions are ready. We are only discussing outgoing requests during application startup, and those can either be ordered (so that we can put OTel before OIDC) or not. Personally, I don't see an issue with starting OTel before OIDC if there is demand.
What I think should be decided is whether this data between the auth server and Quarkus should be tracked, because to my knowledge it is only tracked because @sberyozkin decided to use the Vert.x web client (+1), but there was never an actual decision to provide traces for this (that happens through Vert.x).
I can't answer the last paragraph.
I didn't test it, but I think I have moved the related code to separate build steps in https://github.com/quarkusio/quarkus/pull/41571. If we really want to fix this, all that needs to be done is to introduce a build-time SPI and add @Consume on these build steps.
Neither OIDC nor OTel has a build-time SPI. There doesn't seem to be agreement on whether this is OIDC-specific or whether there should be an OTel SPI.
Describe the bug
When the oidc extension is used together with the otel extension, I see warnings in the logs which suggest that the traces from oidc are not published to the collector.
I've checked the logs in debug mode, and just before these warnings a connection is established to the auth server.
Potentially this is an OIDC extension issue: it should be set up just a little bit later, after the OTEL one can process traces.
This is related to https://github.com/quarkusio/quarkus/issues/37718
Expected behavior
No response
Actual behavior
No response
How to Reproduce?
No response
Output of uname -a or ver
No response
Output of java -version
No response
Quarkus version or git rev
3.7.2
Build tool (ie. output of mvnw --version or gradlew --version)
No response
Additional information
No response