Closed lyrst closed 4 months ago
/cc @sberyozkin (security)
@lyrst Thanks for the reproducer @mkouba @michalvavrik Are you aware of what might have changed that causes Websockets authenticator and configurator in the attached reproducer failing to provide the identity info with the token ? WebSockets next is an independent project AFAIK so should not have impacted the code which was running in 3.8.4
Thanks
I don't recall us providing any doc-ed recommendations along those lines, so may be we can recommend to switch to WebSockets Next to achieve the expected RBAC ?
Are you aware of what might have changed that causes Websockets authenticator and configurator in the attached reproducer failing to provide the identity info with the token ?
I have no idea.
WebSockets next is an independent project AFAIK so should not have impacted the code which was running in 3.8.4
Yes, I've created https://github.com/quarkusio/quarkus/issues/40312 to track our security-related efforts for WebSockets Next.
I tried to use the new WebSocketNext extension as I thought that the RolesAllowed
-annotation would work out of the box as it is based on the HttpServer (quote from documentation "The WebSocket handling reuses the main HTTP server.") but this does not seem to work ;-) But I guess I was lacking more information and as @mkouba created a new issue, it seems that more code changes are needed so make "WebSocketNext" work with the @RolesAllowed
I tried to use the new WebSocketNext extension as I thought that the
RolesAllowed
-annotation would work out of the box as it is based on the HttpServer (quote from documentation "The WebSocket handling reuses the main HTTP server.") but this does not seem to work ;) But I guess I was lacking more information ;) and as @mkouba created a new issue, it seems that more code changes are needed so make "WebSocketNext" work with the@RolesAllowed
Yes, the integration with the security extension is currently WIP.
Hi Martin, @mkouba, as far as this issue is concerned, can you please have a quick look at the #29919, as you helped there as well, and see if something rings a bell with respect to possible changes to the code referenced in #29919, between 3.8 and 3.9, np if nothing obvious springs to mind
NPE is thrown in Quarkus REST handler, I'll have a look.
I'm afraid this will take a CDI expert, at the very least to give me a hint. I'll write you an email.
Hm, so if I understand it correctly the failing test (ExampleResourceTest
) does not involve a WebSocket connection at all. It's merely testing the JAX-RS resource, right?
Hm, so if I understand it correctly the failing test (
ExampleResourceTest
) does not involve a WebSocket connection at all. It's merely testing the JAX-RS resource, right?
That's correct. The HTTP request comes into the WebSocket handler, but leaves it as it does not detect the WS header.
When Reactive Routes are used together with Quarkus REST and some @RouteFilter
is applied before the Quarkus REST begin processing, CDI request context comes active to the Quarkus REST. However Quarkus REST creates a new CDI request context state with io.quarkus.arc.ManagedContext#activate()
. I verified that when invoked on the same thread, the CDI request context state keeps changing. Here are some of my debug logging, not sure if useful for anyone else:
io.quarkus.vertx.runtime.VertxCurrentContextFactory is fallback
io.quarkus.vertx.runtime.VertxCurrentContextFactory is duplicate
io.quarkus.vertx.runtime.VertxCurrentContextFactory old state (null) is not a next state (io.quarkus.arc.impl.RequestContext$RequestContextState@313c23a7)
io.quarkus.vertx.runtime.VertxCurrentContextFactory setting next state to io.quarkus.arc.impl.RequestContext$RequestContextState@313c23a7
TestIdentityAssociation.setIdentity deferred - thread 124 context state io.quarkus.arc.impl.RequestContext$RequestContextState@313c23a7
///// routing context /hello/foo1
|||||||||||||||| end rt ctx
io.quarkus.vertx.runtime.VertxCurrentContextFactory is duplicate
cdi request scope activation in REST thread 124 context state io.quarkus.arc.impl.RequestContext$RequestContextState@313c23a7
io.quarkus.vertx.runtime.VertxCurrentContextFactory is duplicate
io.quarkus.vertx.runtime.VertxCurrentContextFactory old state (io.quarkus.arc.impl.RequestContext$RequestContextState@313c23a7) is not a next state (io.quarkus.arc.impl.RequestContext$RequestContextState@1478249c)
io.quarkus.vertx.runtime.VertxCurrentContextFactory setting next state to io.quarkus.arc.impl.RequestContext$RequestContextState@1478249c
TestIdentityAssociation.setIdentity sync - thread 124 context state io.quarkus.arc.impl.RequestContext$RequestContextState@1478249c
io.quarkus.vertx.runtime.VertxCurrentContextFactory is duplicate
io.quarkus.vertx.runtime.VertxCurrentContextFactory old state (io.quarkus.arc.impl.RequestContext$RequestContextState@1478249c) is not a next state (io.quarkus.arc.impl.RequestContext$RequestContextState@313c23a7)
io.quarkus.vertx.runtime.VertxCurrentContextFactory setting next state to io.quarkus.arc.impl.RequestContext$RequestContextState@313c23a7
TestIdentityAssociation.getDeferredIdentity thread 124 context state io.quarkus.arc.impl.RequestContext$RequestContextState@313c23a7
io.quarkus.vertx.runtime.VertxCurrentContextFactory is duplicate
io.quarkus.vertx.runtime.VertxCurrentContextFactory old state (io.quarkus.arc.impl.RequestContext$RequestContextState@313c23a7) is not a next state (io.quarkus.arc.impl.RequestContext$RequestContextState@1478249c)
io.quarkus.vertx.runtime.VertxCurrentContextFactory setting next state to io.quarkus.arc.impl.RequestContext$RequestContextState@1478249c
io.quarkus.vertx.runtime.VertxCurrentContextFactory is duplicate
io.quarkus.vertx.runtime.VertxCurrentContextFactory old state (io.quarkus.arc.impl.RequestContext$RequestContextState@1478249c) is not a next state (io.quarkus.arc.impl.RequestContext$RequestContextState@313c23a7)
io.quarkus.vertx.runtime.VertxCurrentContextFactory setting next state to io.quarkus.arc.impl.RequestContext$RequestContextState@313c23a7
io.quarkus.vertx.runtime.VertxCurrentContextFactory is duplicate
io.quarkus.vertx.runtime.VertxCurrentContextFactory old state (io.quarkus.arc.impl.RequestContext$RequestContextState@313c23a7) is not a next state (io.quarkus.arc.impl.RequestContext$RequestContextState@1478249c)
io.quarkus.vertx.runtime.VertxCurrentContextFactory setting next state to io.quarkus.arc.impl.RequestContext$RequestContextState@1478249c
Interesting is also why the state has changed, I added printing of a stack trace, however it's too long to put here. See the attachment.
What fixes (or works around?) the issue is for example this: https://github.com/quarkusio/quarkus/commit/bf7880ab3 test-log.txt 65734ebe7aaa265dae724cd0010cd20
I found other ways to work around the issue, but it really requires a feedback from a CDI experts to determine the one that should be used. Use reproducer from the description. Thank you
cc @mkouba @manovotn @Ladicek @geoand
My simplistic understanding is that the context state changes when subscribed Uni
is created when the old context state is still in place, even though subscription is happening when the new context state is in place.
It seems like AbstractResteasyReactiveContext#requireCDIRequestScope()
completely omits the case where you already have active req. context that you didn't start up yourself and instead always tries to enforce new, blank instance through ThreadSetupAction#activateInitial
.
While I do not claim I understand the whole chain happening there (not even close in fact), this seems like an issue.
What fixes (or works around?) the issue is for example this: https://github.com/quarkusio/quarkus/commit/bf7880ab3
Yes, my debugging attempts landed at pretty much the same thing to resolve it - https://github.com/manovotn/quarkus/commit/7923d226a5b1aac0862e7c6055f450842b34f689
That being said, I am still unsure about:
RouteHandler
that does the activation and that seems perfectly valid?RouteHandler
prior to the logic implemented in AbstractResteasyReactiveContext
which then does no-op in this regardVery nice @manovotn , thank you for looking at it! I might not be a good adept to answer the questions raised, maybe someone else will, but I'll try.
Yes, my debugging attempts landed at pretty much the same thing to resolve it - https://github.com/manovotn/quarkus/commit/7923d226a5b1aac0862e7c6055f450842b34f689
lgtm, but tests will be needed. I can prepare them.
Is it expected that you can get a req, context that's already active? Seems like yes, because in this case, it's Vertx RouteHandler that does the activation and that seems perfectly valid?
I was asking myself a same thing, but considering that Reactive Routes are "finished product" (the way I see it; it's removed from RHBQ at least and recently dropped from SR JWT) I don't want to propose important changes there.
Given that it is permitted scenario, should we have some special handling for its deactivation given that someone else activated it and might want to deactive/destroy later?
My thinking was that we should not accept that users activate CDI request context. If user code or extensions activate CDI request context prior to Quarkus REST processing, they should destroy it as well. This is very early in processing and user code often needs Quarkus REST to prepare the request context for example with current identity and RoutingContext. We know how to handle and detect case when Reactive Routes activate it, but we can't really keep checking if someone destroyed active context during the processing inside Quarkus REST, can we? Wouldn't it require loop constantly checking as it can be async process?
So my thinking was that we allow it for Reactive Routes, but consider it "illegal state" when it comes in other cases. What should be done I'm not sure. Can we destroy the activate request context ourself and activate a new one? It is enough for cases where concrete state is kept, but not enough when someone simply calls requestContext.terminate()
. It should be simply illegal to activate request context without termination so early. WDYT @geoand ?
From my observations in this reproducer, the deactivation and destruction of context is handled by Vertx RouteHandler prior to the login implemented in AbstractResteasyReactiveContext which then does no-op in this regard
+1
lgtm, but tests will be needed. I can prepare them.
That would be awesome, thank you. I didn't mean to propose this as a final fix, just what I came up with while discovering what's going on :)
My thinking was that we should not accept that users activate CDI request context.
There are CDI APIs allowing you to programmatically activate req. context and even more of them in Arc itself. However, I don't think you'd be able to get that far without first activating it from within some other Quarkus part.
... but we can't really keep checking if someone destroyed active context during the processing inside Quarkus REST, can we? Wouldn't it require loop constantly checking as it can be async process?
Yea, if you go on tempering with context activation and destruction forcefully, you are IMO on your own. I don't think we should assume it's being manipulated with; doing so would break user app expectations anyway (i.e. state of existing user defined beans for example).
So my thinking was that we allow it for Reactive Routes, but consider it "illegal state" when it comes in other cases. What should be done I'm not sure. Can we destroy the activate request context ourself and activate a new one? It is enough for cases where concrete state is kept, but not enough when someone simply calls requestContext.terminate(). It should be simply illegal to activate request context without termination so early. WDYT @geoand ?
I'd say it is generally easier to respect if the context is already active when your framework/integration needs it and just reuse that one. Whoever creates the context is also responsible for its destruction. My original question was more along the lines of whether we can be sure that the context that we get here will outlast the window in which we need it to work.
Let me try and summarize so as to check if I have understood the issue correctly:
There is some piece of code that runs in a Vertx route that runs before Quarkus REST and that route activates the CDI context which then Quarkus REST completely ignores and starts a new one? Also, I have not understood from the discussion what has changed between 3.8 and 3.9.
(in short :-)) @manovotn I agree with your conclusions / answers. Thank you.
My original question was more along the lines of whether we can be sure that the context that we get here will outlast the window in which we need it to work.
So far I believe we can in case of Reactive Routes.
There is some piece of code that runs in a Vertx route that runs before Quarkus REST and that route activates the CDI context which then Quarkus REST completely ignores and starts a new one?
yes
Also, I have not understood from the discussion what has changed between 3.8 and 3.9.
I think it comes from refactoring of the EagerSecurityHandler
as if you move request context activation few lines above, it works. Problem is that we can't just accept that is the cause. There is a race. It would make for a fragile code that noone can touch. That's why I asked for help from CDI experts to find a root cause.
It's definitely possible to use the existing request scoped state if it exists, but is that really something we want to do in all cases?
It's definitely possible to use the existing request scoped state if it exists, but is that really something we want to do in all cases?
Can you think of a case where that would be an issue? I would put it the other way and rather say that we shouldn't forcibly destroy previous context and put up a new one. That can easily break expectations of other code (be it user or extension) that activated it.
Can you think of a case where that would be an issue?
Not off hand
What I think can become problematic is how we handle the end of the request processing in Quarkus REST.
Currently as we always start the context, we always terminate it as well. If however now we move to not starting the context, what is the expectation going to be about closing it?
What I think can become problematic is how we handle the end of the request processing in Quarkus REST.
Currently as we always start the context, we always terminate it as well. If however now we move to not starting the context, what is the expectation going to be about closing it?
That it is handled by whoever started it. Which is the case that I observed in this reproducer as well (see last bit of https://github.com/quarkusio/quarkus/issues/40307#issuecomment-2083249168)
https://github.com/quarkusio/quarkus/pull/40388 is a draft that reuses the request context which you folks can use to test.
+1, this is very similar to what I proposed earlier (https://github.com/manovotn/quarkus/commit/7923d226a5b1aac0862e7c6055f450842b34f689) and to what Michal has (https://github.com/quarkusio/quarkus/commit/bf7880ab3).
Although I'd prefer if the activateInitial
method actually does unconditional context activation but that's just minor comment :)
That it is handled by whoever started it
Theoretically yes, but is that actually the case in practice. If so, we also need to update the patch to not destroy the context
That it is handled by whoever started it
Theoretically yes, but is that actually the case in practice. If so, we also need to update the patch to not destroy the context
A repeated invocation of context destruction is a no-op (as is the case in the reproducer).
I see your point though; we could instead alter it so that if we detect an active context, we don't even attempt to destroy it.
We should be able to do that through the requestScopeActivated
boolean attribute with some additional checks in place.
I could take a look tomorrow as I am not in front of my laptop now (bank holiday here).
Right, that's my point.
I'll update my draft PR soon
I see your point though; we could instead alter it so that if we detect an active context, we don't even attempt to destroy it. We should be able to do that through the requestScopeActivated boolean attribute with some additional checks in place
I didn't really find a way to do that, because I don't see how I can attach anything to the RequestContextState
I see your point though; we could instead alter it so that if we detect an active context, we don't even attempt to destroy it. We should be able to do that through the requestScopeActivated boolean attribute with some additional checks in place
I didn't really find a way to do that, because I don't see how I can attach anything to the
RequestContextState
@geoand you cannot attach anything to it.
We should be able to use the property inside AbstractResteasyReactiveContext
to track whether the context originated here or elsewhere and perform cleanup conditionally.
Here's what I mean - https://github.com/quarkusio/quarkus/pull/40408
We can do that I guess
Describe the bug
I hava a WebSocket-endpoint that is secured with the "RolesAllowed"-annotation. In order to secure the websocket endpint I followed the approach that is described here https://github.com/quarkusio/quarkus/issues/29919
The issue that the Test
testFoo1Endpoint
does not even run with aNullPointerException
.As soon as I switch to Quarkus 3.8.4 the tests fails with an assertion-related error.
Expected behavior
The
testFoo1Endpoint
-tests is able to run without theNullPointerException
.Actual behavior
The
testFoo1Endpoint
-tests is fails with aNullPointerException
.How to Reproduce?
Run the following example code.
quarkus-sec-issue.zip
Output of
uname -a
orver
Darwin Kernel Version 23.4.0: Fri Mar 15 00:12:25 PDT 2024; root:xnu-10063.101.17~1/RELEASE_ARM64_T6030 arm64
Output of
java -version
No response
Quarkus version or git rev
3.9.4
Build tool (ie. output of
mvnw --version
orgradlew --version
)mvn
Additional information
No response