Open fabrii opened 2 years ago
/cc @radcortez
Is this still an issue or has it been addressed by the various fixes that have occurred over time?
@geoand Google brought me to this issue because I got the same warning as above. I'm not sure if it's relevant or not.
I made a small reproducer to force the Quarkus instance to use one thread, and then I triggered a context switch for that thread to see if the traceID propagates correctly:
@Path("/hello")
class ExampleResource {
@GET
@Produces(MediaType.TEXT_PLAIN)
suspend fun hello(): String {
println("${LocalTime.now()} ${Thread.currentThread().name}: ${Span.current().spanContext.traceId}")
delay(3000)
println("${LocalTime.now()} ${Thread.currentThread().name}: ${Span.current().spanContext.traceId}")
return "hello"
}
}
This produces the following, after I called it twice in quick succession:
17:41:21.171259400 vert.x-eventloop-thread-0: 77f541d67a9a3bda6e1b96ec97ff805b
17:41:21.698196200 vert.x-eventloop-thread-0: 330def074a9c74332334f91283c94f1a
17:41:24.184476200 vert.x-eventloop-thread-0: 77f541d67a9a3bda6e1b96ec97ff805b
2022-10-27 17:41:24,234 WARN [io.qua.ope.run.QuarkusContextStorage] (vert.x-eventloop-thread-0) Context in storage not the expected context, Scope.close was not called correctly
17:41:24.715261 vert.x-eventloop-thread-0: 330def074a9c74332334f91283c94f1a
Fortunately, the context trace id is switched appropriately. I had some issues on my project because the old, "soon-to-be-deprecated" OpenTracing failed to do the context switch and would always keep the latest traceId.
I'm not sure if this helps with the issue at hand.
This last issue is related with the OTel context storage and restore. This is not specific to Baggage. As mentioned here: https://github.com/quarkusio/quarkus/discussions/23882#discussioncomment-2593231 It should be harmless.
Is there anything we need to do for this one or can it be closed?
So I'm seeing this during a ui regression test run with Quarkus 3.3.2., even after I managed to get rid of the OTLP export timeout issues (see #35686). I don't see any other exceptions or similar. Any notes on debugging this?
What were you expecting to see, @famod, this message? -> WARN [io.qua.ope.run.QuarkusContextStorage] (vert.x-eventloop-thread-7) Context in storage not the expected context, Scope.close was not called correctly.
@brunobat I am seeing this warning multiple times but I don't see any hints of exceptions. I'm referring to what @radcortez said in the discussion:
The OpenTelemetry instrumentation is provided by filters, but certain exceptions that break the execution flow may happen outside filters, which may cause the OpenTelemetry Scope not to be properly closed at the right time (hence the warning).
So I basically want to get rid of those warnings.
We should create a set of tests that replicate the scope closing issue and go from there... There also some other work needed, like propagate scope along with the span and make sure it is closed with the span.
In the meantime, would it make sense to add more info to that warning? Something to correlate which context wasn't closed correctly?
yes, @famod. Will do
Here you have it @famod ! https://github.com/quarkusio/quarkus/pull/35886
Thanks @brunobat!
I started looking at those now extended warning messages and so far, all logged "previous" traceids point to perfectly valid, non-erroneous REST calls. So for now I don't see what the problem is.
Do you have an idea of the percentage of spans that get the warning? Can be trace back to a particular code flow or are the random?
@brunobat I see these randomly in our tests using 3.5.3. I've just cleaned our build to filter all expected error and warning logs and these are left behind; unfortunately I can't say when they began.
They seem to be very random, with hundreds of @QuarkusTest
classes and many thousands of tests... I don't even see them from every complete build.
Here's an example:
Context in storage not the expected context, Scope.close was not called correctly. Details: OTel context before: {spanId=8695a6782afb78ff, traceId=d22e83c1ed0f9f64abd2baf22d449508, sampled=true}. OTel context toAttach: {spanId=9e70ae76f4e97f48, traceId=d22e83c1ed0f9f64abd2baf22d449508, sampled=true, parentId=8695a6782afb78ff}
I have the same issue, found out that I have thise errors when the client abort the connection after sending the request.
A reproducer I created for another issue reproduces this issue.
Thanks @computerlove, Will take a look
The fix for this issue will take while due to bandwidth. Will reduce the log from info to warn because it's causing issues in prod systems.
I did run into this while trying to reproduce #35686
This happens when multiple spans are open and they don't close in the right order. This requires an overhaul of the way we handle span scopes and will be handled while we introduce the Observation API, @geoand
Understood, thanks.
Possibly another useful data point, I consistently see this after agroal does some work. Logs show jumping from executor-thread-n to agroal-n and then back and it happens immediately, I assume when the non-blocking query returns.
Describe the bug
This issue originated on the following discussion: https://github.com/quarkusio/quarkus/discussions/23882 I am opening it to keep track here.
When using Baggage propagation, the Scope is not closed correctly.
In the log, we can see the following warning:
WARN [io.qua.ope.run.QuarkusContextStorage] (vert.x-eventloop-thread-7) Context in storage not the expected context, Scope.close was not called correctly.
Expected behavior
Not showing the warning. If it is harmless, as a workaround I think we can just ignore the log using quarkus.log.category.
Actual behavior
No response
How to Reproduce?
Reproducer: https://github.com/fabrii/quarkus-playground/tree/opentelemetry
Output of
uname -a
orver
No response
Output of
java -version
No response
GraalVM version (if different from Java)
No response
Quarkus version or git rev
2.7.1
Build tool (ie. output of
mvnw --version
orgradlew --version
)No response
Additional information
No response