quarkusio / quarkus

Quarkus: Supersonic Subatomic Java.
https://quarkus.io
Apache License 2.0
13.8k stars 2.68k forks source link

micrometer.binder.http-server shows wrong metrics with tags {outcome="CLIENT_ERROR",status="RESET"} in HTTP/2 #32557

Closed edu-gimenez closed 2 weeks ago

edu-gimenez commented 1 year ago

Describe the bug

The use of Http/2 with the default metrics of quarkus.micrometer.binder.http-server creates invalid metrics with the tags {outcome="CLIENT_ERROR", status="RESET"}, in addition to the correct metrics.

# TYPE http_server_requests_seconds summary
http_server_requests_seconds_count{method="GET",outcome="SUCCESS",status="200",uri="/hello",} 1.0
http_server_requests_seconds_sum{method="GET",outcome="SUCCESS",status="200",uri="/hello",} 0.004004918
http_server_requests_seconds_count{method="GET",outcome="CLIENT_ERROR",status="RESET",uri="/hello",} 1.0
http_server_requests_seconds_sum{method="GET",outcome="CLIENT_ERROR",status="RESET",uri="/hello",} 0.003597424
# HELP http_server_requests_seconds_max  
# TYPE http_server_requests_seconds_max gauge
http_server_requests_seconds_max{method="GET",outcome="SUCCESS",status="200",uri="/hello",} 0.004004918
http_server_requests_seconds_max{method="GET",outcome="CLIENT_ERROR",status="RESET",uri="/hello",} 0.003597424

Both metrics have been created through the same request.

I think the issue comes from io.vertx.core.http.impl.Http2ConnectionBase in method onStreamClosed:

void onStreamClosed(Http2Stream s) {
    VertxHttp2Stream stream = s.getProperty(streamKey);
    if (stream != null) {
      if (goAwayStatus != null) {
        stream.onClose(new HttpClosedException(goAwayStatus));
      } else {
        stream.onClose(HttpUtils.CLOSED_EXCEPTION);
      }
    }
    checkShutdown();
  }

When the stream (not the connection) is closed the above code is executed, and in the end the io.vertx.core.http.impl.Http2ServerStream#handleClose() method is called, which will execute metrics.requestReset(metric); creating the CLIENT_ERROR and RESET tags. Later, the response finishes and metrics.responseEnd(metric, request.response(), bytesWritten()); generates the correct metric.

I think that closing a stream should not be always considered an error

Expected behavior

Create correct metrics by avoiding false positives from RESET tags.

Actual behavior

No response

How to Reproduce?

I attach a simple reproducer with quarkus and micrometer enabled.

Reproducer: code-with-quarkus.zip

Steps to reproduce the behavior:

curl -vvv --http2-prior-knowledge --location --request GET 'http://localhost:8080/hello' \
--header 'x-request-id: f0c9d877-c4cb-4cde-88af-ce08a5a6a13b' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json'

or

curl -vvv --http2 --location --request GET 'http://localhost:8080/hello' \
--header 'x-request-id: f0c9d877-c4cb-4cde-88af-ce08a5a6a13b' \
--header 'Content-Type: application/json' \
--header 'Accept: application/json'

Output of uname -a or ver

No response

Output of java -version

java 17.0.5 2022-10-18 LTS Java(TM) SE Runtime Environment (build 17.0.5+9-LTS-191) Java HotSpot(TM) 64-Bit Server VM (build 17.0.5+9-LTS-191, mixed mode, sharing)

GraalVM version (if different from Java)

No response

Quarkus version or git rev

2.16.6.Final

Build tool (ie. output of mvnw --version or gradlew --version)

Apache Maven 3.8.6

Additional information

No response

quarkus-bot[bot] commented 1 year ago

/cc @brunobat (micrometer), @ebullient (metrics,micrometer), @jmartisk (metrics)

geoand commented 1 year ago

cc @vietj for the Vert.x part

chevaris commented 1 year ago

I am observing this behavior with HTTP2 and I am NOT sure that the problem is only metrics related, because I observe an unexpected latency degradation when using HTTP/2 on a micro that is mainly using POST. CPU usage is also 5-10% higher than using HTTP1.1. Maybe it is NOT related at all, BUT just in case I want to share it

For instance, when using hyperfoil with HTTP2 - plain text , or a micro running in K8s behind an Istio ingress I observe this behavior with RESET/CLIENT ERR metrics. As commented in this ticket, for each request we have a successful metric plus a RESET metric.

Checking with Wireshark, I see that when Quarkus recives a POST request with HTTP2, and Quarkus is sending back HTTP2 "END_STREAM" flag while sending back the answer with JSON data. In practice, Quarkus forces to use an stream per query (request/reponse). Is this on purpose? Probably YES. What is the overhead of this approach (if any)? NOTICE that stream state machine will be state half-closed unless client or server is sending the RESET_STREAM (That I do NOT observe). Is QUarkus expring those streams and closing based on time?

I have an equivalent microservice using HTTP2 with Spring Boot (Jetty based), and I do NOT observe degradation with HTTP/2. Performance with HTTP2 for the tested microservice Quarkus HTTP2 based is always worse that with HTTP 1.1 for the same traffic (e.g. latencies are 20% worse).

Quarkus logs for HTTP2 for a single requests:

2023-07-02 08:02:19,551 DEBUG [io.qua.mic.run.bin.ver.VertxHttpServerMetrics] (vert.x-eventloop-thread-1) requestRouted null HttpRequestMetric [initialPath=/ncudr-gud-dr/v1/directory-data/query, currentRoutePath=null, templatePath=null, request=io.vertx.core.http.impl.Http2ServerRequest@13693e8c]
2023-07-02 08:02:19,556 DEBUG [io.qua.mic.run.bin.ver.VertxHttpServerMetrics] (vert.x-eventloop-thread-1) requestRouted null HttpRequestMetric [initialPath=/ncudr-gud-dr/v1/directory-data/query, currentRoutePath=null, templatePath=null, request=io.vertx.core.http.impl.Http2ServerRequest@13693e8c]
2023-07-02 08:02:19,556 DEBUG [io.qua.mic.run.bin.ver.VertxHttpServerMetrics] (vert.x-eventloop-thread-1) requestRouted null HttpRequestMetric [initialPath=/ncudr-gud-dr/v1/directory-data/query, currentRoutePath=null, templatePath=null, request=io.vertx.core.http.impl.Http2ServerRequest@13693e8c]
2023-07-02 08:02:19,556 DEBUG [io.qua.mic.run.bin.ver.VertxHttpServerMetrics] (vert.x-eventloop-thread-1) requestRouted null HttpRequestMetric [initialPath=/ncudr-gud-dr/v1/directory-data/query, currentRoutePath=null, templatePath=null, request=io.vertx.core.http.impl.Http2ServerRequest@13693e8c]
2023-07-02 08:02:19,556 DEBUG [io.qua.mic.run.bin.ver.VertxHttpServerMetrics] (vert.x-eventloop-thread-1) requestRouted / HttpRequestMetric [initialPath=/ncudr-gud-dr/v1/directory-data/query, currentRoutePath=null, templatePath=null, request=io.vertx.core.http.impl.Http2ServerRequest@13693e8c]
2023-07-02 08:02:19,559 DEBUG [io.qua.mic.run.bin.ver.VertxHttpServerMetrics] (vert.x-eventloop-thread-1) requestReset HttpRequestMetric [initialPath=/ncudr-gud-dr/v1/directory-data/query, currentRoutePath=/, templatePath=null, request=io.vertx.core.http.impl.Http2ServerRequest@13693e8c]
2023-07-02 08:02:19,559 DEBUG [io.qua.mic.run.bin.ver.VertxHttpServerMetrics] (vert.x-eventloop-thread-1) responseEnd io.vertx.core.http.impl.Http2ServerResponse@2823dfd9, HttpRequestMetric [initialPath=/ncudr-gud-dr/v1/directory-data/query, currentRoutePath=/, templatePath=null, request=io.vertx.core.http.impl.Http2ServerRequest@13693e8c]

When using HTTP 1.1, this is the sequence for a single request

2023-07-02 08:45:19,356 DEBUG [io.qua.mic.run.bin.ver.VertxHttpServerMetrics] (vert.x-eventloop-thread-0) requestRouted null HttpRequestMetric [initialPath=/ncudr-gud-dr/v1/directory-data/query, currentRoutePath=null, templatePath=null, request=io.vertx.core.http.impl.Http1xServerRequest@5fa9667c]
2023-07-02 08:45:19,356 DEBUG [io.qua.mic.run.bin.ver.VertxHttpServerMetrics] (vert.x-eventloop-thread-0) requestRouted null HttpRequestMetric [initialPath=/ncudr-gud-dr/v1/directory-data/query, currentRoutePath=null, templatePath=null, request=io.vertx.core.http.impl.Http1xServerRequest@5fa9667c]
2023-07-02 08:45:19,356 DEBUG [io.qua.mic.run.bin.ver.VertxHttpServerMetrics] (vert.x-eventloop-thread-0) requestRouted null HttpRequestMetric [initialPath=/ncudr-gud-dr/v1/directory-data/query, currentRoutePath=null, templatePath=null, request=io.vertx.core.http.impl.Http1xServerRequest@5fa9667c]
2023-07-02 08:45:19,356 DEBUG [io.qua.mic.run.bin.ver.VertxHttpServerMetrics] (vert.x-eventloop-thread-0) requestRouted null HttpRequestMetric [initialPath=/ncudr-gud-dr/v1/directory-data/query, currentRoutePath=null, templatePath=null, request=io.vertx.core.http.impl.Http1xServerRequest@5fa9667c]
2023-07-02 08:45:19,356 DEBUG [io.qua.mic.run.bin.ver.VertxHttpServerMetrics] (vert.x-eventloop-thread-0) requestRouted / HttpRequestMetric [initialPath=/ncudr-gud-dr/v1/directory-data/query, currentRoutePath=null, templatePath=null, request=io.vertx.core.http.impl.Http1xServerRequest@5fa9667c]
2023-07-02 08:45:19,359 DEBUG [io.qua.mic.run.bin.ver.VertxHttpServerMetrics] (vert.x-eventloop-thread-0) responseEnd io.vertx.core.http.impl.Http1xServerResponse@46f703d2, HttpRequestMetric [initialPath=/ncudr-gud-dr/v1/directory-data/query, currentRoutePath=/, templatePath=null, request=io.vertx.core.http.impl.Http1xServerRequest@5fa9667c]
chevaris commented 1 year ago

I will open a differrent issue for performance in HTTP/2 if that is OK

geoand commented 1 year ago

Yes, please do

geoand commented 8 months ago

Is this still an issue?

pierregmn commented 8 months ago

I'm not the author of this issue but still I confirm that it remains an issue. As soon as we migrate to http2, false metrics are created with the RESET tag making it complicated to investigate in case of "real" RESETs. Regards

geoand commented 8 months ago

Thanks

brunobat commented 6 months ago

Will take a look at this.

brunobat commented 6 months ago

@vietj, @tsegismont I can confirm the issue on the posted code, for Quarkus 2.16.x. Current code has been refactored since, but the problem persists. See here: https://github.com/eclipse-vertx/vert.x/commit/7312e3e691aa51d43c2a5c196fd83b713e51235d

Main is still affected here: https://github.com/eclipse-vertx/vert.x/blob/0fdcbd37c2c3791ec73ee36de04296793d256b31/src/main/java/io/vertx/core/http/impl/Http2ServerStream.java#L216

This RESET seems to happen on all requests done with the curls above. I believe this is a Vert.x side issue. Should I create an issue on that side?

vietj commented 6 months ago

@brunobat do you have a reproducer for this case ?

brunobat commented 6 months ago

Yes, @vietj. The issue description has the procedure to reproduce... I was able to reproduce.

brunobat commented 5 months ago

@vietj @tsegismont Any news on this one?

vietj commented 5 months ago

@brunobat I will have a look

vietj commented 5 months ago

it seems that the issue is that

  1. vertx considers the stream is reset when it closed with an error an the response/request are not ended
  2. ending a vertx stream does close the stream
  3. the response ended boolean is set after the stream is closed

Given 1+2+3 we get this behaviour when this is executed from a non vertx thread (which is the case here) but we do not get this issue when used from a non vertx thread (worker). This is a bug in vertx metrics report when used from worker threads.

vietj commented 5 months ago

See https://github.com/eclipse-vertx/vert.x/issues/5222

cuong-p24 commented 4 months ago

Do we know if this issue occurs in Quarkus 3.x?

pierregmn commented 4 months ago

Yes the issue occurs in Quarkus 3.x. For instance, we are using 3.2.x and we do have the issue.

brunobat commented 1 month ago

@edu-gimenez @pierregmn This should be fixed on main. Can you please verity? Quarkus 3.15.1 includes Vert.x 4.5.10. The issue was fixed for Vert.x 4.5.9. See: https://github.com/eclipse-vertx/vert.x/issues/5222 Can we close this issue?

geoand commented 2 weeks ago

Let's close for now as per the comment above.

If the issue persists, please comment and we reopen the issue