quarkusio / quarkus

Quarkus: Supersonic Subatomic Java.
https://quarkus.io
Apache License 2.0
13.56k stars 2.62k forks source link

gRPC Server INTERNAL: HTTP status code 400 When Upgrading from 3.12.3 > 3.13.0 #42724

Closed brianwyka closed 1 week ago

brianwyka commented 3 weeks ago

Describe the bug

When upgrading from version 3.12.3 to 3.13.0, with no other code changes, the gRPC server is returning an INTERNAL error when being called from a gRPC blocking stub in another application.

Expected behavior

Should still work as expected and should not return an error.

Actual behavior

Returns an error response INTERNAL with an HTTP status code of 400 and a message indicating invalid content type which is null.

INTERNAL: HTTP status code 400
invalid content-type: null
trailers: Metadata(:status=400,date=Fri, 23 Aug 2024 00:10:53 GMT,strict-transport-security=max-age=31536000; includeSubDomains,content-length=0)
io.grpc.StatusRuntimeException: INTERNAL: HTTP status code 400
invalid content-type: null
trailers: Metadata(:status=400,date=Fri, 23 Aug 2024 00:10:53 GMT,strict-transport-security=max-age=31536000; includeSubDomains,content-length=0)
    at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:268)
    at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:249)
    at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:167)

How to Reproduce?

Client Side relevant dependency versions:

com.google.protobuf:protobuf-java:3.25.2
com.google.protobuf:protobuf-java-util:3.25.2
io.grpc:grpc-stub:1.61.1
o.grpc:grpc-protobuf:1.61.1

Here is the client side code for setting up the ManagedChannel:

            final Metadata metadata = new Metadata();
            metadata.put(Key.of("Authorization", Metadata.ASCII_STRING_MARSHALLER), apiAuthorizationToken);
            this.managedChannel = ManagedChannelBuilder.forTarget(apiEndpoint)
                .intercept(MetadataUtils.newAttachHeadersInterceptor(metadata))
                .keepAliveTime(2, TimeUnit.HOURS)
                .keepAliveTimeout(20, TimeUnit.MINUTES)
                .keepAliveWithoutCalls(false)
                .idleTimeout(30, TimeUnit.MINUTES)  // Set idle timeout to 30 minutes
                .enableRetry()
                .build();

The apiEndpoint is on secure port, i.e. - service.domain.com:443

Quarkus Server side configurations:

quarkus.grpc.server.compression = gzip
quarkus.grpc.server.use-separate-server  = false

Using proto3...

Quarkus Server Side service:

@GrpcService
public class SummaryResource extends SummaryServiceGrpc.SummaryServiceImplBase {
 ...
}

Output of uname -a or ver

linux (debian 12.5) - running in AWS EKS

Output of java -version

17.0.11.9.1

Quarkus version or git rev

3.13.0

Build tool (ie. output of mvnw --version or gradlew --version)

gradle 8.10

Additional information

No response

quarkus-bot[bot] commented 3 weeks ago

/cc @alesj (grpc), @cescoffier (grpc)

brianwyka commented 3 weeks ago

I've tried looking for a change in the grpc extension which could be related, but nothing is sticking out: https://github.com/quarkusio/quarkus/compare/3.12.3...3.13.0

cescoffier commented 3 weeks ago

Are you using the same server as HTTP or use a separate server? Are you being a proxy?

brianwyka commented 3 weeks ago

I included quarkus.grpc.server.use-separate-server = false config above. No proxy. Communication is from an AWS EC2 to a kubernetes grpc ingress.

cescoffier commented 3 weeks ago

Does it work locally?

cescoffier commented 3 weeks ago

My guess is that it's due to a missing host in the request generating a 400 response.

CC @tsegismont

brianwyka commented 3 weeks ago

It works locally, but with a few differences:

brianwyka commented 3 weeks ago

@cescoffier, do you mean that the client needs to set a host header now? Any way to disable that validation on the server side to verify if that is the issue?

cescoffier commented 2 weeks ago

You cannot disable the validation.

It got reported here: https://github.com/vert-x3/vertx-web/issues/2638

brianwyka commented 2 weeks ago

Was this introduced in 3.13.0?

On Sun, Aug 25, 2024, 07:29 Clement Escoffier @.***> wrote:

You cannot disable the validation.

It got reported here: vert-x3/vertx-web#2638 https://github.com/vert-x3/vertx-web/issues/2638

— Reply to this email directly, view it on GitHub https://github.com/quarkusio/quarkus/issues/42724#issuecomment-2308790722, or unsubscribe https://github.com/notifications/unsubscribe-auth/AK7PMMGTRBTOO54MF2S5443ZTG52FAVCNFSM6AAAAABM7G3UQ2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMBYG44TANZSGI . You are receiving this because you authored the thread.Message ID: @.***>

cescoffier commented 2 weeks ago

Yes

cescoffier commented 2 weeks ago

You can try to downgrade Vert.x Web

brianwyka commented 2 weeks ago

Thanks, I'll try that. What would I need to do on the client side in the ManagedChannel Metadata or by some other means to be able to successfully upgrade? (If that's the issue)

cescoffier commented 2 weeks ago

It's only on the server side. Downgrading vert.x on the server side should verify that it's the actual issue.

tsegismont commented 2 weeks ago

My guess is that it's due to a missing host in the request generating a 400 response.

CC @tsegismont

It may be related, indeed. It will be interesting to know if downgrading Vert.x Web to the previous version solves the issue. Also, it would help to get a capture or logs of the network traffic (to confirm the host header is missing).

It is surprising that the host header is missing (I wouldn't expect a gRPC client to use HTTP/1.0 and the HTTP/1.1 mandates the Host header to be present).

cescoffier commented 2 weeks ago

I quickly discussed this issue with @tsegismont and we are not sure it's related to the same issue I mentioned yeterday. We striked us was the invalid content-type: null in your log. I believe it's client-side, but it's not clear.

So as @tsegismont said, we would need the network traffic triggering the issue to understand.

Having a way to reproduce the issue locally would be perfect.

brianwyka commented 2 weeks ago

@cescoffier , there were no updates on the client side, so how can it be a client-side issue?

cescoffier commented 2 weeks ago

I meant the logged message you posted.

brianwyka commented 2 weeks ago

Correct, that logged message is on the client-side calling application.

brianwyka commented 2 weeks ago

@cescoffier , should I still attempt downgrading io.quarkus:quarkus-reactive-routes (formerly Vert.x Web)? I don't see any code differences in that module other than pom changes.

cescoffier commented 2 weeks ago

You need to downgrade vertx-web: Add the following dependency:

<dependency>
    <groupId>io.vertx</groupId>
    <artifactId>vertx-web</artifactId>
    <version>4.5.8</version>
</dependency>

If still the same, do the same with vertx-core.

brianwyka commented 2 weeks ago

@cescoffier , are there and access logs I can enable on the server side to log the traffic? Would it be through this configuration? https://quarkus.io/guides/http-reference#configuring-http-access-logs

cescoffier commented 2 weeks ago

That would be a great start. Make sure the output contains the headers.

brianwyka commented 2 weeks ago

The HTTP access logs did not contain anything for the failed gRPC error.

Reverting vertx dependencies to 4.5.7 did the trick.

cescoffier commented 2 weeks ago

Which Vertx dependency did you downgrade? That will help us finding the source of the bug.

brianwyka commented 2 weeks ago

I downgraded them all to make sure first. I'll do some follow-up testing to narrow it down, starting with vertx-web.

cescoffier commented 2 weeks ago

Thanks! That would be super useful!

brianwyka commented 2 weeks ago

I've downgraded only vert-web to version 4.5.7 and everything is working as expected. Seems the issue is in there or another transitive vertx dependency of web.

cescoffier commented 2 weeks ago

Thanks!

@tsegismont So, it's a change in vert.x web. Looking at the commits (https://github.com/vert-x3/vertx-web/commits/4.5.8/), I believe the host/authority header commit broke this one.

tsegismont commented 2 weeks ago

OK, thanks for digging into this @brianwyka

It means either the gRPC Client or a proxy is sending gRPC traffic over HTTP/1.0 or over HTTP/1.1 without a host header. In both cases, this is wrong.

brianwyka commented 2 weeks ago

Do you know how to correct this on the client side @tsegismont ?

brianwyka commented 1 week ago

Looks like this is fixed in https://github.com/vert-x3/vertx-web/issues/2638 (Vertx 4.5.10). @cescoffier , which version of Quarkus will this be released in?

cescoffier commented 1 week ago

3.14.x (released very soon) and obviously 3.15 (next LTS)