sudorandom / fauxrpc

Easily start a fake gRPC/gRPC-Web/Connect/REST server from protobufs
https://fauxrpc.com
MIT License
76 stars 0 forks source link

Inconsistent gRPC-Web behavior #15

Open DazWilkin opened 1 week ago

DazWilkin commented 1 week ago

Using tonic-web with any (but e.g. ElizaService) client.rs (as-is URL changes aside) against FauxRPC:

Error: Status { code: Unknown, message: " ", source: None }

Even though fauxrpc run --addr=localhost:6660 --schema=eliza.binpb:

2024/11/14 11:39:28 INFO MethodCalled service=connectrpc.eliza.v1.ElizaService method=Say
2024/11/14 11:39:28 | 200 |  3.228239993s |    100.65.15.44 | POST     "/connectrpc.eliza.v1.ElizaService/Say"

Reconfiguring the rust client to use https://demo.connectrpc.com:

Response { metadata: MetadataMap { headers: {"content-type": "application/grpc-web", "grpc-accept-encoding": "gzip", "vary": "Origin", "date": "Thu, 14 Nov 2024 19:46:39 GMT", "server": "Google Frontend", "traceparent": "00-6d243811558d7e0f0f15a46b4cefb357-60228e18328eeff4-01", "x-cloud-trace-context": "6d243811558d7e0f0f15a46b4cefb357/6927255411427831796;o=1", "via": "1.1 google", "alt-svc": "h3=\":443\"; ma=2592000,h3-29=\":443\"; ma=2592000", "transfer-encoding": "chunked"} }, message: SayResponse { sentence: "How do you feel when you say that?" }, extensions: Extensions }

In the latter case, I'm having to rewire the client to use TLS too but, using tailscale serve --https=443 6660, I can put a TLS proxy in front of FauxRPC too and continue to receive:

Error: Status { code: Unknown, message: " ", source: None }

All the while FauxRPC reports success (200) on the POST /.../ElizaService/Say.

This suggests there's something inconsistent in FauxRPC's behavior that's not present in ConnectRPC.

I've been flummoxed by this.

I used Kreya because it supports gRPC-Web and it works correctly with both the ConnectRPC and FauxRPC endpoints.

Yesterday, I used socat to try and capture the traffic using the tonic-web's helloworld.proto against FauxRPC.

With minor differences, there's no obvious (to me breaking) difference:

client.rs generates the Error:

> 2024/11/13 12:12:49.000320511  length=171 from=0 to=170
POST /helloworld.Greeter/SayHello HTTP/1.1\r
te: trailers\r
content-type: application/grpc-web\r
host: localhost:12345\r
transfer-encoding: chunked\r
\r
E\r
....    
\aFreddie\r
0\r
\r
< 2024/11/13 12:12:51.000759623  length=145 from=0 to=144
HTTP/1.1 200 OK\r
Content-Type: application/grpc-web+proto\r
Date: Wed, 13 Nov 2024 20:12:51 GMT\r
Transfer-Encoding: chunked\r
\r
e\r
....    
\aFreddie\r
< 2024/11/13 12:12:51.000760926  length=48 from=145 to=192
25\r
.... Grpc-Message: \r
Grpc-Status: 0\r
\r
0\r
\r

Kreya responds with Ok:

> 2024/11/13 12:14:40.000795916  length=216 from=0 to=215
POST /helloworld.Greeter/SayHello HTTP/1.1\r
Host: localhost:12345\r
TE: trailers\r
grpc-accept-encoding: identity,gzip,deflate\r
Transfer-Encoding: chunked\r
Content-Type: application/grpc-web\r
\r
E\r
....    
\aFreddie\r
0\r
\r
< 2024/11/13 12:14:42.000968834  length=145 from=0 to=144
HTTP/1.1 200 OK\r
Content-Type: application/grpc-web+proto\r
Date: Wed, 13 Nov 2024 20:14:42 GMT\r
Transfer-Encoding: chunked\r
\r
e\r
....    
\aFreddie\r
< 2024/11/13 12:14:42.000970169  length=48 from=145 to=192
25\r
.... Grpc-Message: \r
Grpc-Status: 0\r
\r
0\r
\r
DazWilkin commented 1 week ago

As an aside, I believe, while gRPC-Web is supported by FauxRPC, CORS is not. Is that correct?

The consequence is that gRPC-Web clients except browsers are supported.

sudorandom commented 1 week ago

I don't have time to dig deeply into this right this moment, but I can throw a couple suggestions:

I can easily add CORS support in to make this easier. You could theoretically use a load balancer to serve up a UI and FauxRPC with the same host. In which case, CORS doesn't really matter or you could have the load balancer serve up the CORS headers... but I can see how that can be extra work, so this is a good feature to add directly to FauxRPC. Can you make a different github issue for that to track it separately?

DazWilkin commented 1 week ago

Thanks for the feedback and suggestions; I'll give them a whirl.

I'm running the Web server and FauxRPC on localhost; CORS rears its head because of the different ports, I'd thought?

I'll open an issue for CORS support.

sudorandom commented 6 days ago

From looking at wireshark output, there's a hint of what is happening.

Here is what it looks like when using the rust server:

Screenshot 2024-11-16 at 08 15 48

Here is FauxRPC:

Screenshot 2024-11-16 at 08 17 04

Notice that there's one more packet with FauxRPC. That's because FauxRPC is sending the trailers after it sends the data. For whatever reason the implementation is splitting the response message and the trailers into multiple packets. It appears like tonic doesn't handle this very well. I think both scenarios are valid, so I'm leaning on this being an issue with the client side of tonic_web.

sudorandom commented 6 days ago

I'm out of my depth with rust, but I believe we might learn more about this issue from debugging this poll_frame function: https://github.com/hyperium/tonic/blob/master/tonic-web/src/call.rs#L258. This reads to me like it is supposed to be reading frames until it reads the trailers... but it appears like something is off either with the logic or the data.

sudorandom commented 6 days ago

I've found the issue. tonic-web appears to be parsing the trailers wrong in some cases... it's seeing an extra space with FauxRPC. This is basically what the trailers look like from FauxRPC:

grpc-message: 
grpc-status: 0

The tonic_web server does this:

grpc-status:0

You don't see it in your output because it's not actually printing the raw trailers.

Note the spaces after the separating colon :. These spaces are dictated by the grpc-web protocol spec so the tonic-web gRPC-Web client should handle the space after the colon. I think many client implementations deal with the space being absent or present, which why this flaw may have gone unnoticed. The tonic-web client is also likely mostly tested exclusively with tonic-web server, and those both don't use a space there. Again, I'm not a rust dev, so I'm not sure how effective I can be at making a patch to tonic-web 😅 I may try to attempt the fix if I find the time 😅

Thanks for this issue though, this has been a fun learning experience! 🦀

sudorandom commented 6 days ago

I filed a PR for tonic. From my testing it fixes your issue 😀

DazWilkin commented 5 days ago

Wow!

Thank you very much. I saw the PR. I'll try it out.

It makes sense that they're only testing against Rust (tonic) servers.

Does this explain why the Rust client worked with Connect's public Eliza service?

sudorandom commented 5 days ago

Because this is only an issue with HTTP/1.1 and I think the Rust client would connect using HTTP/2 with demo.connectrpc.com. HTTP/2 has binary framing which is much more strict so this isn't an issue there. The usual behavior for HTTP clients is to never use HTTP/2 on non-TLS connections.

This, however, can be forced by using a feature called "h2c" (sometimes referred to as http2 prior knowledge). It does look like tonic supports it... but I'm not going to lie, it looks oddly complicated to use... To test this fully you'd want to take the h2c example and adapt it to use grpc-web instead of just gRPC (because this trailer parsing logic is exclusive to gRPC-Web).

sudorandom commented 5 days ago

Oh wait, I'm actually wrong about HTTP/2 here. Even with HTTP/2, gRPC-Web trailers are passed in the body... so your question still stands at why the behavior is different.

I will note that FauxRPC actually uses Vanguard to enable gRPC/gRPC-Web/Connect/REST protocols... So the implementation is actually different. gRPC-Web trailers may be different between ConnectRPC and Vanguard.