ninenines / cowlib

Support library for manipulating Web protocols.
ISC License
279 stars 173 forks source link

Receiving intermittent frame size error for `RST_STREAM` frames. #131

Closed beligante closed 1 year ago

beligante commented 1 year ago

Hello Loïc!

I'm working on some improvements for elixir/grpc lib, and I found an "issue" (in quotes, cause It may be considered an improvement) when running some load tests - locally - to check the behavior of the code under a lot of stress.

While running such tests, I started to see randomly the connection being abruptly closed with FRAME_SIZE_ERROR (RFC-7540). After debugging, I discovered that some RST_STREAM frames were being partially sent to the server.

I suspect these issues happened because - based on gen_tcp:send/2 docs - the data was being chunked and partially sent because the OS send buffers are full. That's my best and only guess, because of the whole context of me being load-testing a connection. To add more, I've added some debugs to print out the data, and sometimes it needed one, two, or three bytes in the frame. Examples bellow.

% shape {Error, IncomingData}
{{connection_error,frame_size_error,'RST_STREAM frames MUST be 4 bytes wide. (RFC7540 6.4)'},<<0,0,4,3,0,0,0,2,97,0>>}
{{connection_error,frame_size_error,'RST_STREAM frames MUST be 4 bytes wide. (RFC7540 6.4)'},<<0,0,4,3,0,0,0,0,131,0,0>>}

I made my use-case work by asking for more data instead of returning an error in this line here for cow_http2.erl - But I don't think this is the proper fix since in the real world I don't think that will be that case for everyone right? There's no way to know if data is missing for the frame OR if the data didn't arrive yet. ( there is a TODO in the code from you, so, I suspect you had the same impression)

I was wondering what's the proper solution - or even if this is a problem, I might have misunderstood a like on H2 spec - for this. I could think of some and would be open to working on them. But first, I like to know your thoughts on this.


If you want to test this, I have an example in my fork for elixir/grpc. => https://github.com/beligante/grpc/tree/debug-rst-stream-for-cowboy-and-cowlib

This branch is running with the "fix" I did for this to work. If you change the cowlib version to the latest in here you'll see the errors I mentioned happening and the connection being closed abruptly

essen commented 1 year ago

The clause you linked to in cow_http2 is missing a when Len =/= 4 , it's a little too large in what it matches.

beligante commented 1 year ago

So, for the use case, I described above. I should ensure that gen_tcp doesn't break the RST_STREAM frames in case the OS Buffers are full. Right? Aka: I might have to open an issue for the Mint client

If I'm not mistaken, the previous clause matches for clauses where Len >= 4.

parse(<< 4:24, 3:8, _:9, StreamID:31, ErrorCode:32, Rest/bits >>) ->
    {ok, {rst_stream, StreamID, parse_error_code(ErrorCode)}, Rest};

If it's > 4 it will parse the error and return the rest of the data. If it's == 4 the Rest will be <<>> - That being said, I don't think you need a when guard here

essen commented 1 year ago

Nothing to change in clients. The previous clause doesn't match because Len is the first 24 bits and it's 4 in that clause. We need to make the catch-all a little less catch-all to not match incomplete packets with valid lengths.

beligante commented 1 year ago

Ahhh Gotcha!

So, wrapping up:

essen commented 1 year ago

No need for another clause, it'll end up unmatched until the very last clause which asks for more data.