ninenines / cowlib

Support library for manipulating Web protocols.
ISC License
281 stars 172 forks source link

HTTP/2 frame parsing/building compliance changes #52

Closed potatosalad closed 6 years ago

potatosalad commented 7 years ago

This pull request makes changes to cow_http2 to fix some minor issues with the parse/1 function, returns {more, Len} instead of more when more bytes are needed (as mentioned here), and adds several build related functions to allow specifying all of the flags and fields mentioned in RFC 7540.

Summary of changes

Notes

  1. Backwards compatibility is mostly maintained, with the exception of iolist return types and the {more, Len} parsing change.
  2. The split_*/3 and split_*/4 are for specifying the max field size, with the default being 16,384 as specified in RFC 7540. The split_*/3 variant may not be necessary.
  3. The frame_*/2 and frame_*/3 functions are the low-level building functions that allow you to create frames that may be invalid, oversized, etc.
  4. The split functions were designed so that iolist_to_binary only gets called when necessary to ensure each frame is not larger than the max frame size. Most of the functions return an iolist instead of a binary.
  5. Flags and fields are specified with a map. Using a tuple/record was the other consideration.
essen commented 7 years ago

That's too big, I'll never be able to review this in a timely manner. Can we go one small change at a time? Or at least make the different changes separate commits so I can review/merge the good quickly and leave feedback on the rest.

potatosalad commented 7 years ago

@essen I apologize for the initial squashed commit. I've tried to split things into individual commits per your request, but please let me know if you'd prefer anything else changed.

essen commented 7 years ago

Thanks. But this https://github.com/ninenines/cowlib/pull/52/commits/382e3e13c8e988cffb89af709e842dc781932778 should be 5 different commits also. I need to review them individually because I need to ensure at the same time that Cowboy will have a corresponding test in rfc7540_SUITE.

potatosalad commented 7 years ago

@essen I have split that specific commit into 5 different commits as requested. Let me know if you need anything else changed.

essen commented 7 years ago

Great, should be good for now. I will try to begin working on it tomorrow. Thanks!

essen commented 7 years ago

I didn't read everything in details yet, but I have a few comments.

I know I put a todo about splitting data into multiple data frames inside Cowlib, but this is definitely not the right place to do it. The maximum frame size is configurable by the other endpoint (and subject to a limit on the server-side also) so this should be taken care of by Cowboy.

I also do not quite understand why we now have two functions for building frames, and why the existing functions should create two maps that are then immediately read to build binaries, instead of building binaries directly. This code should be as efficient as possible, and this certainly is not.

The few commits that fix issues will be merged when I add the corresponding tests to rfc7540 in Cowboy. Thanks!

potatosalad commented 7 years ago

Do you think it would make more sense to have something like this, which is similar to how the parse function works?

-spec data(integer(), nopad | {pad, integer()}, nofin | fin, iodata(), integer()) ->
      {more, Frame :: iodata(), Rest :: iodata()} | {nomore, Frame :: iodata()}.
data(StreamID, Padded, IsFin, Data, MaxFrameSize) ->
    ...

That way cowboy can just call cow_http2:data/5 multiplie times (until {nomore, ...} is returned) and send each created frame until everything has been sent.

essen commented 7 years ago

No because while this would work in a naive implementation (like what we have right now), ultimately Cowboy will need to implement a scheduler of sorts to allocate resources efficiently and prioritize what stream needs to receive data first. When we have that, we won't want to build all data frames immediately, we will want to build them right before sending them, one at a time.

potatosalad commented 7 years ago

Also, the multiple functions were to help maintain backwards compatibility, but it was probably unnecessary. The maps were to help simplify passing optional flags, like for HEADERS. The non-map function spec might look like this:

-spec headers(StreamID, FlagEndStream, FlagEndHeaders, FlagPadded, FlagPriority, HeaderBlockFragment, MaxFrameSize) ->
          {more, Frame :: iodata(), Rest :: iodata()} | {nomore, Frame :: iodata()}
          when
              StreamID :: integer(),
              FlagEndStream :: nofin | fin,
              FlagEndHeaders :: head_nofin | head_fin,
              FlagPadded :: nopad | {pad, integer()},
              FlagPriority :: nopriority | {priority, exclusive | shared, StreamDependency :: integer(), Weight :: integer()},
              HeaderBlockFragment :: iodata(),
              MaxFrameSize :: integer().
potatosalad commented 7 years ago

When we have that, we won't want to build all data frames immediately, we will want to build them right before sending them, one at a time.

I think you and I are in agreement here, but wouldn't having cowlib tell cowboy "we can't fit all of this frame into a single frame, we've returned what is left to be sent" and then cowboy can decide to put the "rest" back into the scheduler/queue so it can continue sending frames that have higher priority or something like that. Otherwise cowboy will need to have really intimate knowledge about, for example: data that is 4 bytes, padding that is 2 bytes, and max frame size of 6 will still need to be spilt into 2 frames because 1 byte gets added to the frame to store the pad length.

essen commented 7 years ago

There's no real compatibility to be kept as there are no released version using this code.

I'm also not saying there shouldn't be maps anywhere, just that you went a bit over the top (an obvious example is ping and ping_ack).

Headers is different than data, for headers we indeed might want to return the headers and continuations all at the same time. They must be sent with nothing interleaved anyway so even if they are multiple frames they act as if it was just one.

We also don't want to support everything just yet. For example padding is a feature I am not willing to implement at the moment, because even the spec is not convinced that it's actually helpful (in fact it says it might be harmful).

This means that there only need to be two variants of the headers function, the one we currently have with a max length argument added (we can keep the current function aliasing with 16384 as default max length, and put a todo for later removal), and one also with a priority.

essen commented 6 years ago

Closing, thanks!