hyperium / http-body

Asynchronous HTTP body trait
MIT License

Question: can frame.into_data() be incomplete? #101

Open kristof-mattei opened 11 months ago

kristof-mattei commented 11 months ago

I use http-body to parse the body of an endless Transfer-Encoding: Chunked stream.

// frame() here comes from the http_body_util::BodyExt extension trait.
let frame = response.frame().await.expect("Stream ended").expect("Failed to read frame");

let Ok(data) = frame.into_data() else {
    // frame is trailers; ignore it
    continue;
};

let decoded = serde_json::from_slice(&data)?;

// ...

But as I've discovered, under certain conditions data is incomplete. When it is complete, it ends in \n.

To work around it I keep a buffer, parse only the range [0..=(index of first b'\n')], and remove that part from the buffer (see the sketch below).
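
A sketch of that workaround, assuming buf is a bytes::BytesMut that accumulates the data frames:

// Parse only up to and including the first b'\n'; keep the rest buffered.
if let Some(pos) = buf.iter().position(|&b| b == b'\n') {
    let message = buf.split_to(pos + 1);
    let decoded: serde_json::Value = serde_json::from_slice(&message)?;
    // ...
}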

This leaves me with the following questions:

davidpdrsn commented 11 months ago

Are you intending to buffer the whole response body? If so, then yes it might contain more than one frame. You can get the whole response using BodyExt::collect:

body.collect().await?.to_bytes()
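
In context that might look like this (a sketch, assuming response is an http::Response whose body implements Body; collecting only makes sense when the body is finite):

use http_body_util::BodyExt;

// Buffer the entire body into one contiguous Bytes value.
let bytes = response.into_body().collect().await?.to_bytes();
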
seanmonstar commented 11 months ago

It's not that it's incomplete, but this is a common misconception: writes from a peer do not equal the exact same reads locally. There are multiple things that can cause a write to be split into smaller pieces: TCP segment size, HTTP/2 DATA frame size, TLS record size, proxies/intermediaries.

You essentially want something like read_until(). This requires buffering data, since each "frame" may not contain all the bytes you want.

Enough people have asked about this that it makes me think we could probably come up with a helper in http-body-util.
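
A minimal sketch of that read_until-style buffering, assuming body implements Body<Data = Bytes> + Unpin and each message ends in b'\n' (the loop itself is illustrative, not an existing http-body API):

use bytes::BytesMut;
use http_body_util::BodyExt;

let mut buf = BytesMut::new();
loop {
    // Drain complete messages from the buffer first: one frame may
    // carry several messages, or only part of one.
    if let Some(pos) = buf.iter().position(|&b| b == b'\n') {
        let message = buf.split_to(pos + 1); // includes the b'\n'
        let decoded: serde_json::Value = serde_json::from_slice(&message)?;
        // ... handle decoded ...
        continue;
    }
    // No complete message buffered yet; pull the next frame.
    match body.frame().await {
        Some(frame) => {
            // Only data frames extend the buffer; trailers are skipped.
            if let Ok(data) = frame?.into_data() {
                buf.extend_from_slice(&data);
            }
        }
        None => break, // body ended
    }
}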

kristof-mattei commented 11 months ago

> Are you intending to buffer the whole response body? If so, then yes it might contain more than one frame. You can get the whole response using BodyExt::collect:
>
> body.collect().await?.to_bytes()

No, the body is endless.

> It's not that it's incomplete, but this is a common misconception: writes from a peer do not equal the exact same reads locally. There are multiple things that can cause a write to be split into smaller pieces: TCP segment size, HTTP/2 DATA frame size, TLS record size, proxies/intermediaries.
>
> You essentially want something like read_until(). This requires buffering data, since each "frame" may not contain all the bytes you want.
>
> Enough people have asked about this that it makes me think we could probably come up with a helper in http-body-util.

Okay, so a Frame is at a lower level than the CRLF-separated chunk.

Looking at the spec a little more (https://en.wikipedia.org/wiki/Chunked_transfer_encoding#Encoded_data), it seems that my \n detection is probably not correct, and I need to do something a little smarter that takes the chunk size into account (wire format sketched below).
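
For reference, the framing described there puts a hexadecimal chunk-size line before each chunk's payload, each followed by CRLF, with a zero-size chunk terminating the stream:

5\r\n
hello\r\n
0\r\n
\r\n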

kristof-mattei commented 11 months ago

@seanmonstar, reading more, I think I found where I got confused.

HTTP/2 doesn't have chunked transfer encoding, but it does have Frames: https://httpwg.org/specs/rfc7540.html#FrameTypes

So seeing the name Frame used with HTTP/1 short-circuited my brain.