hyperium / http-body

Asynchronous HTTP body trait
MIT License

Question: can frame.into_data() be incomplete? #101

Open kristof-mattei opened 11 months ago

kristof-mattei commented 11 months ago

I use http-body to parse the body of an endless Transfer-Encoding: Chunked stream.

// frame() here comes from the http_body_util::BodyExt extension trait.
let frame = response.frame().await.expect("Stream ended").expect("Failed to read frame");

let Ok(data) = frame.into_data() else {
    // frame is trailers; ignore it
    continue;
};

let decoded = serde_json::from_slice(&data)?;

// ...

But as I've discovered, under certain conditions data is incomplete. When it is complete, it ends in \n.

To work around it I keep a buffer, parse only the range [0..=(index of first b'\n')], and remove that part from the buffer (see the sketch below).
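
A sketch of that workaround, assuming buf is a bytes::BytesMut that accumulates the data frames:

// Parse only up to and including the first b'\n'; keep the rest buffered.
if let Some(pos) = buf.iter().position(|&b| b == b'\n') {
    let message = buf.split_to(pos + 1);
    let decoded: serde_json::Value = serde_json::from_slice(&message)?;
    // ...
}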

This leaves me with the following questions:

davidpdrsn commented 11 months ago

Are you intending to buffer the whole response body? If so, then yes it might contain more than one frame. You can get the whole response using BodyExt::collect:

body.collect().await?.to_bytes()
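
In context that might look like this (a sketch, assuming response is an http::Response whose body implements Body; collecting only makes sense when the body is finite):

use http_body_util::BodyExt;

// Buffer the entire body into one contiguous Bytes value.
let bytes = response.into_body().collect().await?.to_bytes();
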
seanmonstar commented 11 months ago

It's not that it's incomplete, but this is a common misconception: writes from a peer do not equal the exact same reads locally. There are multiple things that can cause a write to be split into smaller pieces: TCP segment size, HTTP/2 DATA frame size, TLS record size, proxies/intermediaries.

You essentially want something like read_until(). This requires buffering data, since each "frame" may not contain all the bytes you want.

Enough people have asked about this that it makes me think we could probably come up with a helper in http-body-util.
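
A minimal sketch of that read_until-style buffering, assuming body implements Body<Data = Bytes> + Unpin and each message ends in b'\n' (the loop itself is illustrative, not an existing http-body API):

use bytes::BytesMut;
use http_body_util::BodyExt;

let mut buf = BytesMut::new();
loop {
    // Drain complete messages from the buffer first: one frame may
    // carry several messages, or only part of one.
    if let Some(pos) = buf.iter().position(|&b| b == b'\n') {
        let message = buf.split_to(pos + 1); // includes the b'\n'
        let decoded: serde_json::Value = serde_json::from_slice(&message)?;
        // ... handle decoded ...
        continue;
    }
    // No complete message buffered yet; pull the next frame.
    match body.frame().await {
        Some(frame) => {
            // Only data frames extend the buffer; trailers are skipped.
            if let Ok(data) = frame?.into_data() {
                buf.extend_from_slice(&data);
            }
        }
        None => break, // body ended
    }
}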

kristof-mattei commented 11 months ago

> Are you intending to buffer the whole response body? If so, then yes it might contain more than one frame. You can get the whole response using BodyExt::collect:
>
> body.collect().await?.to_bytes()

No, the body is endless.

> It's not that it's incomplete, but this is a common misconception: writes from a peer do not equal the exact same reads locally. There are multiple things that can cause a write to be split into smaller pieces: TCP segment size, HTTP/2 DATA frame size, TLS record size, proxies/intermediaries.
>
> You essentially want something like read_until(). This requires buffering data, since each "frame" may not contain all the bytes you want.
>
> Enough people have asked about this that it makes me think we could probably come up with a helper in http-body-util.

Okay, so a Frame is at a lower level than the CRLF-separated chunk.

Looking at the spec a little more (https://en.wikipedia.org/wiki/Chunked_transfer_encoding#Encoded_data), it seems that my \n detection is probably not correct, and I need to do something a little smarter that takes the chunk size into account (wire format sketched below).
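
For reference, the framing described there puts a hexadecimal chunk-size line before each chunk's payload, each followed by CRLF, with a zero-size chunk terminating the stream:

5\r\n
hello\r\n
0\r\n
\r\n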

kristof-mattei commented 11 months ago

@seanmonstar, reading more, I think I found where I got confused.

HTTP/2 doesn't have chunked transfer encoding, but it does have Frames: https://httpwg.org/specs/rfc7540.html#FrameTypes

So seeing the name Frame used with HTTP/1 short-circuited my brain.