seanmonstar / httparse

A push parser for the HTTP 1.x protocol in Rust.
https://docs.rs/httparse
Apache License 2.0

Improve API #18

Open jethrogb opened 8 years ago

jethrogb commented 8 years ago

I would like to build a function

fn read_headers<'a, 'b, R: BufRead>(
        stream: &mut R,
        buf: &'b mut Vec<u8>,
        headers: &'a mut [Header<'b>],
    ) -> Result<Request<'a, 'b>, E>

that reads as much from stream into buf as necessary to get a Complete return from Request::parse. This turns out not to be trivial, and requires lots of extra allocations and work.

Here's what I came up with:

use std::io::BufRead;
use httparse::{Header, Request};

fn read_headers<'a, 'b, R: BufRead>(clnt: &mut R, buf: &'b mut Vec<u8>, headers: &'a mut [Header<'b>]) -> Result<Request<'a, 'b>, String> {
    // Read from `clnt` into a scratch buffer until a parse with temporary
    // headers succeeds, then return the accumulated bytes.
    fn extend_and_parse<R: BufRead>(clnt: &mut R, headers: &mut [Header]) -> Result<Vec<u8>, String> {
        let mut buf = Vec::<u8>::new();
        let len = headers.len();
        loop {
            let buf_orig_len = buf.len();
            let additional_len = {
                let additional = clnt.fill_buf().map_err(|e| e.to_string())?;
                if additional.is_empty() {
                    return Err("EOF before complete request".to_string());
                }
                buf.extend_from_slice(additional);
                additional.len()
            };
            // A fresh temporary headers array every iteration, because a
            // failed parse does not release the borrow of `buf`.
            let mut headers = vec![httparse::EMPTY_HEADER; len];
            let mut req = Request::new(&mut headers);
            match req.parse(&buf) {
                Ok(httparse::Status::Complete(n)) => {
                    // Only the freshly filled bytes up to the end of the
                    // request head have been used; leave the rest in the reader.
                    clnt.consume(n - buf_orig_len);
                    break;
                }
                Ok(httparse::Status::Partial) => {
                    clnt.consume(additional_len);
                }
                Err(e) => return Err(format!("HTTP parse error {:?}", e)),
            }
        }
        Ok(buf)
    }

    let result = extend_and_parse(clnt, headers);
    result.map(move |nb| {
        *buf = nb;
        // Parse the successful result a second time, now into the caller's
        // headers, so the returned Request borrows `buf` and `headers`.
        let mut req = Request::new(headers);
        req.parse(buf).unwrap();
        req
    })
}

The main issues are having to allocate a new array of temporary headers for every iteration, and having to parse the successful result twice.

I think this is partially Rust's fault, but also partially httparse's, for having a not-so-great API. For example, the lifetime of everything is fixed upon Request creation, so a parse failure doesn't release the borrow of the input buffer.

seanmonstar commented 8 years ago

I agree that the lifetime situation feels difficult. If I could fix anything, I'd fix the feeling of trying to use httparse.

As for your specific example, I don't believe allocating a Vec of headers is necessary, unless you need to decide the maximum allowed at runtime.

jethrogb commented 8 years ago

As for your specific example, I don't believe allocating a Vec of headers is necessary, unless you need to decide the maximum allowed at runtime.

You need a temporary headers buffer that's as big as the input slice headers. You can't size that statically until we get integer generics.
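
For illustration, a minimal sketch of the constraint: array lengths must be compile-time constants, so the scratch space can't be sized from the incoming slice.

use httparse::{Header, EMPTY_HEADER};

fn make_scratch(headers: &[Header]) {
    // Rejected: array lengths must be compile-time constants.
    // error[E0435]: attempt to use a non-constant value in a constant
    // let scratch = [EMPTY_HEADER; headers.len()];

    // So each call needs a fresh heap allocation instead:
    let scratch = vec![EMPTY_HEADER; headers.len()];
    let _ = scratch;
}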

seanmonstar commented 8 years ago

Sure. Who provides the headers slice? I'm actually quite curious about this design, because I haven't needed it in either sync hyper or async hyper, and I haven't seen others use it that way in other mini HTTP implementations.

For reference, this is how hyper 0.9 would keep reading into a buffer synchronously and try to parse again: https://github.com/hyperium/hyper/blob/0.9.x/src/http/h1.rs#L869-L899

jethrogb commented 8 years ago

I'm actually quite curious about this design, because I haven't needed it in either sync hyper or async hyper, and I haven't seen others use it that way in other mini HTTP implementations.

For reference, this is how hyper 0.9 would keep reading into a buffer synchronously and try to parse again: https://github.com/hyperium/hyper/blob/0.9.x/src/http/h1.rs#L869-L899

This looks like it basically runs the same loop as mine above, except with a fixed maximum number of headers, which is exactly how you avoid the issues I'm running into. I'd like the caller to specify the maximum number of headers, and also to avoid large additional allocations (on either the heap or the stack) during parsing. Also, I'd like to keep the httparse Request representation; in hyper you're just interpreting the Request directly before returning.
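
For comparison, the fixed-max-headers pattern looks roughly like this (a simplified sketch of that approach, not hyper's actual code; MAX_HEADERS and the chunk size are assumptions):

use std::io::Read;
use httparse::{Request, Status, EMPTY_HEADER};

const MAX_HEADERS: usize = 100; // fixed at compile time, unlike the caller-sized version above

// Read until a full request head is buffered; return how many bytes it spans.
// The caller then parses `buf[..len]` again with its own headers to get a
// Request that borrows `buf` (the same double parse discussed above).
fn read_until_complete<R: Read>(stream: &mut R, buf: &mut Vec<u8>) -> std::io::Result<usize> {
    let mut chunk = [0u8; 4096];
    loop {
        let n = stream.read(&mut chunk)?;
        if n == 0 {
            return Err(std::io::Error::new(
                std::io::ErrorKind::UnexpectedEof,
                "EOF before complete request",
            ));
        }
        buf.extend_from_slice(&chunk[..n]);
        // Fresh headers each iteration, but on the stack rather than the heap.
        let mut headers = [EMPTY_HEADER; MAX_HEADERS];
        let mut req = Request::new(&mut headers);
        match req.parse(buf) {
            Ok(Status::Complete(len)) => return Ok(len),
            Ok(Status::Partial) => continue,
            Err(e) => {
                return Err(std::io::Error::new(
                    std::io::ErrorKind::InvalidData,
                    format!("HTTP parse error {:?}", e),
                ))
            }
        }
    }
}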

seanmonstar commented 8 years ago

Been thinking about this: what if Headers were not an array of slices, but an array of offsets? It'd remove the lifetimes, but require you to slice the buffer yourself with the offsets to get the name and value.
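
For illustration, such an offset-based header might look something like this (a hypothetical sketch, not an actual httparse type):

// Hypothetical: name and value stored as byte ranges into the buffer,
// so the struct carries no lifetime and can outlive any one parse.
#[derive(Clone, Copy, Default)]
struct HeaderIndices {
    name: (usize, usize),  // start and end of the header name in the buffer
    value: (usize, usize), // start and end of the header value in the buffer
}

// The caller slices the buffer itself to recover the actual bytes:
fn header_name<'b>(buf: &'b [u8], h: &HeaderIndices) -> &'b [u8] {
    &buf[h.name.0..h.name.1]
}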

seanmonstar commented 8 years ago

Another option is to make the parse function update the lifetime of the Request. This might require an unsafe transmute, which would require that parse ensure all of the fields of Request, including the headers array, are reset, to prevent any dangling references.
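
Roughly, the idea (a hypothetical sketch, not vetted for soundness; the caveat is exactly the reset requirement above):

use std::mem;
use httparse::{Request, EMPTY_HEADER};

// Hypothetical: wipe every field that borrows the old buffer, then
// transmute the buffer lifetime so a new buffer can be parsed. This is
// sound only if no reference into the old buffer survives the reset.
fn reset<'h, 'b1, 'b2: 'h>(mut req: Request<'h, 'b1>) -> Request<'h, 'b2> {
    req.method = None;
    req.path = None;
    req.version = None;
    for h in req.headers.iter_mut() {
        *h = EMPTY_HEADER;
    }
    unsafe { mem::transmute(req) }
}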

jethrogb commented 8 years ago

What about merging new and parse into one associated function?

fn parse(headers: &'h mut [Header<'b>], buf: &'b [u8]) -> Result<(usize, Request<'h, 'b>)>

You'd need to change the Status enum as well...
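
Usage would look something like this (hypothetical; this parse overload doesn't exist, and the partial case is left open):

use httparse::{Request, EMPTY_HEADER};

fn example(buf: &[u8]) -> Result<(), httparse::Error> {
    let mut headers = [EMPTY_HEADER; 16];
    // Hypothetical merged constructor-and-parse: if it fails, no Request
    // exists, so nothing is left borrowing `buf` or `headers`.
    let (len, req) = Request::parse(&mut headers, buf)?;
    println!("parsed {} bytes, method: {:?}", len, req.method);
    Ok(())
}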

seanmonstar commented 8 years ago

That could be done, but it still ties the lifetime of the Header slice to the buf, and it makes the Status trickier, as you said.

I think a reset() method could be added to Request that would empty all the fields, reset the Header slice, and update its lifetime to allow a new buffer to be used.
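
The intended flow would be something like this (a sketch of the intent only; reset() doesn't exist, and read_more is an assumed helper):

use httparse::{Request, Status, EMPTY_HEADER};

// Hypothetical flow with reset(): one headers array for the whole loop.
fn example(buf: &mut Vec<u8>) -> Result<usize, httparse::Error> {
    let mut headers = [EMPTY_HEADER; 16];
    let mut req = Request::new(&mut headers);
    loop {
        match req.parse(buf) {
            Ok(Status::Complete(n)) => return Ok(n),
            Ok(Status::Partial) => {
                req = req.reset(); // hypothetical: wipes fields, frees the borrow of `buf`
                read_more(buf);    // assumed helper that appends fresh bytes
            }
            Err(e) => return Err(e),
        }
    }
}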

adriangb commented 2 years ago

I wrote a little Python wrapper for this library. From the perspective of writing a Python wrapper with PyO3, the lifetime on Headers is problematic.

If you're using the flow of "try to parse; if incomplete, tell the Python caller to come back with more data", you have to re-allocate headers every time, because the lifetime means you can't persist anything containing a Header (including the buffer) to Python: PyO3 can't hand lifetimes off to Python.

The lifetime requirement on the data itself is not a problem; PyO3 lets you access the underlying memory without copying.
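
Concretely, the blocker is that a #[pyclass] can't have lifetime parameters, so nothing holding a Header<'b> can be handed across to Python (a minimal sketch of the failure, assuming the pyo3 crate):

use pyo3::prelude::*;

// Does not compile: #[pyclass] types must be 'static, but Header<'b>
// borrows the request buffer.
#[pyclass]
struct ParseState<'b> {
    headers: Vec<httparse::Header<'b>>,
}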