nbaksalyar / rust-streaming-http-parser

Streaming HTTP parser for Rust, based on nodejs/http-parser
MIT License

body is set to whole message on gzipped message longer than 10K #22

Open · seeekr opened this issue 5 years ago

seeekr commented 5 years ago

Sorry for the non-minimal reproduction of the issue I found; I still wanted to report it.

What happens:

After roughly 10K characters of responses with gzip-encoded bodies have been parsed, the parser goes into a state where it reliably calls the "body" callback with the entire message as the body slice, instead of the slice pointing at just the body. This happens both when it's a single large message and when the same parser instance is reused for successive messages until that threshold is reached.

Here is a Node.js server implementation that allows one to play with the reproduction:

const Koa = require('koa')
const app = new Koa()
const zlib = require('zlib')
const crypto = require('crypto')
// ~1K of random bytes, gzip-compressed once at startup
const data = zlib.gzipSync(crypto.randomBytes(1000))

app
    .use(async ctx => {
        ctx.set('Content-Encoding', 'gzip')
        ctx.body = data
    })
    // the listen call was omitted in the original snippet; the port is arbitrary
    .listen(3000)

My Rust code looked roughly like this:

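// Imports were omitted from the original snippet; roughly the following,
// assuming the http_muncher crate from this repo and a tokio codec
// (exact crate paths may differ by version):
//
// use bytes::BytesMut;
// use http_muncher::{Parser, ParserHandler};
// use tokio_util::codec::Decoder;
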
#[derive(Default)]
pub struct Handler {
    pub complete: bool,
    pub body: Vec<u8>,
}

impl ParserHandler for Handler {
    fn on_body(&mut self, _: &mut Parser, body: &[u8]) -> bool {
        self.body = body.to_vec();
        true
    }

    fn on_message_complete(&mut self, parser: &mut Parser) -> bool {
        self.complete = true;
        parser.pause();
        true
    }
}

pub struct HttpClientCodec {
    parser: Parser,
}

impl HttpClientCodec {
    pub fn new() -> Self {
        Self {
            parser: Parser::response(),
        }
    }
}

pub struct Response {
    pub status: u16,
    pub body: Vec<u8>,
    pub keep_alive: bool,
}

impl Decoder for HttpClientCodec {
    type Item = Response;
    type Error = HttpError;

    fn decode(&mut self, src: &mut BytesMut) -> Result<Option<Self::Item>, Self::Error> {
        let mut h = Handler::default();
        let read = self.parser.parse(&mut h, src);
        if !h.complete {
            return Ok(None);
        }
        self.parser.unpause();
        if self.parser.has_error() && self.parser.error() != "HPE_PAUSED" {
            return Err(HttpError::ResponseParseError {
                error: self.parser.error().into(),
                description: self.parser.error_description().into(),
            });
        }
        // remove bytes read from view
        src.split_to(read);
        Ok(Some(Response {
            status: self.parser.status_code(),
            body: h.body,
            keep_alive: self.parser.should_keep_alive(),
        }))
    }
}
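
For reference, this is roughly how the codec gets exercised; a minimal sketch (not my actual test code) with a small, uncompressed response, which still behaves as expected:

#[test]
fn small_plain_response_decodes_body_only() {
    // Assumes the Decoder trait is in scope, as in the impl above.
    use bytes::BytesMut;

    // A tiny response with an explicit Content-Length, far below the ~10K
    // point where the problem starts to show up.
    let raw: &[u8] = b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello";

    let mut buf = BytesMut::new();
    buf.extend_from_slice(raw);

    let mut codec = HttpClientCodec::new();
    let res = match codec.decode(&mut buf) {
        Ok(Some(res)) => res,
        _ => panic!("expected a complete response"),
    };

    assert_eq!(res.status, 200);
    // on_body was called with just the body slice here, as expected
    assert_eq!(res.body, b"hello".to_vec());
}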

Originally I really liked this parser's API more than those of the other available parsers, especially since chunked messages are handled properly. But after running into this issue, and after upgrading the underlying nodejs/http-parser dependency did not yield any improvement, I switched over to the httparse crate. (For anyone finding this via Google and wondering about potential solutions: with httparse you have to write your own brief logic for figuring out how long the body is, and potentially for chunked encoding if you need that; I didn't.)
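
For anyone who wants a starting point, here is a rough sketch of that extra logic on top of httparse (illustrative only, not my actual code; it assumes a Content-Length header and ignores chunked encoding):

// Parse the response head with httparse, then slice out the body based on
// Content-Length. Chunked transfer encoding is deliberately not handled here.
fn parse_response(buf: &[u8]) -> Option<(u16, Vec<u8>)> {
    let mut headers = [httparse::EMPTY_HEADER; 32];
    let mut res = httparse::Response::new(&mut headers);

    let head_len = match res.parse(buf) {
        Ok(httparse::Status::Complete(n)) => n,
        // Head incomplete (need more bytes) or malformed: nothing to return yet.
        _ => return None,
    };

    // Figure out how long the body is from the Content-Length header.
    let content_length = res
        .headers
        .iter()
        .find(|h| h.name.eq_ignore_ascii_case("content-length"))
        .and_then(|h| std::str::from_utf8(h.value).ok())
        .and_then(|v| v.trim().parse::<usize>().ok())?;

    let body_end = head_len.checked_add(content_length)?;
    if buf.len() < body_end {
        // Body not fully buffered yet; the caller should read more and retry.
        return None;
    }

    Some((res.code?, buf[head_len..body_end].to_vec()))
}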

nbaksalyar commented 5 years ago

Thanks for reporting this! I'll look into it and try to come up with possible fixes.