nbaksalyar / rust-streaming-http-parser

Streaming HTTP parser for Rust, based on nodejs/http-parser
MIT License

body is set to whole message on gzipped message longer than 10K #22

Open · seeekr opened this issue 5 years ago

seeekr commented 5 years ago

Sorry for the non-minimal reproduction of the issue I found; I still wanted to report it.

What happens:

After roughly 10K characters of responses with gzip-encoded bodies have been parsed, the parser goes into a state where it reliably calls the "body" callback with the entire message as the body slice, instead of the slice pointing at just the body. This happens both when it's a single large message and when the same parser instance is reused for successive messages until that threshold is reached.

Here is a Node.js server implementation that allows one to play with the reproduction:

const Koa = require('koa')
const app = new Koa()
const zlib = require('zlib')
const crypto = require('crypto')
// ~1K of random bytes, gzip-compressed once at startup
const data = zlib.gzipSync(crypto.randomBytes(1000))

app
    .use(async ctx => {
        ctx.set('Content-Encoding', 'gzip')
        ctx.body = data
    })
    // the listen call was omitted in the original snippet; the port is arbitrary
    .listen(3000)

My Rust code looked roughly like this:

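// Imports were omitted from the original snippet; roughly the following,
// assuming the http_muncher crate from this repo and a tokio codec
// (exact crate paths may differ by version):
//
// use bytes::BytesMut;
// use http_muncher::{Parser, ParserHandler};
// use tokio_util::codec::Decoder;
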
#[derive(Default)]
pub struct Handler {
    pub complete: bool,
    pub body: Vec<u8>,
}

impl ParserHandler for Handler {
    fn on_body(&mut self, _: &mut Parser, body: &[u8]) -> bool {
        self.body = body.to_vec();
        true
    }

    fn on_message_complete(&mut self, parser: &mut Parser) -> bool {
        self.complete = true;
        parser.pause();
        true
    }
}

pub struct HttpClientCodec {
    parser: Parser,
}

impl HttpClientCodec {
    pub fn new() -> Self {
        Self {
            parser: Parser::response(),
        }
    }
}

pub struct Response {
    pub status: u16,
    pub body: Vec<u8>,
    pub keep_alive: bool,
}

impl Decoder for HttpClientCodec {
    type Item = Response;
    type Error = HttpError;

    fn decode(&mut self, src: &mut BytesMut) -> Result<Option<Self::Item>, Self::Error> {
        let mut h = Handler::default();
        let read = self.parser.parse(&mut h, src);
        if !h.complete {
            return Ok(None);
        }
        self.parser.unpause();
        if self.parser.has_error() && self.parser.error() != "HPE_PAUSED" {
            return Err(HttpError::ResponseParseError {
                error: self.parser.error().into(),
                description: self.parser.error_description().into(),
            });
        }
        // remove bytes read from view
        src.split_to(read);
        Ok(Some(Response {
            status: self.parser.status_code(),
            body: h.body,
            keep_alive: self.parser.should_keep_alive(),
        }))
    }
}
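
For reference, this is roughly how the codec gets exercised; a minimal sketch (not my actual test code) with a small, uncompressed response, which still behaves as expected:

#[test]
fn small_plain_response_decodes_body_only() {
    // Assumes the Decoder trait is in scope, as in the impl above.
    use bytes::BytesMut;

    // A tiny response with an explicit Content-Length, far below the ~10K
    // point where the problem starts to show up.
    let raw: &[u8] = b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello";

    let mut buf = BytesMut::new();
    buf.extend_from_slice(raw);

    let mut codec = HttpClientCodec::new();
    let res = match codec.decode(&mut buf) {
        Ok(Some(res)) => res,
        _ => panic!("expected a complete response"),
    };

    assert_eq!(res.status, 200);
    // on_body was called with just the body slice here, as expected
    assert_eq!(res.body, b"hello".to_vec());
}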

Originally I really liked this parser's API more than those of the other available parsers, especially since chunked messages are handled properly. But after running into this issue, and after upgrading the underlying nodejs/http-parser dependency did not yield any improvement, I switched over to the httparse crate. (For anyone finding this via Google and wondering about potential solutions: with httparse you have to write your own brief logic for figuring out how long the body is, and potentially for chunked encoding if you need that; I didn't.)
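
For anyone who wants a starting point, here is a rough sketch of that extra logic on top of httparse (illustrative only, not my actual code; it assumes a Content-Length header and ignores chunked encoding):

// Parse the response head with httparse, then slice out the body based on
// Content-Length. Chunked transfer encoding is deliberately not handled here.
fn parse_response(buf: &[u8]) -> Option<(u16, Vec<u8>)> {
    let mut headers = [httparse::EMPTY_HEADER; 32];
    let mut res = httparse::Response::new(&mut headers);

    let head_len = match res.parse(buf) {
        Ok(httparse::Status::Complete(n)) => n,
        // Head incomplete (need more bytes) or malformed: nothing to return yet.
        _ => return None,
    };

    // Figure out how long the body is from the Content-Length header.
    let content_length = res
        .headers
        .iter()
        .find(|h| h.name.eq_ignore_ascii_case("content-length"))
        .and_then(|h| std::str::from_utf8(h.value).ok())
        .and_then(|v| v.trim().parse::<usize>().ok())?;

    let body_end = head_len.checked_add(content_length)?;
    if buf.len() < body_end {
        // Body not fully buffered yet; the caller should read more and retry.
        return None;
    }

    Some((res.code?, buf[head_len..body_end].to_vec()))
}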

nbaksalyar commented 5 years ago

Thanks for reporting this! I'll look into it and try to come up with possible fixes.