nodejs / http-parser

http request/response parser for c
MIT License
6.32k stars 1.53k forks source link

Suggestion for making F_SKIPBODY more useful #505

Open pkholland opened 4 years ago

pkholland commented 4 years ago

The current usage of the F_SKIPBODY bit in parser->flags is documented as having to do with parsing HEAD responses. Those will generally have a content-length field, but no matching body. So one's implementation of on_headers_complete can return a 1, telling the parser that you want it to skip attempting to parse the message body. When you return 1 from this function it sets this bit in parser->flags (currently line 1851 in http_parser.c).

But there are other reasons that one might want to use the parser to parse just the headers and then do something else for the body. The current behavior when F_SKIPBODY gets set is strange, and somewhat unpredictable. After F_SKIPBODY is set you quickly get to line 1886 which resets the parser state to NEW_MESSAGE, and then continues on with the main loop. If the original call to http_parser_execute happened to have read and passed more data than where the body starts, the main parsing loop will then attempt to parse the body of the message as if it were some kind of next message. This frequently fails on the first byte because it doesn't match any of the legal G, P, D, etc... bytes that can start a message, and in that case doesn't consume that byte. In those cases http_parser_execute returns in a non-errored state and its return value tells the caller where the message body starts. But when the message body starts with a G (and not followed by E), for example, the parser consumes that first G and returns an offset to the second byte of the message body.

I feel like it is much more sensible if the code was something like:

1885        if (parser->flags & F_SKIPBODY) {
1886          UPDATE_STATE(NEW_MESSAGE());
1887          CALLBACK_NOTIFY(message_complete);
1888          len = (p - data) + 1;    // <--- add this line to stop the parsing loop

That would declare that if on_headers_complete returns 1, the parser will not attempt to read or parse any data past the end of the headers.

Thoughts?