robertdavidgraham / wc2

Investigates optimizing 'wc', the Unix word count program

"The algorithm" is inconsistent, buffering nuances #5

Open allanwind opened 2 months ago

allanwind commented 2 months ago

I enjoyed the write-up.

The first algorithm does a direct lookup, `table[state][c]` (i.e. the table's second dimension is sized by `UCHAR_MAX`, and you would be in trouble if the platform used a signed `char`, though your code handles that correctly), while the latter does an indirect lookup through a column map, `table[state][column[c]]`. Neither is wrong, of course, but it's confusing that they are inconsistent.
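For illustration, here is a minimal sketch of the two styles side by side (the table names and sizes are hypothetical, not the repo's actual code):

```c
#include <limits.h>
#include <stddef.h>

enum { NSTATES = 4, NCOLUMNS = 3 };

/* Style 1: direct lookup, one column per possible byte value. */
static unsigned char table_direct[NSTATES][UCHAR_MAX + 1];

/* Style 2: indirect lookup, bytes first mapped to a small column set. */
static unsigned char column[UCHAR_MAX + 1];
static unsigned char table_compressed[NSTATES][NCOLUMNS];

static unsigned run_direct(const unsigned char *buf, size_t len)
{
    unsigned state = 0;
    for (size_t i = 0; i < len; i++)
        state = table_direct[state][buf[i]];          /* table[state][c] */
    return state;
}

static unsigned run_compressed(const unsigned char *buf, size_t len)
{
    unsigned state = 0;
    for (size_t i = 0; i < len; i++)
        state = table_compressed[state][column[buf[i]]]; /* table[state][column[c]] */
    return state;
}
```

Note that taking the buffer as `unsigned char *` sidesteps the signed-`char` indexing pitfall in both styles.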

node.js's http_parser uses a callback model with function-call overhead per token. It still has to buffer at least enough data to emit a token, and buffering all the tokens is more or less equivalent to buffering the whole request. You contrast that with "reading the entire header in and buffering it". I think you overlook that Ethernet data arrives in packets, so you may very well have the whole header in the first packet. On a production server you also have to apply size constraints; otherwise your parser may do a lot of work only to realize that you are under a denial-of-service (DoS) attack. Either do the benchmark, or avoid extrapolating from one data set to another case that is not directly comparable.
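To make the point concrete, here is a minimal sketch of a callback-style tokenizer that applies a size cap before doing further work; all names and the cap value are hypothetical, and http_parser's real API is considerably more involved:

```c
#include <stddef.h>

enum { MAX_HEADER = 8192 }; /* hypothetical cap; reject oversized input early */

typedef void (*token_cb)(const char *tok, size_t len, void *user);

struct parser {
    size_t total;    /* bytes seen so far, checked against the cap */
    char   buf[256]; /* only enough buffering to emit one token */
    size_t buflen;
};

/* Feed one packet's worth of data; invoke cb once per space-delimited token.
 * Returns -1 as soon as the cap is exceeded (possible DoS), 0 otherwise. */
static int parser_feed(struct parser *p, const char *data, size_t len,
                       token_cb cb, void *user)
{
    p->total += len;
    if (p->total > MAX_HEADER)
        return -1;
    for (size_t i = 0; i < len; i++) {
        if (data[i] == ' ' || data[i] == '\n') {
            if (p->buflen) {
                cb(p->buf, p->buflen, user);
                p->buflen = 0;
            }
        } else if (p->buflen < sizeof p->buf) {
            p->buf[p->buflen++] = data[i];
        }
    }
    return 0;
}
```

The point of the size check is that it bounds the parser's work regardless of how the input is split across packets.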

Whenever I have explored the "Pointer arithmetic" point, I have found that gcc generates identical assembler either way. My conclusion was to write whichever form is easier to read.
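For example, here is the kind of pair I compared (a sketch, not code from this repo); with gcc -O2 both loops typically compile to the same assembly:

```c
#include <stddef.h>

/* Index-based loop. */
static size_t count_spaces_index(const char *s, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (s[i] == ' ')
            count++;
    return count;
}

/* Pointer-arithmetic loop. */
static size_t count_spaces_ptr(const char *s, size_t n)
{
    size_t count = 0;
    for (const char *end = s + n; s != end; s++)
        if (*s == ' ')
            count++;
    return count;
}
```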