validator / htmlparser

The Validator.nu HTML parser https://about.validator.nu/htmlparser/

Enable control for tokenizer buffer size #34

Closed: sideshowbarker closed this 3 years ago

sideshowbarker commented 4 years ago

This change adds an optional bufferSize parameter to the tokenize(), parse(), and parseFragment() methods of sax.HtmlParser instances. The bufferSize parameter controls the size of the buffer that gets fed to the tokenizer.

That parameter makes it possible to set the tokenizer buffer size to, for example, 1. That is particularly useful for emulating the behavior of the Firefox HTML parser, which feeds the tokenizer a single code unit at a time.
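To make the effect of the buffer size concrete, here is a minimal sketch of chunked feeding. It is not the htmlparser implementation: the class ChunkedFeeder, its feed() method, and the Consumer<String> standing in for the tokenizer are hypothetical names used only for illustration. Input is read through a char[] of bufferSize elements and each filled slice is handed to the callback. Because a Java char is a UTF-16 code unit, a bufferSize of 1 delivers one code unit per call.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch (not the nu.validator.htmlparser API): reads input
 * through a buffer of bufferSize chars and passes each filled slice to a
 * callback that stands in for the tokenizer.
 */
final class ChunkedFeeder {

    static void feed(Reader reader, int bufferSize, Consumer<String> tokenizer) {
        if (bufferSize < 1) {
            throw new IllegalArgumentException("bufferSize must be at least 1");
        }
        char[] buffer = new char[bufferSize];
        try {
            int n;
            // read() returns at most buffer.length chars per call, or -1 at end of input
            while ((n = reader.read(buffer, 0, buffer.length)) != -1) {
                if (n > 0) {
                    tokenizer.accept(new String(buffer, 0, n));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> chunks = new ArrayList<>();
        // bufferSize 1: the "tokenizer" sees one UTF-16 code unit per call
        feed(new StringReader("<p>hi</p>"), 1, chunks::add);
        System.out.println(chunks.size() + " chunks: " + chunks);
    }
}
```

Running main prints `9 chunks: [<, p, >, h, i, <, /, p, >]`; with a bufferSize of 2048 the same input arrives as a single chunk. Concatenating the chunks always reproduces the input, so the buffer size changes only where the chunk boundaries fall, not the content.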

Without this change, the tokenizer buffer size for HtmlParser instances is hardcoded to 2048.
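Java has no optional parameters, so one common way to add an "optional" argument while keeping existing callers working is an overload that delegates with the old default. The sketch below assumes that approach; SketchParser, its tokenize() overloads, and the returned chunk count are hypothetical and are not taken from the actual patch.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/**
 * Hypothetical sketch: an "optional" bufferSize modelled as an overload
 * that falls back to the previously hardcoded size of 2048.
 */
final class SketchParser {

    static final int DEFAULT_BUFFER_SIZE = 2048;

    /** Existing signature: behaves as before, with a 2048-char buffer. Returns the number of chunks read. */
    static int tokenize(Reader reader) {
        return tokenize(reader, DEFAULT_BUFFER_SIZE);
    }

    /** New overload: the caller chooses the buffer size. Returns the number of chunks read. */
    static int tokenize(Reader reader, int bufferSize) {
        if (bufferSize < 1) {
            throw new IllegalArgumentException("bufferSize must be at least 1");
        }
        char[] buffer = new char[bufferSize];
        int chunks = 0;
        try {
            // read() returns -1 at end of input; a real parser would hand
            // buffer[0..n) to its tokenizer on each iteration
            while (reader.read(buffer, 0, buffer.length) > 0) {
                chunks++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return chunks;
    }

    public static void main(String[] args) {
        String input = "x".repeat(5000);
        System.out.println(tokenize(new StringReader(input)));    // 3 (2048 + 2048 + 904)
        System.out.println(tokenize(new StringReader(input), 1)); // 5000
    }
}
```

Keeping the one-argument form means code written before the change still gets the 2048-char buffer, while callers emulating Firefox can pass 1 explicitly.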

hsivonen commented 3 years ago

I pushed this via the command line as 9ffe6b967b517baf42cdbec2daea03405ea4a3d1 and now I don't know how to make the GitHub PR metadata say merged instead of closed.

Thanks and sorry about the bad metadata state.