Fix assertion failure when tag follows UTF-8 BOM

rubys / nokogumbo

A Nokogiri interface to the Gumbo HTML5 parser.

Apache License 2.0


Fix assertion failure when tag follows UTF-8 BOM #159

Closed stevecheckoway closed 3 years ago

stevecheckoway commented 3 years ago

The Gumbo tokenizer assumed that the start of the first token is at the beginning of the input. This is not the case if the input starts with a UTF-8 byte-order mark.

This change removes that assumption by asking the iterator itself for the pointer to the start of the token.
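
As a rough illustration of the idea, here is a minimal sketch in C. It is not the actual Gumbo code: `InputIter`, `iter_init`, `iter_pointer`, and `first_tag_length` are hypothetical names made up for this example, and the "tokenizer" only scans to the first `>`. It assumes the iterator skips a leading BOM during initialization, so the token start has to come from the iterator rather than from the raw input pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified input iterator; not Gumbo's real iterator type. */
typedef struct {
    const char *cur; /* next unread byte */
    const char *end; /* one past the last byte of input */
} InputIter;

/* Initialise the iterator, skipping a leading UTF-8 BOM (EF BB BF) if present. */
static void iter_init(InputIter *it, const char *input, size_t len) {
    it->cur = input;
    it->end = input + len;
    if (len >= 3 && memcmp(input, "\xEF\xBB\xBF", 3) == 0) {
        it->cur += 3;
    }
}

/* Pointer to the iterator's current position. */
static const char *iter_pointer(const InputIter *it) {
    return it->cur;
}

/* Length in bytes of the first "<...>" token, measured from the token start. */
static size_t first_tag_length(const char *input, size_t len) {
    InputIter it;
    iter_init(&it, input, len);

    /* Ask the iterator where the token starts. Using `input` here would be
     * wrong whenever a BOM was skipped, because the iterator is already
     * three bytes past the beginning of the buffer. */
    const char *token_start = iter_pointer(&it);
    assert(token_start == it.cur);

    while (it.cur < it.end && *it.cur != '>') {
        it.cur++;
    }
    if (it.cur < it.end) {
        it.cur++; /* consume the closing '>' */
    }
    return (size_t)(it.cur - token_start);
}
```

With this sketch, `first_tag_length("<p>", 3)` and `first_tag_length("\xEF\xBB\xBF<p>", 6)` both report a three-byte token, because the BOM is never counted as part of the first token.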

Fixes #157