rubys / nokogumbo

A Nokogiri interface to the Gumbo HTML5 parser.
Apache License 2.0

Simplify temporary buffer usage #110

Closed stevecheckoway closed 6 years ago

stevecheckoway commented 6 years ago

Previously, we would insert characters into the temporary buffer for two reasons:

  1. To keep track of text we had seen and moved past but not yet emitted as character tokens, so that we could emit it later; and
  2. To record strings for comments and doctypes.

Use 1 was a bit silly. To get the correct token positions, we would mark the input stream when clearing the temporary buffer, reset the input to the mark, and then advance the input and a pointer into the temporary buffer in lockstep, emitting character tokens as we went.

Now we just mark the input stream, and then begin emitting from the mark point as needed.

This has the advantage of freeing the temporary buffer to record the escaped script tag, replacing the _script_data_buffer, which is now removed.
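
For reviewers who want a picture of the mark-and-emit idea described above, here is a minimal sketch in C. It is not the code in this PR: the `Input` struct and the `input_init`, `input_mark`, `input_advance`, and `emit_from_mark` helpers are hypothetical names invented for illustration, and the sketch assumes a byte-oriented input rather than the tokenizer's real UTF-8 iterator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified input stream; not Gumbo's actual types. */
typedef struct {
  const char *data; /* input bytes */
  size_t len;       /* number of bytes in data */
  size_t pos;       /* current read position */
  size_t mark;      /* saved position to emit from later */
} Input;

static void input_init(Input *in, const char *data) {
  in->data = data;
  in->len = strlen(data);
  in->pos = 0;
  in->mark = 0;
}

/* Remember the current position instead of copying text into a buffer. */
static void input_mark(Input *in) { in->mark = in->pos; }

/* Consume up to n bytes without emitting them. */
static void input_advance(Input *in, size_t n) {
  in->pos = (n > in->len - in->pos) ? in->len : in->pos + n;
}

/* Emit one character token per byte in [mark, pos), recording each
   token's source offset. Moves the mark forward past every byte it
   emits and returns the number of tokens written (at most cap). */
static size_t emit_from_mark(Input *in, char *chars, size_t *offsets,
                             size_t cap) {
  assert(in->mark <= in->pos);
  size_t n = 0;
  while (in->mark < in->pos && n < cap) {
    chars[n] = in->data[in->mark];
    offsets[n] = in->mark; /* position comes straight from the input */
    in->mark++;
    n++;
  }
  return n;
}
```

Because each token's position is read directly from the input offset, there is no second pointer into a side buffer to keep in step with the input, which is the redundancy the old approach had.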