stencilproject / Stencil

Stencil is a simple and powerful template language for Swift.
https://stencil.fuller.li
BSD 2-Clause "Simplified" License

Fix for incorrect tokenization due to index difference of Unicode character/scalar #286

Closed: andreasley closed this 4 years ago

andreasley commented 4 years ago

Probably fixes #276 (it does in my case).

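Scanner iterates over Unicode scalars, but previously split strings using Character indexes. Since multiple Unicode scalars can combine into a single Character (for example, "e" followed by U+0301 COMBINING ACUTE ACCENT is one "é"), a position counted in scalars does not match the same position counted in characters. This could lead to incorrect tokenization and therefore to unknown tags. This patch changes Scanner to use the indexes of the corresponding UnicodeScalarView instead.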
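The following Swift sketch reproduces the mismatch outside Stencil. Both functions, prefixBeforeTagBuggy and prefixBeforeTagFixed, are hypothetical helpers written for illustration only; they are not Stencil's Scanner API and only look for the "{%" tag opener. The buggy version counts scalars while scanning and then reuses that count as a Character offset; the fixed version scans and slices within the same UnicodeScalarView.

```swift
// Hypothetical helpers illustrating the index mismatch; NOT Stencil's actual Scanner API.

/// Buggy: counts Unicode scalars while scanning, then reuses that count as a Character offset.
func prefixBeforeTagBuggy(_ text: String, tag: String = "{%") -> String? {
    let scalars = Array(text.unicodeScalars)
    let tagScalars = Array(tag.unicodeScalars)
    guard scalars.count >= tagScalars.count else { return nil }
    for offset in 0...(scalars.count - tagScalars.count) {
        if scalars[offset..<offset + tagScalars.count].elementsEqual(tagScalars) {
            // BUG: `offset` counts scalars, but String.index(_:offsetBy:) counts Characters.
            let end = text.index(text.startIndex, offsetBy: offset)
            return String(text[..<end])
        }
    }
    return nil
}

/// Fixed: scans and slices using indexes of the same UnicodeScalarView.
func prefixBeforeTagFixed(_ text: String, tag: String = "{%") -> String? {
    let view = text.unicodeScalars
    let tagScalars = Array(tag.unicodeScalars)
    var index = view.startIndex
    while index != view.endIndex {
        if view[index...].starts(with: tagScalars) {
            // The slice boundary is a scalar index, consistent with the scan above.
            return String(String.UnicodeScalarView(view[..<index]))
        }
        index = view.index(after: index)
    }
    return nil
}
```

With the input "e\u{301} {% if x %}" (a decomposed "é"), the scalar offset of "{%" is 3 but there are only two Characters before it, so the buggy helper returns "é {" with the tag's opening brace leaked into the text token, while the fixed helper returns "é ".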

All tests pass, and performance is essentially unchanged.

Bigger picture: I wonder whether Unicode.Scalar should be used at all in Lexer/Scanner, since splitting is conceptually always done by character.
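A minimal sketch of the character-based alternative raised above, assuming splitting is done on Character boundaries throughout. prefixBeforeTagByCharacter is a hypothetical helper, not part of Stencil; it scans and slices with the same String indexes, so the two index spaces cannot diverge.

```swift
// Hypothetical sketch: scan and slice in the same (Character) index space.
// NOT Stencil's API; it only looks for the "{%" tag opener.
func prefixBeforeTagByCharacter(_ text: String, tag: String = "{%") -> String? {
    // Compare Characters (grapheme clusters) rather than Unicode scalars.
    guard let start = text.indices.first(where: { text[$0...].starts(with: tag) }) else {
        return nil
    }
    // `start` is a Character index from `text`, so slicing `text` with it is consistent.
    return String(text[..<start])
}
```

One trade-off to keep in mind: because matching happens per grapheme cluster, a delimiter character immediately followed by a combining mark (for example "{%" followed by U+0301) forms a different Character and no longer matches the tag.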