Closed apparentlymart closed 5 years ago
Ahh, I just remembered why moving the +
doesn't work: it causes an escape like $${
to not actually escape because the pattern ends and the following bytes are ${
and the state about it being an escape are lost. Further investigation required, then...
Given the following input:
The quoted string literal token after the open quote seems to consume the remainder of the line, and all following lines. In other words, it prevents the end pattern
"
from matching and thus prevents the scanner from returning to the normal config scanning mode.This seems to be the result of some faulty assumptions about Ragel. The ability for us to treat isolated
%
and$
introducers (not followed by{
) as quoted literal string parts is implemented by some special sequences:The goal here is to look ahead to see if the byte after the introducer is
{
, and if not to back up one byte so we can emit the$
or%
in isolation and then continue scanning at the subsequent{
.It seems though that somehow then the
"
matches as a string literal character even though that character is specifically excluded from the set of valid string literal characters unless preceded by a\
, as we can see in the full rule:I have not yet determined how the
TemplateStringLiteralRule
is then proceeding to consume the"
, but I do see it as part of the resultingTokenQuotedLit
token along with all of the following bytes on the line.We did already have some tests for
%
and$
appearing right before"
, but all of those existing cases had the"
at the end of the input, and so I suspect that the longest-match-wins rule is playing a part here somehow. I saw some better results by moving the+
modifier at the end of the pattern to instead apply only to the final sub-pattern(StringLiteralChars - ("$" | '%' | '"'))
. That causes a more suitable result for this input, at the expense of causing us to generate multiple consecutiveTokenQuotedLit
when literal$
,%
, and character escapes are present. But the result ought to be functionally equivalent, so maybe that'll be a reasonable path forward to fix this up without significant rework of the grammar.