Closed JoeyEremondi closed 8 years ago
Strictly speaking, this issue is invalid, for two reasons:
I'd support adding the approach suggested in the HTML5 spec to jsEscape, but only if that doesn't change the semantics: i.e. if parsing "<!--" would yield the same program as parsing "<\!--"; same for "<script" as "<\script", and "</script" as "<\/script". According to the spec, that does indeed look to be the case for string and regexp literals (it is permitted to have escape sequences \!, \s and \/ that would evaluate to the respective characters !, s and /), but not for comments. However, since language-ecmascript does not output comments in any version, this should be fine.
:-1: It is your responsibility to safely encode the JS output for inclusion in whichever context you choose.
Also, it is not true that "it is permitted to have escape sequences \!, \s and \/ that would evaluate to the respective characters !, s and /".
Michael, focusing on the substantial and informative part of your message: here are my grammar reductions for string and regexp literals. What am I missing?
SingleStringCharacter -> \ EscapeSequence
DoubleStringCharacter -> \ EscapeSequence
RegularExpressionChar -> RegularExpressionBackslashSequence -> \ RegularExpressionNonTerminator
EscapeSequence -> CharacterEscapeSequence -> NonEscapeCharacter -> SourceCharacter but not EscapeCharacter or LineTerminator ('!', 's', '/' are neither in EscapeCharacter nor in LineTerminator)
RegularExpressionNonTerminator -> SourceCharacter but not LineTerminator ('!', 's', '/' are not in LineTerminator)
Paraphrasing ECMA-262 v5, section 7.8.4:
EscapeCharacter :: SingleEscapeCharacter | DecimalDigit | x | u
SingleEscapeCharacter :: ' | " | \ | b | f | n | r | t | v
I was not refuting your claim that it is possible to have these escape sequences. I was refuting your claim that the semantics would not change in any of the listed contexts if you precede one of the listed characters with a backslash. In a regex context, \s matches any whitespace character.
An actually safe replacement would be to use the \x## or \u#### escape sequences. But I still feel it is a consumer's responsibility.
If you do want to go this route, though, note that null bytes are allowed literally in JavaScript strings and comments, but are not allowed in HTML at all, so must be escaped. For this, you can use \0.
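The difference between the two contexts can be checked directly. The following is a minimal JavaScript sketch, not part of language-ecmascript; the object names (stringChecks, regexChecks, numericChecks) are made up for illustration. It compares the backslash forms against the plain forms in string and regexp literals, and then tries the \x## and \u#### forms.

```javascript
// String literals: a backslash before a non-escape character is simply dropped.
const stringChecks = {
  bang: "<\!--" === "<!--",
  s: "<\script" === "<script",
  slash: "<\/script" === "</script",
};

// Regexp literals: \! and \/ still match the literal character,
// but \s is the whitespace class, so the pattern no longer matches "s".
const regexChecks = {
  bang: /<\!--/.test("<!--"),
  slash: /<\/script/.test("</script"),
  sMatchesLetter: /<\script/.test("<script"),
  sMatchesSpace: /<\script/.test("< cript"),
};

// Hex and unicode escapes keep their meaning in both kinds of literal.
const numericChecks = {
  hexInString: "<\x73cript" === "<script",
  unicodeInString: "<\u0073cript" === "<script",
  hexInRegex: /<\x73cript/.test("<script"),
  unicodeInRegex: /<\u0073cript/.test("<script"),
  nulInString: "a\0b".charCodeAt(1) === 0,
};

console.log(stringChecks, regexChecks, numericChecks);
```

Every check comes out true except sMatchesLetter, which is the case that changes meaning.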
"In a regex context, \s matches any whitespace character." --- true, I missed that. Thank you, @michaelficarra .
It does look like escaping s in <script with a unicode escape sequence, while using a character escape in the rest, would do the job.
Regarding the consumer responsibility for encoding, I've just had an interesting thought.
In fact, <!-- and <script can appear outside of literals and comments:
x <!--y is valid ECMAScript
x <script is too
So it appears that not only does the HTML5 recommendation not guarantee that a script is safe to appear in an inline script tag; adjusting jsEscape the way @JoeyEremondi suggests wouldn't either.
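To make the point about <script outside literals concrete, here is a small JavaScript check. The name lessThanScript is made up for illustration. It builds a function whose body contains the text `x <script`, which parses as an ordinary less-than comparison against a variable called script. Only the <script case is shown; how <!-- is tokenized can depend on whether the host applies the HTML-like comment extension, so it is left out of this sketch.

```javascript
// The body "return x <script;" is just "return x < script;" without the space.
const lessThanScript = new Function("x", "script", "return x <script;");

console.log(lessThanScript(1, 2)); // true
console.log(lessThanScript(2, 1)); // false
```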
Now, I'm actually thinking that this should be done as post-processing on the whole printed source, unicode-escaping the problematic characters !, s and / in the substrings <!--, <script and </script. But this is actually independent of the pretty-printer itself, since it could (and, I believe, should) be done as a String -> String transformation. I could include a function that does this in language-ecmascript, but it would be the user's responsibility to call it on the printed source that is going to be included in an inline script.
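A string-to-string pass of this kind could look roughly like the sketch below. It is written in JavaScript for easy experimentation rather than as the Haskell function discussed here, and escapeForInlineScript is a hypothetical name, not an existing language-ecmascript API. It rewrites the character right after `<` as a \u#### escape and matches case-insensitively, on the assumption that the HTML parser does not care about the case of the tag name. One caveat: the escape only has the intended meaning inside string and regexp literals (and, for s, inside identifiers). A ! or / that is an operator outside a literal cannot be written as an escape, so this sketch does not cover that case.

```javascript
// Hypothetical helper: escape the character following "<" in "<!--",
// "<script" and "</script" so the HTML parser no longer sees those sequences.
function escapeForInlineScript(source) {
  return source.replace(/<(!--|\/?script)/gi, (match, rest) => {
    // Four-digit hex code of the first character after "<".
    const code = rest.charCodeAt(0).toString(16).padStart(4, "0");
    return "<\\u" + code + rest.slice(1);
  });
}

// Usage on printed source containing a string literal.
const printed = 'var s = "</script><!--";';
const safe = escapeForInlineScript(printed);
console.log(safe); // var s = "<\u002fscript><\u0021--";
```

Inserting the escape after `<` rather than putting a backslash before it avoids interacting with any backslashes that already precede `<` in the source.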
I think the same reasoning applies to source found in inline event handlers and javascript: pseudo-protocol URLs, though I'm fuzzy on the syntactic restrictions there.
Thoughts?
Since I haven't seen any further arguments for including this functionality in language-ecmascript, I'm closing the issue.
If you have something like StringLit "<script>alert(\"a\")</script>@emaildomain.con", it generates JS that looks like this:

When read in a browser, this causes a syntax error, because the browser reads the </script> as the end of the HTML <script> body.

As described in this StackOverflow answer, the following characters should be escaped to avoid interfering with JS or HTML parsers: