Open jorendorff opened 5 years ago
Affected jit-tests:
JS allows unpaired surrogates, and the test suite naturally loves to hit this corner case. We need the ability to pass JS strings that are not well-formed UTF-16 from Visage to C++.
I am paraphrasing what we wrote in the chat.
We currently have strings implemented via Rust's &str. This represents strings as UTF-8. However, JavaScript strings are not UTF-8; they are instead sequences of 16-bit code units, closer to a Vec<u16>.
Unfortunately we can't rely on str in this case, as we need to accept invalid UTF-16. So we need to implement the JavaScript String type as specified.
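To make the problem concrete, here is a minimal sketch using only the Rust standard library. The helper names `utf16_to_string` and `utf16_to_string_lossy` are made up for illustration and are not part of this codebase. It shows that a sequence of code units containing an unpaired surrogate cannot be turned into a `String` without either failing or losing information.

```rust
/// Hypothetical helper: strict conversion from UTF-16 code units.
/// Returns `None` when the input contains an unpaired surrogate,
/// because `String` (and `str`) must always hold valid UTF-8.
fn utf16_to_string(units: &[u16]) -> Option<String> {
    String::from_utf16(units).ok()
}

/// Hypothetical helper: lossy conversion. Every unpaired surrogate is
/// replaced by U+FFFD, so the original code units cannot be recovered.
fn utf16_to_string_lossy(units: &[u16]) -> String {
    String::from_utf16_lossy(units)
}
```

A well-formed pair such as `[0xD83D, 0xDE00]` converts fine, but a lone `0xD800` does not, which is why the lossy path is not an option if the strings have to round-trip.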
Some pseudocode to get the idea across: it might look something like enum JsString<'a> { Borrowed(&'a str), Owned(String), Owned16(Vec<u16>) }, but not quite. This will take some work in lexer.rs.
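A slightly fuller sketch of that enum, to show how the variants could fit together. The variant names come from the pseudocode above; the methods (`from_code_units`, `to_code_units`, `as_str`, `len`) are assumptions for illustration, not an agreed design.

```rust
/// Hypothetical string type. `Borrowed` and `Owned` hold text that is
/// valid UTF-8; `Owned16` holds raw code units and is the only variant
/// that can represent an unpaired surrogate.
#[derive(Debug, Clone)]
enum JsString<'a> {
    Borrowed(&'a str),
    Owned(String),
    Owned16(Vec<u16>),
}

impl JsString<'static> {
    /// Build from raw code units, keeping the UTF-8 representation
    /// whenever the input happens to be well-formed UTF-16.
    fn from_code_units(units: Vec<u16>) -> JsString<'static> {
        match String::from_utf16(&units) {
            Ok(s) => JsString::Owned(s),
            Err(_) => JsString::Owned16(units),
        }
    }
}

impl<'a> JsString<'a> {
    /// The UTF-16 code units of the string, whatever the representation.
    fn to_code_units(&self) -> Vec<u16> {
        match self {
            JsString::Borrowed(s) => s.encode_utf16().collect(),
            JsString::Owned(s) => s.encode_utf16().collect(),
            JsString::Owned16(v) => v.clone(),
        }
    }

    /// View as `&str` when possible; `None` for the raw UTF-16 variant.
    fn as_str(&self) -> Option<&str> {
        match self {
            JsString::Borrowed(s) => Some(*s),
            JsString::Owned(s) => Some(s.as_str()),
            JsString::Owned16(_) => None,
        }
    }

    /// Length in UTF-16 code units, which is how JS measures strings.
    fn len(&self) -> usize {
        match self {
            JsString::Borrowed(s) => s.encode_utf16().count(),
            JsString::Owned(s) => s.encode_utf16().count(),
            JsString::Owned16(v) => v.len(),
        }
    }
}
```

One open question with this shape is equality: the same text can live in different variants, so any comparison would presumably have to go through the code units rather than compare variants directly.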
I think I have a clear idea of this now and can get started. If something isn't quite right here please correct me.
Not actively working on this right now
This sort of thing is valid and actually pretty common:
"(?:[\uD800-\uDBFF][\uDC00-\uDFFF]|[\0-\uFFFF])"
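For illustration, a rough sketch of what decoding such a literal into code units could look like. The function `decode_u_escapes` is hypothetical and is not the code in lexer.rs; it handles only `\uXXXX` escapes with exactly four hex digits and copies everything else through unchanged (so `\0` and the other escapes are deliberately not interpreted here).

```rust
/// Hypothetical sketch: decode the body of a string literal into UTF-16
/// code units. Only `\uXXXX` (exactly four hex digits) is interpreted;
/// all other characters, including other backslash sequences, are copied
/// through as-is.
fn decode_u_escapes(body: &str) -> Vec<u16> {
    let chars: Vec<char> = body.chars().collect();
    let mut out = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        // Need a backslash, a 'u', and four more characters.
        if chars[i] == '\\' && i + 6 <= chars.len() && chars[i + 1] == 'u' {
            let hex: String = chars[i + 2..i + 6].iter().collect();
            if hex.chars().all(|c| c.is_ascii_hexdigit()) {
                if let Ok(unit) = u16::from_str_radix(&hex, 16) {
                    // The code unit is pushed verbatim, even if it is a
                    // lone surrogate; no pairing check is done.
                    out.push(unit);
                    i += 6;
                    continue;
                }
            }
        }
        // Ordinary character: append its UTF-16 encoding (1 or 2 units).
        let mut buf = [0u16; 2];
        out.extend_from_slice(chars[i].encode_utf16(&mut buf));
        i += 1;
    }
    out
}
```

Run on the character class `[\uD800-\uDBFF]` from the literal above, this yields the code units `0xD800` and `0xDBFF` with a `-` between them, so both surrogates are unpaired and the result cannot be stored in a `String`.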