Open remexre opened 3 years ago
See https://simonsapin.github.io/wtf-8/#motivation for some background.
My first thought, without too much careful consideration, was "just do what Java does".
But apparently Java converts Unicode escape sequences anywhere in a source file to the equivalent characters before parsing? https://javajee.com/unicode-escapes-in-java
That seems a bit strange (and would cause issues with locations) - and I don't see a real advantage to doing it this way anyway? So IDK.
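To make the location concern concrete, here is a rough Python sketch of what translating escapes before parsing does to column numbers. `pretranslate` is a hypothetical helper written for this comment; it only handles the plain `\uXXXX` form and skips Java's rules about repeated `u`s and preceding backslashes.

```python
import re

def pretranslate(source: str) -> str:
    """Replace every \\uXXXX in the raw source text, before any tokenizing.

    Hypothetical simplification of a Java-style pre-pass: it handles only
    the plain four-digit form.
    """
    return re.sub(
        r"\\u([0-9a-fA-F]{4})",
        lambda m: chr(int(m.group(1), 16)),
        source,
    )

original = r'String s = "\u0041"; int x = y;'
translated = pretranslate(original)

# The six-character escape collapsed into one character, so everything
# after it moved left by five columns relative to the file on disk.
column_shift = original.index("int") - translated.index("int")
```

Any error reported against `translated` would point five columns too early unless the pre-pass also keeps a map back to the original offsets.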
Right now, looks like we don't support escapes of the form "\x12", "\u1234", and "\U12345678"; these probably aren't hard to add in unescapeString and escapeString, but some thought should be put into what our Unicode guarantees for strings actually are; should we allow the string "\ud800", for example? What should we do when trying to write that out to UTF-8, if so?
I guess this comes down to, what is a string?
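For discussion, here is a minimal Python sketch of the decoding side with the policy question exposed as a flag. It is not this project's unescapeString: `unescape_string` and `allow_surrogates` are hypothetical names, and it assumes only the three escape forms above plus a doubled backslash, which it skips unchanged so the following characters aren't misread as an escape.

```python
import re

# One alternative per form: \\ (skipped unchanged), \xHH, \uHHHH, \UHHHHHHHH.
_ESCAPE = re.compile(
    r"\\(?:(\\)|x([0-9a-fA-F]{2})|u([0-9a-fA-F]{4})|U([0-9a-fA-F]{8}))"
)

def unescape_string(body: str, allow_surrogates: bool = False) -> str:
    """Decode \\x, \\u and \\U escapes in the body of a string literal."""
    def replace(match: re.Match) -> str:
        if match.group(1) is not None:
            return match.group(0)  # doubled backslash: leave as-is
        digits = match.group(2) or match.group(3) or match.group(4)
        code_point = int(digits, 16)
        if code_point > 0x10FFFF:
            raise ValueError(f"\\U{digits} is beyond U+10FFFF")
        if 0xD800 <= code_point <= 0xDFFF and not allow_surrogates:
            raise ValueError(f"U+{code_point:04X} is a lone surrogate")
        return chr(code_point)

    return _ESCAPE.sub(replace, body)

# If lone surrogates are allowed in, writing them out needs a decision:
lone = unescape_string(r"\ud800", allow_surrogates=True)
try:
    lone.encode("utf-8")  # strict UTF-8 has no encoding for U+D800
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
# Python's "surrogatepass" handler emits the generalized three-byte form,
# the same bytes WTF-8 uses for an unpaired surrogate.
wtf8_style = lone.encode("utf-8", "surrogatepass")
```

With the flag off, a string is a sequence of Unicode scalar values and always round-trips through UTF-8; with it on, a string is a sequence of code points and output needs something like WTF-8. Note that "\U12345678" as written above would be rejected either way, since it is larger than U+10FFFF.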