melt-umn / silver

An attribute grammar-based programming language for composable language extensions
http://melt.cs.umn.edu/silver/
GNU Lesser General Public License v3.0

Hex character escapes #515

Open remexre opened 3 years ago

remexre commented 3 years ago

Right now, it looks like we don't support escapes of the form "\x12", "\u1234", and "\U12345678". These probably aren't hard to add in unescapeString and escapeString, but some thought should be put into what our Unicode guarantees for strings actually are. Should we allow the string "\ud800", for example? If so, what should we do when trying to write it out to UTF-8?

I guess this comes down to: what is a string?
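
To make the question concrete, here is a minimal Java sketch of decoding the three escape forms and of the two obvious choices when a lone surrogate has to be written as UTF-8. The class and method names (HexEscapeSketch, unescapeHex, encodeLenient, encodeStrict) are hypothetical and are not Silver's unescapeString; the sketch also assumes strings are sequences of UTF-16 code units, as in Java, which is exactly the guarantee under discussion.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: these names are illustrative and are not Silver's runtime API.
final class HexEscapeSketch {

    // Value of an ASCII hex digit, or -1 if ch is not one.
    static int hexValue(char ch) {
        if (ch >= '0' && ch <= '9') return ch - '0';
        if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
        if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
        return -1;
    }

    // Decodes backslash-x (2 digits), backslash-u (4) and backslash-U (8) escapes.
    // Assumption: unpaired surrogates are accepted, mirroring Java strings.
    static String unescapeHex(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 >= s.length()) {
                out.append(c);
                i++;
                continue;
            }
            char kind = s.charAt(i + 1);
            int digits = kind == 'x' ? 2 : kind == 'u' ? 4 : kind == 'U' ? 8 : 0;
            if (digits == 0) {
                // Some other escape (including an escaped backslash): copy it verbatim.
                out.append(c).append(kind);
                i += 2;
                continue;
            }
            int end = i + 2 + digits;
            if (end > s.length()) {
                throw new IllegalArgumentException("truncated escape at index " + i);
            }
            long codePoint = 0;
            for (int j = i + 2; j < end; j++) {
                int d = hexValue(s.charAt(j));
                if (d < 0) {
                    throw new IllegalArgumentException("bad hex digit at index " + j);
                }
                codePoint = codePoint * 16 + d;
            }
            if (codePoint > Character.MAX_CODE_POINT) {
                throw new IllegalArgumentException("code point out of range at index " + i);
            }
            out.appendCodePoint((int) codePoint);
            i = end;
        }
        return out.toString();
    }

    // Lenient: String.getBytes substitutes the charset's replacement bytes
    // for input it cannot encode, such as an unpaired surrogate.
    static byte[] encodeLenient(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Strict: the encoder reports an error instead of substituting.
    static byte[] encodeStrict(String s) throws CharacterCodingException {
        ByteBuffer buf = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(CharBuffer.wrap(s));
        byte[] bytes = new byte[buf.remaining()];
        buf.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        String lone = unescapeHex("\\ud800");
        System.out.println(Arrays.toString(encodeLenient(lone)));
        try {
            encodeStrict(lone);
        } catch (CharacterCodingException e) {
            System.out.println("strict encoder rejected it: " + e);
        }
    }
}
```

Neither behavior is being proposed here. The lenient path silently loses information and the strict path turns a string that was legal in memory into a write-time error, which is the trade-off the question above is about. A third option, rejecting surrogate escapes when the literal is parsed, would avoid both at the cost of a narrower definition of "string".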

remexre commented 3 years ago

https://simonsapin.github.io/wtf-8/#motivation for some background

krame505 commented 3 years ago

My first thought, without too much careful consideration, was "just do what Java does".

But apparently Java converts Unicode escape sequences anywhere in a source file (not just inside string literals) to the equivalent characters before lexing? https://javajee.com/unicode-escapes-in-java

That seems a bit strange (and would cause issues with locations), and I don't see a real advantage to doing it this way anyway. So IDK.
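
For anyone who hasn't run into this, a small Java illustration of the pre-lexing translation follows. The class and method names are made up for the example. The escapes in the method body are turned into double-quote characters before the compiler tokenizes the line, so what looks like one string literal is really two.

```java
// Hypothetical demo class; only the behavior of the escapes is the point here.
final class UnicodeEscapeDemo {

    // The two escapes below become double quotes before tokenizing, so the
    // compiler sees the expression  "a".length() + "b".length()  rather than
    // a single long string literal.
    static int translatedBeforeLexing() {
        return "a\u0022.length() + \u0022b".length();
    }

    public static void main(String[] args) {
        System.out.println(translatedBeforeLexing());
    }
}
```

The method returns 2 rather than the length of the text between the outer quotes. It also hints at the location problem: a column counted in the translated text no longer matches the column in the file as written, since each six-character escape collapses to one character.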