wren-lang / wren

The Wren Programming Language. Wren is a small, fast, class-based concurrent scripting language.
http://wren.io
MIT License

Discussion: Desired behavior for invalid Unicode escapes #816

Open ChayimFriedman2 opened 3 years ago

ChayimFriedman2 commented 3 years ago

You can create invalid Unicode escapes using \U, for example "\UFFFFFFFF".

The current code is problematic from any point of view:

Personally, I lean towards making this an error (if you want raw bytes, use \x), but keeping them is also an option. Either way, the choice should be documented.

mhermier commented 3 years ago

The implementation is lazy, and validating it is quite expensive. While I agree correctness would be good, I wonder whether testing it would introduce too much code, even for rejecting only invalid planes (since we can't really rely on external code).

ChayimFriedman2 commented 3 years ago

No, it only needs one more else branch here:

https://github.com/wren-lang/wren/blob/45c67fae0c2fbe78b608d372d0951ffe05f02690/src/vm/wren_compiler.c#L840-L845
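To make the "one more else" idea concrete, here is a minimal, self-contained sketch. It is not Wren's actual code: `utf8_encoded_length`, `utf8_encode`, and `append_unicode_escape` are hypothetical names invented for illustration, and the error message is made up. The sketch assumes the only check wanted is rejecting values above U+10FFFF, which cannot be encoded in UTF-8. Whether surrogates should also be rejected is a separate policy choice it does not address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: number of UTF-8 bytes needed to encode `value`,
// or 0 if the value is above U+10FFFF and so cannot be encoded.
static int utf8_encoded_length(uint32_t value)
{
  if (value <= 0x7f) return 1;
  if (value <= 0x7ff) return 2;
  if (value <= 0xffff) return 3;
  if (value <= 0x10ffff) return 4;
  return 0;
}

// Hypothetical helper: writes the UTF-8 encoding of `value` into `out`
// (which must have room for 4 bytes) and returns the number of bytes
// written, or 0 if the value cannot be encoded.
static int utf8_encode(uint32_t value, uint8_t* out)
{
  assert(out != NULL);

  int length = utf8_encoded_length(value);
  switch (length)
  {
    case 1:
      out[0] = (uint8_t)value;
      break;
    case 2:
      out[0] = (uint8_t)(0xc0 | (value >> 6));
      out[1] = (uint8_t)(0x80 | (value & 0x3f));
      break;
    case 3:
      out[0] = (uint8_t)(0xe0 | (value >> 12));
      out[1] = (uint8_t)(0x80 | ((value >> 6) & 0x3f));
      out[2] = (uint8_t)(0x80 | (value & 0x3f));
      break;
    case 4:
      out[0] = (uint8_t)(0xf0 | (value >> 18));
      out[1] = (uint8_t)(0x80 | ((value >> 12) & 0x3f));
      out[2] = (uint8_t)(0x80 | ((value >> 6) & 0x3f));
      out[3] = (uint8_t)(0x80 | (value & 0x3f));
      break;
    default:
      break; // Not encodable: nothing is written.
  }
  return length;
}

// Hypothetical escape handler: returns NULL on success, or an error
// message the caller could report as a compile error.
static const char* append_unicode_escape(uint32_t value, uint8_t* out,
                                         int* written)
{
  int length = utf8_encode(value, out);
  if (length != 0)
  {
    *written = length;
    return NULL;
  }
  else
  {
    // The extra branch: report the problem instead of silently
    // emitting nothing.
    *written = 0;
    return "Invalid Unicode escape: code point is above U+10FFFF.";
  }
}
```

With this shape, an escape such as "\UFFFFFFFF" takes the `else` path and yields an error message, while every value up to U+10FFFF is encoded as before. The added cost is the one range check already needed to choose the byte count.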

mhermier commented 3 years ago

I also agree with the arguments. If one really wants to embed binary data, \x should be used instead of \u or \U.

This seems consistent with what other compilers do. Testing with various languages and compilers (using Compiler Explorer), it seems that most compilers agree to raise an error at least for code points that cannot be encoded. So I think that is the simplest approach to take.