Open matklad opened 5 years ago
Tentatively classifying as a bug.
I am pretty ignorant about Unicode, but I would prefer to fix this the other way around, by restricting the whitespace definition in the reference to ASCII. Opened https://internals.rust-lang.org/t/do-we-need-unicode-whitespace/9876 for that discussion.
cc @Manishearth
U+200F is definitely useful in text; we should not be skipping it in strings at all.
As for the lexer: this was one of the questions we punted on for later in the non-ASCII identifier story. RLM (U+200F) is useful for making code that uses RTL scripts render well, so allowing it as whitespace is somewhat useful (if confusing).
The lexer uses the Pattern_White_Space Unicode property when skipping over trivia. However, when we process string literals with escaped newlines, we only skip ASCII whitespace:
https://github.com/rust-lang/rust/blob/fe0a415b4ba3310c2263f07e0253e2434310299c/src/libsyntax/parse/mod.rs#L379
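To make the mismatch concrete, here is a minimal sketch of the two predicates side by side. The function names are hypothetical and are not taken from the compiler. The ASCII set (space, tab, LF, CR) is my assumption about what the linked code skips, while the Pattern_White_Space set is the one defined by Unicode.

```rust
/// Code points with the Unicode Pattern_White_Space property.
fn is_pattern_white_space(c: char) -> bool {
    matches!(
        c,
        '\u{0009}'..='\u{000D}' // tab, LF, VT, FF, CR
            | '\u{0020}'        // space
            | '\u{0085}'        // next line
            | '\u{200E}'        // left-to-right mark
            | '\u{200F}'        // right-to-left mark
            | '\u{2028}'        // line separator
            | '\u{2029}'        // paragraph separator
    )
}

/// Hypothetical stand-in for the narrower check applied after an escaped
/// newline in a string literal (assumed set: space, tab, LF, CR).
fn is_ascii_continuation_whitespace(c: char) -> bool {
    matches!(c, ' ' | '\t' | '\n' | '\r')
}

/// Skip leading characters accepted by `pred` and return the remainder.
fn skip_while(s: &str, pred: fn(char) -> bool) -> &str {
    s.trim_start_matches(pred)
}

fn main() {
    // U+200F followed by two spaces and a visible character.
    let input = "\u{200F}  x";
    // The trivia-style predicate consumes U+200F and the spaces.
    println!("{:?}", skip_while(input, is_pattern_white_space));
    // The ASCII-only predicate stops immediately at U+200F.
    println!("{:?}", skip_while(input, is_ascii_continuation_whitespace));
}
```

With the first predicate the remainder is just the visible character; with the second, nothing is skipped because U+200F is not in the ASCII set.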
Here's an example program showing that U+200F is ignored in program text but not in a string literal:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=ec59778d31dde69f29f1095aff2c9b66
Here's the text of the program in Debug format, to make the whitespace slightly more visible: