rust-lang / reference

The Rust Reference
https://doc.rust-lang.org/nightly/reference/
Apache License 2.0

Character and string token definitions need updating. #626

Open ehuss opened 5 years ago

ehuss commented 5 years ago

There are multiple issues here. Some of this has changed in 1.37 via https://github.com/rust-lang/rust/pull/60793.

I may be missing some things here. Everything needs a very thorough review to make sure it is correct and up to date with the changes from 60793.

ehuss commented 5 years ago

See also https://github.com/rust-lang/rust/issues/62865

mattheww commented 9 months ago

https://github.com/rust-lang/rust/pull/118699#issuecomment-1852867466 should be helpful.

mattheww commented 9 months ago

~The current description says that forms like 'a'b are acceptable as a BYTE_LITERAL with a suffix, but in fact they're rejected (to avoid confusion with two LIFETIME_LABEL tokens).~

The current description says that forms like `'ab'c` are acceptable as two LIFETIME_LABEL tokens, but in fact they're rejected ("character literal may only contain one codepoint"; the `c` is taken as a suffix).

Perhaps this could be documented via another reserved form.
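
For comparison, here is a minimal sketch of the neighbouring form that is accepted: with whitespace between them, `'ab 'c` is two lifetime tokens. The `count_lifetimes!` macro is a hypothetical helper written for this illustration, not something from the Reference or the compiler. The rejected form `'ab'c` appears only in a comment, since the report above says it fails at lexing time.

```rust
// Hypothetical helper for illustration: counts the lifetime tokens
// passed to the macro by matching each one with the `lifetime` fragment.
macro_rules! count_lifetimes {
    ($($l:lifetime)+) => { [$(stringify!($l)),+].len() };
}

fn main() {
    // Separated by whitespace, the input is two lifetime tokens.
    assert_eq!(count_lifetimes!('ab 'c), 2);

    // Per the comment above, writing `'ab'c` here instead is reported to be
    // rejected with "character literal may only contain one codepoint",
    // so it cannot be demonstrated in code that compiles.
    println!("'ab 'c is {} lifetime tokens", count_lifetimes!('ab 'c));
}
```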

mattheww commented 9 months ago

A form like `b"\u{00a0}"` is rejected at lexing time ("unicode escape in byte string").

But as it doesn't match either BYTE_STRING_LITERAL or RESERVED_TOKEN_DOUBLE_QUOTE, the current description says there's a valid tokenisation as the identifier `b` followed by `"\u{00a0}"`.

So if we keep using the current mechanism for documenting such rejected tokens, I think we'd need yet more reserved forms.
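
To make the two readings concrete, here is a rough sketch that counts token trees with a hypothetical `count_tts!` macro (an illustrative helper, not part of any library). It shows that the identifier-then-string reading exists only when whitespace separates the two, and that a well-formed byte string is a single token. The rejected form `b"\u{00a0}"` itself is left in a comment.

```rust
// Hypothetical helper for illustration: counts the token trees it is given.
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

fn main() {
    // With whitespace: the identifier `b`, then an ordinary string literal
    // (a unicode escape is fine in a non-byte string).
    let separated = count_tts!(b "\u{00a0}");
    assert_eq!(separated, 2);

    // A well-formed byte string literal is one token.
    let byte_string = count_tts!(b"\xa0");
    assert_eq!(byte_string, 1);

    // Per the comment above, `b"\u{00a0}"` (no whitespace) is rejected with
    // "unicode escape in byte string" rather than being split in two.
    println!("separated: {separated}, byte string: {byte_string}");
}
```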

There are probably other similar cases. I think after rust-lang/rust#119172 a C string literal containing a NUL is one.