Open ehuss opened 5 years ago
https://github.com/rust-lang/rust/pull/118699#issuecomment-1852867466 should be helpful.
~The current description says that forms like `'a'b` are acceptable as a `BYTE_LITERAL` with a suffix, but in fact they're rejected (to avoid confusion with two `LIFETIME_LABEL` tokens).~
The current description says that forms like `'ab'c` are acceptable as two `LIFETIME_LABEL` tokens, but in fact they're rejected ("character literal may only contain one codepoint"; the `c` is taken as a suffix).
Perhaps this could be documented via another reserved form.
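
To illustrate the difference, here is a minimal sketch. The `count_tts!` macro is a hypothetical helper of mine (it is not part of the Reference or the compiler); it just counts the token trees the lexer hands to a macro. The rejected form is only shown in a comment, since including it would stop the file from compiling.

```rust
// Hypothetical helper: counts the token trees it receives.
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

fn main() {
    // Separated by whitespace, these lex as two lifetime tokens.
    assert_eq!(count_tts!('a 'b), 2);

    // A well-formed character literal is a single token.
    assert_eq!(count_tts!('a'), 1);

    // count_tts!('ab'c) is rejected at lexing time rather than being
    // split into two lifetimes ("character literal may only contain
    // one codepoint"), so it cannot appear here uncommented.
}
```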
A form like `b"\u{00a0}"` is rejected at lexing time ("unicode escape in byte string").
But as it doesn't match either `BYTE_STRING_LITERAL` or `RESERVED_TOKEN_DOUBLE_QUOTE`, the current description says there's a valid tokenisation as the identifier `b` followed by `"\u{00a0}"`.
So if we keep on with the current mechanism for documenting such rejected tokens, I think we'd need yet more reserved forms.
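
As a rough sketch of what the two-token reading would look like if it were really available, the snippet below uses a hypothetical `token_count!` macro (my own helper, not anything from the Reference) to count token trees. With whitespace between `b` and the string, the input does lex as an identifier followed by a string literal; without the whitespace, the form from this comment is an error, so it appears only in a comment.

```rust
// Hypothetical helper: counts the token trees it receives.
macro_rules! token_count {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + token_count!($($tail)*) };
}

fn main() {
    // A valid byte string literal is one token.
    assert_eq!(token_count!(b"\xa0"), 1);

    // With whitespace, this is the identifier `b` followed by an
    // ordinary string literal: two tokens.
    assert_eq!(token_count!(b "\u{00a0}"), 2);

    // token_count!(b"\u{00a0}") does not fall back to the two-token
    // reading; it is rejected ("unicode escape in byte string").
}
```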
There are probably other similar cases. I think after rust-lang/rust#119172 a C string literal containing a `NUL` is one.
There are multiple issues here. Some of this has changed in 1.37 via https://github.com/rust-lang/rust/pull/60793.
- [x] `RAW_BYTE_STRING_LITERAL` no longer allows bare CR (new 1.37). #1459
- [x] "Raw string" and "raw byte string" needs to be updated that CRLF is converted to LF (new 1.37). #1459
- [ ] Several tokens need to sync the English text with the "Lexer" definition.
  - `STRING_LITERAL` indicates several rules (like isolated CR's are not allowed), but the text does not mention any of those restrictions.
  - `CHAR_LITERAL` says "single Unicode character…except U+0027" which is not complete.
  - `RAW_STRING_LITERAL` does not allow bare CR's.
  - `BYTE_LITERAL` escapes are not described.
  - `BYTE_STRING_LITERAL` restrictions are not described.
- [x] Typo in `RAW_BYTE_STRING_CONTENT`, points to `RAW_STRING_CONTENT` when it should be `RAW_BYTE_STRING_CONTENT`. #818
- [x] I cannot find anywhere that mentions CRLF in a string is converted to LF. Am I blind? #1459
- [x] The description for string continuations says "`\` immediately before `U+000A`", but it can also be before CRLF. How should this be handled? I haven't looked at how it is implemented, but are all CRLF's translated everywhere? Should there just be a blanket statement somewhere about this, to avoid having to discuss it in every string literal definition? #1459

I may be missing some things here. Need to very thoroughly review everything to make sure it is correct and up-to-date with the changes from 60793.