Open MitalAshok opened 3 months ago
Can you add a godbolt link for each example to clarify the behavior? I am not sure I follow the whole issue correctly.
CC @cor3ntin @tahonermann
@llvm/issue-subscribers-clang-frontend
Author: Mital Ashok (MitalAshok)
I could not get \r carriage-returns to work on godbolt (they are just replaced with \n), so I can't show it there.
I'll rewrite the example in the original thread:
int main() { // this line ends with \n\r\
return 1;
}
Generated with:
python3 -c 'open("test.cpp", "wb").write(b"int main() { // this line ends with \\n\\r\\\n\r return 1;\n}\n")'
Clang and MSVC treat the return 1; as part of the comment on a single line (so it returns 0) and GCC doesn't (so it returns 1).
This is one of the few cases where it matters whether a sequence counts as multiple new-lines or as a single one.
(Another case is https://cplusplus.github.io/CWG/issues/1709.html / cebac48bf7e52e352b8cda806a64dab66df4c64f for how many \n strings are produced when stringizing a raw string)
There looks to be a similar bug in raw string literal parsing:
constexpr const char* s = R"(
)";
With the newline = \n:
Compiler | s[0] | s[1] |
---|---|---|
Clang | '\n' | 0 |
GCC | '\n' | 0 |
MSVC | '\n' | 0 |
With the newline = \r\n:
Compiler | s[0] | s[1] |
---|---|---|
Clang | '\n' | 0 |
GCC | '\n' | 0 |
MSVC | '\n' | 0 |
With the newline = \r:
Compiler | s[0] | s[1] |
---|---|---|
Clang | '\r' | 0 |
GCC | '\n' | 0 |
MSVC | '\n' | 0 |
(MSVC also has "warning C4335: Mac file format detected: please convert the source file to either DOS or UNIX format")
With the newline = \n\r:
Compiler | s[0] | s[1] | s[2] |
---|---|---|---|
Clang | '\n' | '\r' | 0 |
GCC | '\n' | '\n' | 0 |
MSVC | '\n' | 0 | |
GCC seems to have the correct behaviour on all of them
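For anyone who wants to reproduce the tables above locally, here is a minimal sketch that writes the raw string test case once per line ending. The file names, the helper names, and the trailing \n after the declaration are my assumptions, not something taken from the report. The files are written in binary mode so that Python does not translate the line endings.

```python
# Sketch: emit the raw string test case with each new-line flavour.
# Assumptions: file names and the final b"\n" are illustrative only.

NEWLINES = {
    "lf": b"\n",
    "crlf": b"\r\n",
    "cr": b"\r",
    "lfcr": b"\n\r",
}


def raw_string_source(newline: bytes) -> bytes:
    """Return the test source with `newline` placed inside the raw string."""
    return b'constexpr const char* s = R"(' + newline + b')";\n'


def write_all(directory: str = ".") -> None:
    """Write one file per new-line flavour; binary mode keeps the bytes as-is."""
    for name, newline in NEWLINES.items():
        with open(f"{directory}/raw_{name}.cpp", "wb") as f:
            f.write(raw_string_source(newline))
```

Calling write_all() produces raw_lf.cpp, raw_crlf.cpp, raw_cr.cpp and raw_lfcr.cpp, each of which can then be fed to the compilers being compared.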
@AaronBallman points out this piece of code: https://github.com/llvm/llvm-project/blob/e46468407a7bb7f8b2fe13675a5a1c32b85f8cad/clang/lib/Lex/Lexer.cpp#L1288-L1293
Previous discussion about this can be found here: https://github.com/llvm/llvm-project/pull/97585/files#r1674111174
Clang currently accepts \n\r as a single new-line. I.e., a \ followed by the two characters \n\r deletes all three of those characters in the physical source to create one logical source line.
CWG2639 seems to make it so \r\n -> \n and \r (not followed by \n) -> \n, but Clang also converts \n\r -> \n. This seems nonconformant in C++23.
GCC does not treat \n\r as a single new-line. MSVC does seem to treat \n\r as a single new-line.
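To make the difference concrete, here is a rough model of the two behaviours applied to the comment example from earlier in the thread. This is only a sketch of the rules as described above, not code from any compiler. The function names are made up, and the "lenient" variant assumes that \r\n, \n\r, \n and \r are each accepted as one new-line after a backslash.

```python
import re

# The bytes of test.cpp from the earlier example, written as a str.
EXAMPLE = "int main() { // this line ends with \\n\\r\\\n\r return 1;\n}\n"


def normalize_newlines(src: str) -> str:
    """CWG2639 reading: \\r\\n -> \\n, and a lone \\r -> \\n."""
    return re.sub(r"\r\n?", "\n", src)


def splice_strict(src: str) -> str:
    """Normalize first, then delete each backslash followed by one new-line."""
    return normalize_newlines(src).replace("\\\n", "")


def splice_lenient(src: str) -> str:
    """Model of the reported Clang behaviour (an assumption, see lead-in):
    a backslash followed by \\n\\r is removed as a single splice."""
    spliced = re.sub(r"\\(\r\n|\n\r|\n|\r)", "", src)
    return normalize_newlines(spliced)


strict_lines = splice_strict(EXAMPLE).split("\n")
lenient_lines = splice_lenient(EXAMPLE).split("\n")
```

In the strict model the \r becomes its own new-line, so return 1; ends up on a separate line after the comment. In the lenient model the whole \ \n \r sequence disappears and return 1; is pulled into the comment.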