llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

C18 standard conformance non-compliance. #41571

Open llvmbot opened 5 years ago

llvmbot commented 5 years ago
Bugzilla Link 42226
Version 6.0
OS Linux
Reporter LLVM Bugzilla Contributor
CC @dwblaikie,@DougGregor,@hubert-reinterpretcast,@zygoloid,@TNorthover

Extended Description

This report is a technical point more than anything. Both the C90 and C18 standards have a "conformance chapter" (chapter 4 in C18), whose first paragraph states the following:

"In this International Standard, “shall” is to be interpreted as a requirement on an implementation or on a program; conversely, “shall not” is to be interpreted as a prohibition"

The technical point is the following. In section 5.1.1.2 of C18 (C90 has a similar section), translation phase 2 states:

"Each instance of a backslash character (\) immediately followed by a new-line character is deleted, splicing physical source lines to form logical source lines. Only the last backslash on any physical source line shall be eligible for being part of such a splice. A source file that is not empty shall end in a new-line character, which shall not be immediately preceded by a backslash character before any such splicing takes place."
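For readers less familiar with phase 2, the splicing step quoted above can be sketched roughly as follows. This is only an illustration, not how clang or gcc implement it: the function name `splice_lines` is hypothetical, and it assumes plain LF line endings and ignores trigraphs.

```c
#include <stddef.h>

/*
 * Hypothetical helper: copy `len` bytes from `in` to `out`, deleting every
 * backslash that is immediately followed by a new-line (and the new-line
 * itself). `out` must have room for at least `len` bytes.
 * Returns the number of bytes written.
 */
size_t splice_lines(const char *in, size_t len, char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i] == '\\' && i + 1 < len && in[i + 1] == '\n') {
            i++;        /* skip the new-line as well */
            continue;
        }
        out[n++] = in[i];
    }
    return n;
}
```

Under these assumptions, the physical lines `ab\` and `cd` become the single logical line `abcd`.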

Consider the following program:

simple_case.c

int main(){ return 0; }
\

clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

gcc (Ubuntu 8.3.0-6ubuntu1~18.04) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

The clang version above compiles it just fine, and the gcc version above issues the following warning:

"c.c:2:1: warning: backslash-newline at end of file \

"

If instead we have:

elaborate_case.c

"
int main(){ return 0; }
\

"

then they both compile fine without warning.

Question: according to the standard "A source file that is not empty shall end in a new-line character, which shall not be immediately preceded by a backslash character before any such splicing takes place.", are we not required by the standard to treat such a text file, the elaborate_case.c, as not a "source file"? That is, it should not compile.

llvmbot commented 5 years ago

@Tim: Per the standard,

Quote 1: "A source file that is not empty shall end in a new-line character, which shall not be immediately preceded by a backslash character before any such splicing takes place."

By construction of the standard, i.e. by the statement above, a source file must have that property. If it does not, it is not a source file, as a source file shall have that property.

Second point, that "shall" is in fact a requirement on the "implementation", which chapter 5 of the standard defines as:

"An implementation translates C source files..."

Furthermore, the clause I cited above (Quote 1), which we are not complying with, falls under "The precedence among the syntax rules of translation is specified by the following phases:". Hence it definitely is a "shall" that the compiler/implementation would enforce if we were to follow the standard.

As for making it a benign warning, that makes sense as a matter of pragmatism. However, if we are to ignore it in the implementation, I wonder why it is a requirement in the standard. If we are not enforcing it, why not remove it or relax the condition?

@hstong: I just checked:

00000000 69 6e 74 20 6d 61 69 6e 28 29 7b 20 72 65 74 75 |int main(){ retu|
00000010 72 6e 20 30 3b 7d 0a 5c 0a |rn 0;}.\.|
00000019

The above produces the warning in gcc, and compiles fine in clang, as I described earlier.

00000000 69 6e 74 20 6d 61 69 6e 28 29 7b 20 72 65 74 75 |int main(){ retu|
00000010 72 6e 20 30 3b 7d 0a 5c 0a 0a |rn 0;}.\..|
0000001a

Compiles fine on both.

@Richard: thank you for checking.

ec04fc15-fa35-46f2-80e1-5d271f2ef708 commented 5 years ago

I can reproduce this. Here's what I see: a source file ending in:

The second case is an accepts-invalid bug.

hubert-reinterpretcast commented 5 years ago

I would suggest actually inspecting the file to determine if there is a newline character at the end of the file that is not preceded by backslash.

The following version of elaborate_case.c reproduces the behaviour you observed.

od -A x -t x1 <elaborate_case.c
000000 69 6e 74 20 6d 61 69 6e 28 29 7b 20 72 65 74 75
000010 72 6e 20 30 3b 20 7d 0a 5c 0a 0a
00001b
Return: 0x00:0

Notice that there is a newline (0x0A) character not preceded by a backslash (0x5C) at the end of the file.

TNorthover commented 5 years ago

It would still be a source file officially (just by virtue of ending up as input to the compiler, I think) but the C++ standard would call it "ill-formed". C uses less sophisticated terminology.

Either way, we probably should diagnose it to be helpful but it's not a strict requirement (that "shall" is a requirement on the user, not the compiler). And we'd probably make it a warning by default (like GCC) rather than an error because it's pretty benign.

llvmbot commented 1 year ago

@llvm/issue-subscribers-c11

llvmbot commented 4 months ago

@llvm/issue-subscribers-clang-frontend

llvmbot commented 4 months ago

@llvm/issue-subscribers-c17
