Open FrankHB opened 1 year ago
arises in clang and can lead developers to trigger a security vulnerability. Hence, we are reopening the issue.
@gbdngb12 is there a CVE related to this already? Or just the hypothetical scenarios?
It looks insane to me to fall through to the next function.
inmaldrerah said it well: you can assume it will terminate and get rid of the useless loop, or you can assume it won't terminate and mark it unreachable, but you cannot do both.
This is how c/c++ UBs work. Not "do the (architecturally) most obvious thing" but "do the biggest BS possible". Same case as the returning functions with missing explicit return statement. Those at least have a warning.
This is simply NOT a case of UB in C++. There is simply no normative wording that can render it an instance of UB in ISO C++.
In ISO C it is debatable whether it is UB (with Clause 4), but the original wording from ISO C is already vague enough to allow an interpretation that UB is not intended in this case (whether or not that was the intention of the author of the wording).
(please ease up on the language ("BS", "insane", etc) here)
@AaronBallman @erichkeane is this bug going anywhere constructive? Or can/should we close it with your/our current understanding, and revisit the issue when there's changes to the spec?
I'm against closing before further changes are made to either the rules or the implementation, because:
Aaron should answer too, but IMO, what we are doing is permitted per the standard (albeit perhaps user-hostile), thus the language standards organizations are the correct place to hash this out.
I agree that I think this is permitted per spec, and I agree that it would be good to involve standards committees in fixing that. However, I think it's perfectly reasonable for us to improve this under QoI -- just because we can optimize some way doesn't mean we have to, and the union of these two optimization decisions has incredibly surprising behavior.
You have failed to convince me, the owner of this issue, to accept reading the standard rules exactly in this way. It is certainly good to improve the wording to avoid such a possibility of divergence, but to my reading, the current status is technically NOT A DEFECT in the standard (in any published edition), whether or not the author of the original wording (from ISO C) agrees. (It would be a defect in the standard procedure of CWG if the current wording were formally identified as unintended, but given that the currently adopted wording in ISO C++ is internally consistent, it is doubtful how this could be applied retroactively: it would likely be a plain breaking change, and the conformance bugs would still exist in modes implying previous versions of the standard.) So there must be something wrong specific to this project. You have to take action in the related issues to clarify the existence of possible non-conformance before the needed changes (either in the implementation or the standard) are settled.
I'm open to different readings of the standard rules in the upstream issues, but not here, without sufficient proof that is logically consistent with the accepted practice already built by the community of the language (not only this project).
I think it's still an issue that occurs naturally in the wild, just with really hard-to-debug symptoms. (I have nothing to back this thinking up beyond "I've seen some stuff in my years".)
(1) generating a sound warning (like for a returning function with a missing return statement) would be enough IMO. Especially since the first thing to do during debugging is to check compiler warnings, and production codebases are protected with -Werror on CIs.
It won't help much, unless the risks of the non-conformance are warned about sufficiently. The net effect of the warning would then be "do not write code like this if you want to work with LLVM-based toolchains".
This is somewhat like the case of -fmerge-constants or -ffast-math, which should not be enabled by default. However, in this issue, is there any option to disable the nonconforming behavior? Why not simply disable the offending transformation, given that the effect can be achieved in clearer and more definite ways (e.g. unreachable attributes)?
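As a rough illustration of what "clearer and more definite ways" could look like, the sketch below marks a code path as unreachable explicitly instead of relying on the optimizer to infer it from a loop. The macro name `SKETCH_UNREACHABLE` and the `Color`/`to_index` names are hypothetical and not from this thread; `__builtin_unreachable()` is the GCC/Clang builtin and `__assume(0)` is the MSVC counterpart.

```cpp
// Hypothetical portability macro for this sketch; not part of the thread.
#if defined(__GNUC__) || defined(__clang__)
#  define SKETCH_UNREACHABLE() __builtin_unreachable()
#elif defined(_MSC_VER)
#  define SKETCH_UNREACHABLE() __assume(0)
#else
#  include <cstdlib>
#  define SKETCH_UNREACHABLE() std::abort()
#endif

enum class Color { Red, Green, Blue };

// Maps each enumerator to an index. The author states explicitly that
// control never leaves the switch without returning, so reaching the
// marker is undefined by the programmer's own declaration.
int to_index(Color c) {
    switch (c) {
    case Color::Red:   return 0;
    case Color::Green: return 1;
    case Color::Blue:  return 2;
    }
    SKETCH_UNREACHABLE();
}
```

Here the unreachability is opted into at a specific point in the source, so it is visible in review, unlike an assumption derived from a loop the author wrote on purpose.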
A signal handler may modify a volatile object or terminate the program by calling std::_Exit(). As a consequence, the implementation must ensure that a handler, which has been installed in a compliant way, shall be invoked eventually upon delivery of a corresponding signal.
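A minimal sketch of the pattern being described, assuming a handler installed with `std::signal` that sets a `volatile std::sig_atomic_t` flag. The names `g_stop`, `on_signal` and `wait_for_signal_demo` are made up for illustration. To keep the example terminating, the signal is delivered from inside the program with `std::raise`; an externally delivered signal would reach the same handler.

```cpp
#include <csignal>

// Flag written by the handler; volatile sig_atomic_t is the object type
// the standard allows a signal handler to modify.
volatile std::sig_atomic_t g_stop = 0;

extern "C" void on_signal(int) {
    g_stop = 1;
}

// Installs the handler, raises SIGINT synchronously, then spins on the flag.
// Each iteration reads a volatile object, so the loop is not one the
// implementation may assume away as free of observable accesses.
bool wait_for_signal_demo() {
    g_stop = 0;
    std::signal(SIGINT, on_signal);
    std::raise(SIGINT);            // handler runs before raise() returns
    while (!g_stop) {
        // busy-wait on the volatile flag
    }
    std::signal(SIGINT, SIG_DFL);  // restore default disposition
    return g_stop == 1;
}
```

Note that this loop differs from a bare `while(true) ;` in that its condition reads a volatile object; the disagreement in this thread is about loops with no such access.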
I've carefully checked the C standard and I'm confident that - a conforming implementation doesn't need to make a program able to handle any external signal.
I'd say it's conforming to make signals irrelevant to the examples posted here.
Violation of a permission does not imply undefined behavior, at least with current rules of ISO C++. Let's discuss the issue of the standard there.
BTW, there are two occurrences of plain "may assume" without explicit UB in the current standard wording (or since C++11), both of which are proven buggy or bug-adjacent - the other is LWG3511 (in [res.on.arguments]/1.3). I believe we should kill this pattern.
@frederick-vs-ja
5.2.3 Signals and interrupts
1 Functions shall be implemented such that they may be interrupted at any time by a signal, or may be called by a signal handler, or both, with no alteration to earlier, but still active, invocations’ control flow (after the interruption), function return values, or objects with automatic storage duration. All such objects shall be maintained outside the function image (the instructions that compose the executable representation of a function) on a per-invocation basis.
@lhmouse How can you ensure that any external signal exists? It's conforming to make only internal signals counted.
@lhmouse How can you ensure that any external signal exists? It's conforming to make only internal signals counted.
I am tired of this debate. I do not have to come up with 1,000 testcases where the compiler has done right, as if I were attempting to show the compiler was right. I am showing the compiler is wrong, and one testcase where it did wrong is enough.
@lhmouse How can you ensure that any external signal exists? It's conforming to make only internal signals counted.
I am tired of this debate. I do not have to come up with 1,000 testcases where the compiler has done right, as if I were attempting to show the compiler was right. I am showing the compiler is wrong, and one testcase where it did wrong is enough.
I don't think such a testcase is even related or can prove whether the compiler is wrong. I'm also tired of this, IMO unrelated, topic.
I don't think such a testcase is even related or can prove whether the compiler is wrong. I'm also tired of this, IMO unrelated, topic.
So, suppose you are building a compiler. You want people to appreciate your compiler, so you want to make a good compiler that actually compiles their code as they may expect. Then you should ask yourself why someone would write while(true) ;. They do that for a reason, which you should take into account. And they do not expect your compiler to produce undefined behavior for them. You will be blamed for that.
Violation of a permission does not imply undefined behavior, at least with current rules of ISO C++. Let's discuss the issue of the standard there.
BTW, there are two occurrences of plain "may assume" without explicit UB in the current standard wording (or since C++11), both of which are proven buggy or bug-adjacent - the other is LWG3511 (in [res.on.arguments]/1.3). I believe we should kill this pattern.
A bit off-topic, but I don't see the problem of the wording itself (which is perfectly clear; the LWG issue is another story).
The problem is the original intent, in both cases. I've expressed my concerns several times for the issue here. OTOH, the [res.on.arguments] case is meant to allow the same optimization that restrict allows in ISO C. I don't think C's wording of the formal definition of restrict is better here. Although it is clear about having UB, it also has "the translator is free to ..." (something that sounds like "may assume" in a more informal tone) as a separate normative rule. It should not be the same as the UB rules; otherwise it is totally redundant. The wording is also poor in that it did not sufficiently address the issues DMR had complained about a decade ago. Even if such lengthy normative wording is expected, UB here is still suspicious, because the optimizer should work without the permission from UB; unspecified behavior is enough for the assumptions. A "may assume no alias" would be better than UB plus "is free to..." for the quality of the wording, IMO.
It is not necessary to introduce UB merely to enable the possibility of such optimizations. UB is necessary for portability when the behavior cannot be clearly and reasonably specified in terms of the abstract machine semantics, say, crashing, since there is no way to precisely enumerate each instance of the behavior in terms of the configurations of the resulting abstract machine besides "totally unexpected". Neither "may assume" case here is necessary under such criteria. And in general, assuming "may assume" is somewhat synonymous with "unspecified", they are certainly not UB.
Whether they should be reworded as "unspecified" is another topic. Moreover, "may assume" is probably better for the intent, so there is even some possibility of changing "unspecified" to "may assume" where optimization is involved in the original intent. For example, the [expr.new] "is allowed to" can be "may". Notice this is actually the preferred usage for this intent as per the ISO directives, part 2.
Even UB that falls in the category of totally unexpected behaviors can have doubtful value. Missing the newline at the end of a TU is one instance. It allows the preprocessor to behave in arbitrary ways if the TU does not end with a newline. Guess why C++11 removed this feature.
arises in clang and can lead developers to trigger a security vulnerability. Hence, we are reopening the issue.
@gbdngb12 is there a CVE related to this already? Or just the hypothetical scenarios?
It looks insane to me to fall through to the next function.
inmaldrerah said it well: you can assume it will terminate and get rid of the useless loop, or you can assume it won't terminate and mark it unreachable, but you cannot do both.
This is how c/c++ UBs work. Not "do the (architecturally) most obvious thing" but "do the biggest BS possible". Same case as the returning functions with missing explicit return statement. Those at least have a warning.
This is simply NOT a case of UB in C++. There is simply no normative wording that can render it an instance of UB in ISO C++.
I currently only have access to a draft of the standard, so I don't know what the status is in ISO C++. But here are the parts I read, and you can tell me whether it's the same in ISO C++:
Although this document states only requirements on C++ implementations, those requirements are often easier to understand if they are phrased as requirements on programs, parts of programs, or execution of programs. Such requirements have the following meaning:
(2.1) If a program contains no violations of the rules in [lex] through [thread] and [depr], a conforming implementation shall, within its resource limits as described in [implimits], accept and correctly execute that program.
(2.2) If a program contains a violation of a rule for which no diagnostic is required, this document places no requirement on implementations with respect to that program.
(2.3) Otherwise, if a program contains a violation of any diagnosable rule or an occurrence of a construct described in this document as “conditionally-supported” when the implementation does not support that construct, a conforming implementation shall issue at least one diagnostic message.
I read this as follows: requirements on programs are actually defining the limits of the scope of C++. If a program fails any "undiagnosed" rules, it falls outside of the scope of the standard and a standard-compliant C++ implementation can do whatever it likes to the program. However, there are some rules called "diagnosable" rules, for which the implementation can do almost whatever it likes to the program, the only restriction is that it at least has to print a "diagnostic message" (!).
So we need to find out whether "no side-effect free infinite loops" (by side-effects I mean specifically the items mentioned in this rule) is a requirement of this kind, and whether it is "diagnosable" or not.
In my reading, "may assume" means that the implementation may pose this as a requirement placed on the program. Thus it is such a rule, which allows the compiler to produce arbitrary code. Including, as the note puts it, removing non-terminating empty loops.
However, "may assume" is a pretty vague formulation and needs to be clarified in the normative text, but I guess that's outside the scope of this thread.
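To make the rule under discussion concrete, here is a small sketch of a loop that has none of the listed side effects (no library I/O, no volatile access, no atomic or synchronization operation) and whose termination is not obvious. The function name `collatz_reaches_one` is hypothetical. Under the "may assume" wording, an implementation would be allowed to compile the function as if it simply returned `true`, without running the loop.

```cpp
// Iterates the Collatz step until n becomes 1.
// The loop body only touches the local variable n, so it performs none of
// the operations the forward-progress rule lists. For n == 0 the loop never
// exits on the abstract machine (0 maps to 0 forever); for other inputs,
// termination is an open mathematical question in general.
bool collatz_reaches_one(unsigned long long n) {
    while (n != 1) {
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
    }
    return true;  // the only value this function can ever return
}
```

Dropping the loop here is the uncontroversial part. The dispute in this thread is whether the same wording also lets the compiler treat the code after such a loop as unreachable.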
My understanding here is that the standard text does define it as undefined behavior, but undefined behavior is quite a bit more permissive than what is aimed for. It's just that UB is the only vocabulary that the standard currently has that would allow the actual aim, which is:
Maybe a better way would be to give more capabilities to the abstract machine, namely to relax the condition that program order needs to be prefix finite - it only needs to be prefix finite when restricted to side-effects. This would allow the abstract machine to execute code behind an infinite loop/recursion that doesn't have side-effects. The question is how to elegantly specify the point where the code resumes after executing the infinite loop/recursion, especially for goto loops.
And perhaps the easiest solution for an informal standard is to just nebulously describe that the abstract machine can "fast forward" through side-effect free code, including jumping straight to code behind a loop that would not otherwise terminate.
But I digress.
Assuming that it is a rule, the other question is whether it's a "diagnosable" rule. We find the definition of "diagnosable":
The set of diagnosable rules consists of all syntactic and semantic rules in this document except for those rules containing an explicit notation that “no diagnostic is required” or which are described as resulting in “undefined behavior”.
I don't find anything saying that no diagnostic is required, and it's not explicitly described as resulting in "undefined behavior". Therefore, it seems to fall under a "diagnosable" rule. Obviously we know that there is no way to diagnose in general whether a (side-effect-free) loop is infinite or not, and therefore it is not diagnosable in general in a mathematical sense.
I would suggest that until the standard is clarified:
Not every "ill-formed" program needs to be intentionally miscompiled just to keep up with the (normative wording of) the standard.
In my reading, "may assume" means that the implementation may pose this as a requirement placed on the program. Thus it is such a rule, which allows the compiler to produce arbitrary code. Including, as the note puts it, removing non-terminating empty loops.
However, "may assume" is a pretty vague formulation and needs to be clarified in the normative text, but I guess that's outside the scope of this thread.
'may assume' does not grant the compiler the right to delete the (implicit) return at the end of main().
In my reading, "may assume" means that the implementation may pose this as a requirement placed on the program. Thus it is such a rule, which allows the compiler to produce arbitrary code. Including, as the note puts it, removing non-terminating empty loops.
False. As I've mentioned several times, the word "may" in normative text imposes no requirements. If the intention is to express a requirement, the word "shall" shall be used instead. This is quite clear without any doubt as per the ISO rules (not only for C++). So basically your logic above does not apply at all.
However, "may assume" is a pretty vague formulation and needs to be clarified in the normative text, but I guess that's outside the scope of this thread.
False. On the contrary, this is an expression clearer than most other wording, as it simply and directly depends on the upstream ISO rules in an unambiguous way, without needing any additional definitions (like "undefined behavior" and "no diagnostic required") in ISO C++.
My understanding here is that the standard text does define it as undefined behavior, but undefined behavior is quite a bit more permissive than what is aimed for. It's just that UB is the only vocabulary that the standard currently has that would allow the actual aim, which is:
So, the understanding that renders the current wording "vague" comes from your perspective on the original intention of the drafted rules. This seems plausible only when you insist on specific proposals from WG14 (which are debatable in C++), but it has nothing to do with a defect in the currently published standard, which is even more qualified (in the sense of internal consistency) than any of the proposed alternatives I've found here.
Maybe a better way would be to give more capabilities to the abstract machine, namely to relax the condition that program order needs to be prefix finite - it only needs to be prefix finite when restricted to side-effects. This would allow the abstract machine to execute code behind an infinite loop/recursion that doesn't have side-effects. The question is how to elegantly specify the point where the code resumes after executing the infinite loop/recursion, especially for goto loops.
This is out of the scope of this issue. It belongs to the discussions of standard defects.
I won't expand on it much. Just to restate my opinion here: treating non-termination as a kind of side effect is a mature technique in theory. I don't see that a fundamentally new approach is necessary here.
Locking this issue, as I don't think it needs another person to chime in with their personal reading of the standard -- or to repeat what they have already said numerous times in previous comments.
Case
Reproduction
This will show "Hello world!".
Tested on Godbolt. This issue is reproducible with x86-64 Clang++ since 13.0.0.
Discussions
It might be well known that ISO C++ permits an implementation to remove an infinite loop without any further reasoning or proof of termination under the abstract machine semantics. However, there is no wording about assuming it to have undefined behavior, so the transformation is limited. The empty loop can be removed, but the removal should not introduce any difference in observable behavior besides the removal of the loop itself, and any such difference (none here, because there is no code after the loop) should be considered unspecified, not undefined. In this case, it is valid to transform the translation unit as if it consisted of int main(){}, but not with the call to unreachable.
Related
A similar issue #60419 was reported. Although its case is not as "minimal", the root cause seems to be the same. However, there are some other subtleties.
The reply to that issue is technically incorrect. First, it refers to a proposal of WG14, not the standard or any of its drafts; second, it is about C, but C and C++ are still different here, and even the original paper (WG14 N1528) does have concerns about liaison with WG21.
C and C++ are different in the meta level here. In ISO C, if something the language rules should have specified is actually underspecified, it is undefined, as per Clause 4 (same in many editions):
As WG14 N1528 has clarified, the permission to remove empty loops is granted, but any further side effects of the unexpected change made by the transformation should not be depended on. Otherwise, it is undefined by "omission of any explicit definition of behavior" in mind. Despite the belief in the recommendation section of WG14 N1528, it is still confusing and potentially debatable, especially when compared to ISO C++ rules. Note that the transformation of the original case in WG14 N1509, as well as of this case, is permitted, but it does not have to rely on any undefined behavior in C, because the remaining program does not contribute to any observable behavior.
There are no such equivalent rules in ISO C++. Moreover, the meaning of "may" in the C++ rule is clear as per Table 5 in the ISO/IEC Directives, Part 2. That is, it has nothing to do with voiding the other requirements on a conforming implementation. As a result, the compiler cannot assume anything as if the behavior were undefined. Hence, it is a bug in Clang++.
#60419 should not be marked as "invalid". And it can be marked "duplicate" in the sense that the case in that issue is also well-formed in C++.
Currently G++ does not have the same issue. It simply does not do the transformation. That is a matter of missed optimization, not of conformance.