llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
29.07k stars 11.98k forks source link

Pure attribute and C++ exceptions #42620

Open llvmbot opened 5 years ago

llvmbot commented 5 years ago
Bugzilla Link 43275
Version 8.0
OS All
Reporter LLVM Bugzilla Contributor
CC @DougGregor,@erthink,@riccibruno,@zygoloid,@rnk,@stephenhines

Extended Description

Case:

#include <map>
using namespace std;

struct M
{
    std::map<int, long> m{{1, 2L}};

    [[gnu::pure]] long&
    query(int k)
    {
        return m.at(k);
    }
};

int main()
{
    M m;

    try
    {
        m.query(3);
    }
    catch(std::out_of_range&)
    {}
}

Currently the case may behave differently with g++/clang++, depending on the target and the unwinder used.

Using clang++ with ARM EHABI libunwind (with [1]), it will crash because the generated index table entry (defined by ARM EHABI [2]) for the frame contains EXIDX_CANTUNWIND, so the unwinder returns _URC_FAILURE as the result of _Unwind_RaiseException. Then, std::__terminate is immediately called in __cxa_throw, despite any enclosing try and catch clauses remained.

The pure attribute in the source code implies nounwind in IR, which causes the non-intuitive behavior.

Since this divergence will cause real problems in transition g++ to clang++ (like in Android NDK, and in this case it is actually more troublesome than the migration of the compiler itself [3]), clang++ should better handle the case specifically. I prefer pure does not imply nounwind, and if the old behavior is needed, __attribute__((pure, nothrow)) should be sufficient. Otherwise, if it should be the responsibility of the user to care the divergence, at least a diagnostic message to warn about suspicious cases (where throw and pure attribute are used together) will be helpful.

(After days of work to pull the logic from libcxxabi/libunwind to my code and finally seeing this is not the problem of the runtime, I'd like to say, this is really terrible experience.)

Appendix

The root reason is that, historically, the semantics of the pure attribute is a bit unclear, particularly with respect to exception handling.

In the documentation of some old version of GCC [4], use of the pure attribute only implies that the function should be subject to CSE and loop optimization, but not explicitly prevent the exception propagation (as a precondition). However, it is known that, different to clang++, g++ does not optimize when C++ exception handling is enabled, and such behavior might be useful [5]. Since the documentation for the pure attribute is missing in Clang for years [6], users can easily get confused which behavior is correct (by design).

With the new release of GCC 9, the documentation [7] has been updated to make it clearly "prohibits a function from modifying the state of the program that is observable by means other than inspecting the function's return value". However, this still not eliminate the potential code transition problem between old versions of g++ and clang++. Avoiding the misuse is now the responsibility of users.

But more problems still remain.

The semantics of the pure attribute that both g++ and clang++ follow has once been proposed to C++17 [8]. They share many common problems.

(1) Even if the intent can be somewhat clear, the reasoning from the theory presented in the proposal is flawed. Although traditional mathematical theories have no notion about side effects, not all modern ones are the same. Rewrite systems like lambda calculi are already extended (to be operational semantic models of programming languages) with built-in support for side effects, including states and control effects. And even pure functional programming practice does not reject extra effects expressed implicitly in the programs (for instance, by monadic syntax transformation or algebraic effect system). Thus, the conclusion of equivalence between using of pure functions and side-effect-free property (in section 5.1) is unsound. "The fact" of "function should not have any side effect" mentioned here is only widely accepted by users without experience of (some cutting-edge) effectful programming systems. This does not mean it is true in nature. Such misconception may raise new confusion about the extent of "pure" with different definitions from various theories (not less than the confusion between "pure" and "const" attributes).

(2) The proposal did not make the control effects explicit. It only mentions the requirements of preventing "observable" side effects and informally excludes exception propagation in the note of the proposed wording. The terminology of "observable side effects" is lack of definition. (And worse, there can be more problems on "observable" [9].)

(3) More generally, non-strict evaluations (like clauses of an if statement or short circuit evaluation of some subexpressions) can be restricted forms of control effects (via some continuation-style representations) implemented by branching. So does a return statement. So, why set throw aside?

(4) Not only concerns on debugging mentioned in the proposal, it also clash with features probably but not always used for debugging like backtrace. This is not only about the compiler frontend feature, see [10].

(5) When used as a part of the interface, it does not work well with the mental model of narrow contracts [11] used for C++ library interface design.

As of the GCC documentation [7], it is unclear that what exactly "observable effects" are. Besides, are the states of the implementation details (like those in the unwinder) parts of the states of a C++ program? Moreover, it is still not clear about the effects of violation. Since it does not state clearly that the behavior will be undefined, violation of the precondition can be interpreted as having unspecified but more or less predictable (due to the nature of CSE) results in the program. Users may still find reasons to rely on incompatible (to clang++) behavior. That's bad.

[1] https://reviews.llvm.org/rL238560 [2] http://infocenter.arm.com/help/topic/com.arm.doc.ihi0038b/IHI0038B_ehabi.pdf [3] https://android.googlesource.com/platform/ndk/+/master/docs/BuildSystemMaintainers.md#Unwinding [4] https://gcc.gnu.org/onlinedocs/gcc-8.3.0/gcc/Common-Function-Attributes.html#index-pure-function-attribute [5] https://kristerw.blogspot.com/2016/12/gcc-attributepure-and-c-exceptions.html [6] http://lists.llvm.org/pipermail/cfe-dev/2016-April/048295.html [7] https://gcc.gnu.org/onlinedocs/gcc-9.2.0/gcc/Common-Function-Attributes.html#index-pure-function-attribute [8] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0078r0.pdf [9] https://github.com/cplusplus/draft/issues/3215 [10] http://logan.tw/llvm/nounwind.html [11] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2011/n3248.pdf

5b41639c-9936-4f42-a42c-86e0b7a70c37 commented 4 years ago

Please do not fix this bug, because it allows a lot people to debug for many unforgettable hours!

This is really cool since the "pure" attribute is not currently documented, but is supported by the compiler with the significant difference from the GNU's original: it cancels the explicit noexcept(false) specification and does not generate any warnings. This is a real find that should not be lost so easily!

However, I think it would be nice to make a small improvement. For example, when compiling, choose randomly the one of three behaviors:

;)

In any case, thank you for CLANG/LLVM. Regards.

llvmbot commented 5 years ago

That was a lot of text and I apologize for not reading all of it, but I'd mention that it's currently not possible to throw a C++ exception without writing memory in TLS, so this all seems somewhat academic: the function is not pure, it has side effects.

AFAIK this is the status quo in most C++ implementations, but not the necessities implied by the requirements of ISO C++. For example, in the cases the frontend can statically prove the exceptions will always be consumed by specific callers in the same translation unit and the code has no explicit calls to the runtime, no actual unwinding to implement EH is needed at all.

But sure, this can't be proven in general, and it virtually never allows interoperation with other languages. So, I don't propose to change it here. (Let the better side-effect-free handling be done with new features like WG21 P0709 [1].) The change proposed here is only specific to the pure attribute.

If we made the proposed change I think it would be legal for the optimizer to hoist loads from the exception object out of the landingpad and across the invoke.

I think so.

I suppose a reasonable fix for that would be to model landingpad as writing unobservable memory. Maybe we already do, I'm not sure.

Probably we can do it without specifying what effects are unobservable.

I'd try rephrasing my preferred change clearer in a different approach, by settling down the following wording for the pure attribute (in C++ modes) in the documentation:

Two calls are same if it has same callee (a function or a function template) and same arguments (glvalues or prvalues). A call to a callee specified with pure attribute may omit the evaluation of its body if the body is already evaluated in a call same to it. The evaluation of the body shall not happens before the evaluation as if it should have been evaluated (in the call) at the first time according to the ISO C++ abstraction machine semantics. This is even true when the call is exited via an exception or has other additional control effects (like unwinding) which may affect observable behavior of the program. The evaluation of the body may be reordered and/or merged with evaluation of the body in the same calls qualified to pure across translation units in spite of such effects. If the callee is non-throwing (specified by nothrow attribute or non-throwing noexcept-specifier) along with the pure attribute, the implementation may assume there is no such effects; if the callee exits with an exception, std::terminate is called immediately. [Note: The overall change is an extension to ISO C++ evaluation semantics, but in cases where the function is really pure in the traditional sense, it is already allowed by ISO C++'s as-if rules (as current). Side effects in argument evaluations are kept unchanged and they can lead to undefined behavior.]

This approach may require less modification (possibly it just works only with the change of frontend). The core idea is to grant implementation the license to ignore duplicate calls on demand, whether they are throwing or not. It is essential a non-deterministic selection device on some specific forms of calls. Such calls are always assumed "pure" by the compiler respected to CSE, rather than in the traditional academic sense. With this choice, there is no need to emit code for the same calls in spite of throwing solely according to the rules specified above. This should not break the code works with current semantics (except relying on the undocumented termination behavior), and anyway, it is users' responsibility to make the decision whether such an attribute is appropriate for the need.

Someone may consider better introduce a new attribute to avoid confusion. But as I stated previously, there is already enough confusion on purity (and the const attribute). Keep it orthogonal to non-throwing (and unwinding) property is clearer IMO.

[1] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0709r4.pdf

rnk commented 5 years ago

That was a lot of text and I apologize for not reading all of it, but I'd mention that it's currently not possible to throw a C++ exception without writing memory in TLS, so this all seems somewhat academic: the function is not pure, it has side effects.

If we made the proposed change I think it would be legal for the optimizer to hoist loads from the exception object out of the landingpad and across the invoke. I suppose a reasonable fix for that would be to model landingpad as writing unobservable memory. Maybe we already do, I'm not sure.

Other users have complained about this in the past. Sanjoy Das made the analogous change to LLVM IR semantics in https://reviews.llvm.org/D28740.

FrankHB commented 2 years ago

There was some people bitten by this bug at GCC and he mentioned Clang's behavior. The reply clearly implied this was not intended. The further change in GCC source made it more explicit.

So, it's time to document the divergence (at least internally), if it will not be changed.

FrankHB commented 1 year ago

Note the incompatibility is known years ago.