llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.96k stars 11.94k forks source link

constexpr const char* comparison error #37361

Open llvmbot opened 6 years ago

llvmbot commented 6 years ago
Bugzilla Link 38013
Version 6.0
OS All
Reporter LLVM Bugzilla Contributor
CC @dwblaikie,@DougGregor

Extended Description

The following snippet seems to produce contradictory results when comparing const char* pointers in a constexpr expression:

constexpr auto name1() {
    return "test1";
}

constexpr auto name2() {
    return "test2";
}

int main() {
    constexpr auto b1 = (name1() == name1()); // compiles cleanly
    constexpr auto b2 = (name1() == name2()); // doesn't compile
}

Apparently (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86369), neither should compile. Live code here: https://godbolt.org/g/8W4VrS

JOE1994 commented 5 months ago

Reproducible with clang 18.1.0.

llvmbot commented 5 months ago

@llvm/issue-subscribers-clang-frontend

Author: None (llvmbot)

| | | | --- | --- | | Bugzilla Link | [38013](https://llvm.org/bz38013) | | Version | 6.0 | | OS | All | | Reporter | LLVM Bugzilla Contributor | | CC | @dwblaikie,@DougGregor | ## Extended Description The following snippet seems to produce contradictory results when comparing const char* pointers in a constexpr expression: constexpr auto name1() { return "test1"; } constexpr auto name2() { return "test2"; } int main() { constexpr auto b1 = (name1() == name1()); // compiles cleanly constexpr auto b2 = (name1() == name2()); // doesn't compile } Apparently (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86369), neither should compile. Live code here: https://godbolt.org/g/8W4VrS
cor3ntin commented 5 months ago

@AaronBallman you were the last one to comment here https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86369

After some digging this looks like https://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#2766 @zygoloid

zygoloid commented 5 months ago

Under the new "string literal objects" wording, I think we have this completely backwards: b1 should not compile, because the two calls to name1 might produce different string literal objects. b2 should compile, because it's not unspecified whether name1() and name2() are distinct pointers -- even though it is unspecified exactly which objects they point to.

Superficially, it seems like we should: 1) track the call frame index and version for each StringLiteral lvalue, like we do for local variables, so we can determine whether two string literal lvalues were created by the same string literal evaluation 2) change the string literal comparison code to only say the result is unspecified if the lvalues refer to the same offset within string literals with the same value

But both of those are subtly wrong :(

For point 1: a call frame index is only meaningful within a single evaluation. We need some way to track this that works across evaluations. For example:

constexpr const char *f() { return "hello"; }
constexpr auto a = f();
constexpr auto b = f();
constexpr auto c = b;
static_assert(b == c); // should pass
static_assert(a == c); // error, not constant

... so we would need some way of tracking string literal object identity across constant evaluations. This will need some thought and design.

For point 2: embedded nul bytes are a problem. For example, "foo\0bar" + 5 == "bar\0baz" + 1 should have an unspecified result, even though the string literals are not the same, because they could both point into the same string literal object. (Per [intro.object]/9-10, string literal objects are potentially non-unique and thus can overlap each other.)

There's also a potential concern here for the backing array of a std::initializer_list: the pointer returned by begin() on a constant initializer list object has the same behavior as the pointer formed by decaying a string literal object. But it's worse in that case because we can't in general even tell whether two initializer lists have the same representation and thus might overlap each other.

So: I think allowing the general case of b2 (eg, with initializer lists) is probably not feasible, and allowing b1 but not the a == c comparison above seems impractical (but probably feasible if we really want to go there). We probably ought to have a Clang representative discuss this with CWG.

AaronBallman commented 5 months ago

Thank you for the detailed explanation @zygoloid! Yeah, this is a mess...

So: I think allowing the general case of b2 (eg, with initializer lists) is probably not feasible, and allowing b1 but not the a == c comparison above seems impractical (but probably feasible if we really want to go there). We probably ought to have a Clang representative discuss this with CWG.

I agree with this -- @hubert-reinterpretcast would you be able to represent Clang in this discussion?

hubert-reinterpretcast commented 5 months ago

This is more https://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#2765 than 2766.