llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

[C++20] [constexpr] Should we cache the result of constexpr/consteval expression? #61425

Open ChuanqiXu9 opened 1 year ago

ChuanqiXu9 commented 1 year ago

I found this while investigating build times with modules.

The reproducer:

// a.cpp
constexpr bool f() {
    for (unsigned n = 0; n != 1'000'000; ++n) {
    }
    return true;
}

template<typename>
struct s {
    static constexpr auto a = f();
#ifndef NO_MULTIPLE_CALL
    static constexpr auto b = f();
    static constexpr auto c = f();
    static constexpr auto d = f();
#endif
};

template struct s<int>;
template struct s<long>;

Let's run it with:

set -x
echo "Build time for duplicated"
time clang++ -std=c++20 a.cpp -c -o a.o
echo "Build time for not duplicated"
time clang++ -std=c++20 a.cpp -c -o a.o -DNO_MULTIPLE_CALL

The result on my machine shows:

Build time for duplicated

real    0m11.753s
user    0m11.611s
sys 0m0.016s
Build time for not duplicated

real    0m3.631s
user    0m3.576s
sys 0m0.013s

So Clang clearly doesn't cache the results of constexpr/consteval evaluation today. This is significant when we try to use metaprogramming techniques. Is there any special reason?

llvmbot commented 1 year ago

@llvm/issue-subscribers-c-20

tbaederr commented 1 year ago

I tried doing exactly that by simply having an Expr* -> APValue cache in ASTContext, but there were some test failures so I didn't proceed further. I think the problem is that you'd need to add a few flags to the hash key, e.g. whether something is only being evaluated for overflow, etc.
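
A minimal sketch of that idea, assuming a hypothetical cache that would live in ASTContext (this is not the code from that attempt, and the mode flags are only examples):

#include "clang/AST/APValue.h"
#include "clang/AST/Expr.h"
#include "llvm/ADT/DenseMap.h"
#include <utility>

// Key the cache on the expression *and* the evaluation mode: evaluating the
// same Expr "for overflow" (or in another special mode) can legitimately
// produce a different result, so the mode has to participate in the lookup.
using ConstEvalCacheKey = std::pair<const clang::Expr *, unsigned /*ModeFlags*/>;

// In a real implementation this map would be a member of ASTContext so that
// results are shared across the whole translation unit.
static llvm::DenseMap<ConstEvalCacheKey, clang::APValue> ConstEvalCache;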

llvmbot commented 1 year ago

@llvm/issue-subscribers-clang-frontend

tbaederr commented 1 year ago

Also relevant discussion: https://github.com/llvm/llvm-project/issues/13222

ChuanqiXu9 commented 1 year ago

I feel that discussion didn't get very far. Let's mark it as a duplicate and track the problem in this one. (That one was filed by llvmbot, so it looks a little bit outdated to me.)

royjacobson commented 1 year ago

What happens if a template specialization is added between calls? Or with all the 'unique unevaluated lambda types' tricks? (https://github.com/DaemonSnake/unconstexpr-cpp20 for example)

~C++23 also has static constexpr vars, but I'm not sure how they work.~ It doesn't allow mutable statics.

I guess it might be possible to calculate when caching is possible, but I don't think it can trivially work.

There's also an easy workaround here with constexpr variables.
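
For the reproducer at the top of this issue, that workaround could look like the sketch below (f_result is just an illustrative name): the expensive call is constant-evaluated once into a namespace-scope constexpr variable and every member reuses the value.

constexpr bool f_result = f();   // f() from the reproducer; evaluated once

template<typename>
struct s {
    static constexpr auto a = f_result;
    static constexpr auto b = f_result;
    static constexpr auto c = f_result;
    static constexpr auto d = f_result;
};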

ChuanqiXu9 commented 1 year ago

Sent https://reviews.llvm.org/D146410 to get a more precise feeling for the proposed solution. There are several problems with the patch, but it solves the problem in this issue and passes all the tests except CodeGenCXX/builtin_LINE.cpp. So I guess it might be the potential solution.

After applying the above patch, the output for the reproducer would be:

Build time for duplicated

real    0m1.825s
user    0m1.796s
sys 0m0.011s
Build time for not duplicated

real    0m1.829s
user    0m1.801s
sys 0m0.008s

It looks pretty good to me.


What happens if a template specialization is added between calls?

Explicit specialization of a function after it has been instantiated is not allowed. I guess you may be asking about the following case?

constexpr bool f(short) {
    for (unsigned n = 0; n != 1'000'000; ++n) {
    }
    return true;
}

// Only f(short) is visible here, so these four calls resolve to it.
constexpr auto a = f((int)43);
constexpr auto b = f((int)43);
constexpr auto c = f((int)43);
constexpr auto d = f((int)43);

constexpr bool f(int) {
    for (unsigned n = 0; n != 1'000'000; ++n) {
    }
    return false;
}

// f(int) is an exact match, so these four calls resolve to the new overload.
constexpr auto e = f((int)43);
constexpr auto g = f((int)43);
constexpr auto h = f((int)43);
constexpr auto i = f((int)43);

It takes about 3.6s to compile this, so each of the two constexpr functions is evaluated exactly once.


Although the probability should be pretty low, the biggest problem with the solution is that the current hash algorithm is not collision-free in theory: it is possible for the compiler to treat two different constexpr function calls as the same one. On the other hand, ODRHash, as the name implies, has been used to perform ODR checking in modules for a pretty long time (including Clang modules, so it has seen plenty of real-world use). If we really don't want to accept the potential collisions (I think we will go that way), we could choose a stronger hash algorithm like SHA256 or use llvm::FoldingSetNodeID directly. But even SHA256 can produce colliding results, so we can't be completely sure either.

As for llvm::FoldingSetNodeID, it is essentially a vector of unsigned values used to identify a unique entity. That sounds great, doesn't it? The problem is that it is not good for modules: since my intention is to reduce the duplicated computation within modules, storing these keys would make the BMI (the byproduct of module compilation) much larger, and BMIs are already big.

So personally, I prefer the current hash values (a single unsigned).
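
To make the trade-off concrete, here is a hypothetical illustration (these types are not from the patch): a single unsigned hash costs one word per cached entry in the BMI but two different calls can collide, while a FoldingSetNodeID-style key is collision-free but drags a whole vector of words into the module file for every entry.

#include <cstdint>
#include <vector>

// Hypothetical cache-key shapes, for illustration only.
struct SmallCacheKey {
  std::uint32_t Hash;           // one word per cached call; collisions possible
};

struct FullCacheKey {
  std::vector<unsigned> Words;  // FoldingSetNodeID-like contents; unique,
                                // but inflates the serialized BMI per entry
};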

The second problem may be the style. To be honest, the patch above is a little bit ugly, and there are many more places I didn't touch. I want to move the process of evaluating expressions into ASTContext rather than Expr, but that requires a lot of changes, which I don't like. I guess we have to do so if we want to make this work, though.

The other problem I can see is the concern about correctness. This may be the most important thing, but it is hard to discuss without the details.

For CodeGenCXX/builtin_LINE.cpp, the problem is:

constexpr int get_line_constexpr(int l = __builtin_LINE()) {
  return l;
}

int global_two = get_line_constexpr();
const int global_three(get_line_constexpr());

With the cache, the second call to get_line_constexpr() returns the result of the first call, even though the two calls are on different lines. I don't have a good solution for this yet.
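
To illustrate why this resists naive caching (a sketch, not the actual test): the default argument __builtin_LINE() is evaluated at each call site, so two calls on different lines must produce different values, whereas a cache keyed only on the call expression would hand the second call the first one's result.

constexpr int get_line(int l = __builtin_LINE()) { return l; }

constexpr int first  = get_line();   // sees this line's number
constexpr int second = get_line();   // one line further down
static_assert(second == first + 1, "each call must observe its own line");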

CC: @AaronBallman @erichkeane @cor3ntin

cor3ntin commented 1 year ago

We'd need to be very careful. constexpr functions are not pure, and neither are consteval functions. I think the main observable effect right now is source_location, but reflection, logging, and other features may affect the result of constant evaluation.

I'm not sure users can get themselves into a situation where constant evaluation leads to a different result the second time if the first evaluation was successful, so we might be able to have a list of builtins that exclude an evaluation from caching. Maybe that would be enough?
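
A hedged sketch of that exclusion-list idea (hypothetical helper, not reviewed against the actual tree): __builtin_LINE() and friends are represented as SourceLocExpr in the AST, so before caching we could walk the expression and refuse to cache anything that contains one. A real implementation would also have to look through default arguments (CXXDefaultArgExpr), which is where these builtins usually hide.

#include "clang/AST/Expr.h"
#include "clang/AST/RecursiveASTVisitor.h"

namespace {
// Marks an expression as uncacheable if it contains a SourceLocExpr
// (__builtin_LINE, __builtin_FILE, std::source_location::current, ...).
class UncacheableFinder
    : public clang::RecursiveASTVisitor<UncacheableFinder> {
public:
  bool Found = false;
  bool VisitSourceLocExpr(clang::SourceLocExpr *) {
    Found = true;
    return false; // stop the traversal; caching is off the table
  }
};

// Hypothetical helper: true if E looks safe to memoize.
bool isCacheCandidate(clang::Expr *E) {
  UncacheableFinder F;
  F.TraverseStmt(E);
  return !F.Found;
}
} // namespace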

erichkeane commented 1 year ago

I think this probably needs an RFC. But as Corentin says, I'm VERY concerned here.

I think a less dangerous solution is the one that @tbaederr is working on, the new bytecode-based interpreter, which can then be better cached and executed more quickly.

tbaederr commented 1 year ago

FYI the results with the new constant expression interpreter are:

# time gcc -std=c++20 -c ./a.cpp -fconstexpr-loop-limit=1000000
real    0m0.395s
user    0m0.364s
sys     0m0.030s

# time gcc -std=c++20 -c ./a.cpp -fconstexpr-loop-limit=1000000 -DNO_MULTIPLE_CALL
real    0m0.395s
user    0m0.366s
sys     0m0.028s

# time bin/clang++ -std=c++20 -c ./a.cpp
real    0m4.713s
user    0m4.693s
sys     0m0.014s

# time bin/clang++ -std=c++20 -c ./a.cpp -DNO_MULTIPLE_CALL
real    0m1.939s
user    0m1.926s
sys     0m0.010s

# time bin/clang++ -std=c++20 -c ./a.cpp -fexperimental-new-constant-interpreter
real    0m0.991s
user    0m0.979s
sys     0m0.011s

# time bin/clang++ -std=c++20 -c ./a.cpp -fexperimental-new-constant-interpreter -DNO_MULTIPLE_CALL
real    0m0.330s
user    0m0.317s
sys     0m0.013s

cor3ntin commented 1 year ago

@tbaederr do we know what kind of witchcraft GCC is performing? Those are very impressive performance numbers.

tbaederr commented 1 year ago

You were too fast, I just updated them. My clang numbers were with msan enabled, sorry.

ChuanqiXu9 commented 1 year ago

@tbaederr the numbers look pretty good! Is there an RFC or a design doc where I can learn the higher-level design? And if -fexperimental-new-constant-interpreter is going to be enabled by default, I don't think I need to keep working on this.

tbaederr commented 1 year ago

In terms of high-level documentation, there is https://clang.llvm.org/docs/ConstantInterpreter.html, but large parts of that are out of date. I haven't found the time to update it yet.

neko-para commented 7 months ago

I believe this would be quite useful if the compiler could promise to do so.

#include <algorithm>
#include <array>
#include <cstddef>
#include <string>

constexpr std::string func() { ... }            // builds some std::string
constexpr size_t N = func().size();             // first evaluation of func()

template <size_t M>
constexpr std::array<char, M> make_data() {
    std::string str = func();                   // second evaluation of func()
    std::array<char, M> result{};
    std::copy(str.begin(), str.end(), result.begin());
    result[M - 1] = '\0';
    return result;
}
constexpr auto data = make_data<N + 1>();

As shown above, I have to evaluate func twice so that I can store the std::string contents into a std::array, which can cross the boundary between compile time and run time.