Missed simple devirtualization

davidbolvansky commented 1 year ago

```cpp
struct base_task
{
    void run() {
        return run_impl();
    }
    virtual void run_impl() = 0;
};

struct compute_task final : public base_task
{
    void run_impl() {
        __builtin_printf("work");
    }
};

void wrapper(compute_task &t) {
    t.run();
}
```

GCC is able to devirtualize it.

https://godbolt.org/z/cx6d565hr

llvmbot commented 1 year ago

@llvm/issue-subscribers-clang-codegen

davidbolvansky commented 1 year ago

cc @rjmccall @AaronBallman @fhahn

davidbolvansky commented 1 year ago

Another simple case.

```cpp
struct base_task
{
    virtual void run() = 0;
};

struct compute_task : public base_task
{
    void run() {
        __builtin_printf("work");
    }
};

void wrapper(compute_task t) {
    t.run();
}
```

AaronBallman commented 1 year ago

The second example is devirtualized, at least in terms of the IR clang generates: https://godbolt.org/z/af4MWKrf4

That said, I'm not certain why we're not devirtualizing the initial example. Because the class is local to the TU, there cannot be further derivations, so it seems like we should be able to. But when I step through in the debugger, it seems we're not calling EmitCXXMemberOrOperatorMemberCallExpr() for the call to run_impl(). Someone more familiar with codegen will likely have a better idea.

davidbolvansky commented 1 year ago

As a workaround, calling run_impl() directly works: `void wrapper(compute_task &t) { t.run_impl(); }`.
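A self-contained sketch of the workaround next to the original form, assuming a global call_log string in place of __builtin_printf so the effect can be observed. The comments give the likely reason the direct call is the easy case: the object expression t has static type compute_task, and because compute_task is final its dynamic type must be exactly compute_task.

```cpp
#include <cassert>
#include <string>

// Stand-in for __builtin_printf("work") so the call can be observed.
std::string call_log;

struct base_task
{
    void run() {
        return run_impl();  // inside run(), `this` may point to any subclass
    }
    virtual void run_impl() = 0;
};

struct compute_task final : public base_task
{
    void run_impl() override {
        call_log += "work";
    }
};

// Original form: the only virtual call happens inside base_task::run().
void wrapper(compute_task &t) {
    t.run();
}

// Workaround: call run_impl() on a compute_task& directly. Since
// compute_task is final, the call has exactly one possible target,
// compute_task::run_impl, so it can be emitted as a direct call.
void wrapper_workaround(compute_task &t) {
    t.run_impl();
}
```

Both wrappers have the same observable behavior; only the second exposes the final class at the call site.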

rjmccall commented 1 year ago

The class is not internal to the translation unit. There is no formal difference in C++ between classes defined in .h files and classes defined in .cpp files. It is possible for another translation unit to contain exactly these declarations and then add a subclass of compute_task (except of course that it's final in the first example). Fortunately, it doesn't matter in either of these examples, which should be devirtualizable in theory without LTO.
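To illustrate the point about other translation units, here is a sketch using the second example's (non-final) classes. The subclass logging_task is hypothetical and stands for code that some other translation unit could legally contain; both parts are shown in one file for brevity, and call_log again replaces __builtin_printf. Called through a compute_task&, run() can reach the subclass override, so without final the call cannot be bound to compute_task::run statically.

```cpp
#include <cassert>
#include <string>

std::string call_log;

struct base_task
{
    virtual void run() = 0;
};

struct compute_task : public base_task  // not final
{
    void run() override {
        call_log += "work";
    }
};

// Imagine the following subclass lives in a different translation unit.
// It is legal only because compute_task is not final.
struct logging_task : public compute_task
{
    void run() override {
        call_log += "log;";
        compute_task::run();  // qualified call: no virtual dispatch
    }
};

// Through a reference, the dynamic type may be compute_task or any
// subclass, so this call has to go through the v-table.
void wrapper_by_ref(compute_task &t) {
    t.run();
}
```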

The second example can be immediately devirtualized by the frontend because the dynamic type of t is known statically to IRGen.
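A sketch of why the by-value parameter settles the dynamic type: t is a complete compute_task object of its own, copy-constructed from the argument, so even when a caller passes a subclass object the copy is sliced and t.run() can only mean compute_task::run. The logging_task subclass and call_log are hypothetical additions for the demonstration.

```cpp
#include <cassert>
#include <string>

std::string call_log;

struct base_task
{
    virtual void run() = 0;
};

struct compute_task : public base_task
{
    void run() override {
        call_log += "work";
    }
};

// Hypothetical subclass with a different override.
struct logging_task : public compute_task
{
    void run() override {
        call_log += "log;";
    }
};

// t is its own compute_task object, so its dynamic type is exactly
// compute_task and the call can be emitted as a direct call to
// compute_task::run.
void wrapper(compute_task t) {
    t.run();
}
```

Passing a logging_task to wrapper copies only its compute_task part, so the subclass override is never reached.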

The first example cannot be immediately devirtualized because the dynamic type of `this` is not known in base_task::run(), which is where the only virtual call occurs. To devirtualize the first example, we need to do one of the following:

1. Inline the call to base_task::run() as a frontend-level optimization, turning the call to run_impl into a call to a final method that can be immediately devirtualized.
2. Emit and call a specialized variant of base_task::run() as a frontend-level optimization, again turning the call to run_impl in the specialized variant into a call to a final method that can be immediately devirtualized.
3. Emit some kind of assumption at the start of wrapper that the v-table field holds the v-table for compute_task, inline the call to base_task::run() as an LLVM optimization, and use generic memory analysis to first fold the load of the v-table field to the known v-table and then fold the load of the virtual function pointer to the known contents of the v-table object.
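Option 1 amounts to what the run_impl() workaround earlier in the thread does by hand. Option 2 can be sketched at the source level too: base_task_run_for_compute_task is a hypothetical name for the specialized copy of base_task::run() that the frontend would emit, modeled as a free function because C++ has no way to spell "base_task::run() with `this` known to be a compute_task". Option 3 operates on LLVM IR and has no direct source-level equivalent. call_log again replaces __builtin_printf.

```cpp
#include <cassert>
#include <string>

std::string call_log;

struct base_task
{
    void run() {
        return run_impl();  // generic version: `this` may be any subclass
    }
    virtual void run_impl() = 0;
};

struct compute_task final : public base_task
{
    void run_impl() override {
        call_log += "work";
    }
};

// Option 2 (hypothetical): a specialized copy of base_task::run() in which
// the object is known to be a compute_task. The call is made on the final
// class, so it has a single possible target.
void base_task_run_for_compute_task(compute_task &self) {
    return self.run_impl();
}

// wrapper calls the specialized variant instead of base_task::run().
void wrapper_specialized(compute_task &t) {
    base_task_run_for_compute_task(t);
}
```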

Clang does not do frontend-level optimizations at the level of sophistication necessary for 1 or 2. I think GCC might do 1. The easiest thing to do given the current optimizer structure is 3, but I think that kind of assumption emission has historically been problematic in LLVM.

llvm/llvm-project issue #61950