llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
27.92k stars 11.53k forks source link

[SPECCPU2017] omnetpp producing VE with ` -fvirtual-function-elimination` #61294

Open SouraVX opened 1 year ago

SouraVX commented 1 year ago

Benchmark passing/running fine without -fvirtual-function-elimination flag

More Details: From Spec log COMP: "C:/Users/Administrator/sourabh_workspace/build/bin/clang++ -m64 -Wno-error=implicit-function-declaration -c -o options.obj -DSPEC -DNDEBUG -DSPEC_WINDOWS -Isimulator/platdep -Isimulator -Imodel -DWITH_NETBUILDER -DSPEC_AUTO_SUPPRESS_OPENMP -O3 -march=znver3 -flto -fvirtual-function-elimination -fvisibility=hidden " C: CXX="C:/Users/Administrator/sourabh_workspace/build/bin/clang++ -m64 -Wno-error=implicit-function-declaration" C: CXXOBJOPT="-c -o options" P: CPUFLAGS="-DSPEC -DNDEBUG -DSPEC_WINDOWS" P: BENCH_FLAGS="-Isimulator/platdep -Isimulator -Imodel -DWITH_NETBUILDER -DSPEC_AUTO_SUPPRESS_OPENMP" O: CXXOPTIMIZE="-O3 -march=znver3 -flto" O: EXTRA_CXXFLAGS="-fvirtual-function-elimination -fvisibility=hidden" LINK: "C:/Users/Administrator/sourabh_workspace/build/bin/clang++ -m64 -Wno-error=implicit-function-declaration -flto -O3 -march=znver3 -flto -fuse-ld=lld-link -LIBPATH:"C:/Program Files (x86)/Windows Kits/10/Lib/10.0.19041.0/ucrt/x86" -LIBPATH:"C:/Program Files (x86)/Windows Kits/10/Lib/10.0.19041.0/um/x86" -o options.exe " C: LD="C:/Users/Administrator/sourabh_workspace/build/bin/clang++ -m64 -Wno-error=implicit-function-declaration" O: EXTRA_LDFLAGS="-flto" O: CXXOPTIMIZE="-O3 -march=znver3 -flto" C: LDOUT="-fuse-ld=lld-link -LIBPATH:"C:/Program Files (x86)/Windows Kits/10/Lib/10.0.19041.0/ucrt/x86" -LIBPATH:"C:/Program Files (x86)/Windows Kits/10/Lib/10.0.19041.0/um/x86" -o options.exe"

Error 520.omnetpp_r base refrate ratio=19081.03, runtime=0.068759, copies=1, threads=1, power=0.00W, temp=0.00 degC, humidity=0.00%, errorcode=VE Error: 1x520.omnetpp_r

SouraVX commented 1 year ago

Compiler Version info:

"    (From config) $rawcompiler_version = " CXXC_VERSION_OPTION:clang version 17.0.0
(https://github.com/llvm/llvm-project.git 89a515d7b1e87189cffee02b7edbf5f4fd989fd9)
Target: x86_64-pc-windows-msvcThread model: posix
InstalledDir: C:\Users\Administrator\sourabh_workspace\build\trunk-llvm-project\release\bin
AaronBallman commented 1 year ago

We need a reproducer for the issue (community Clang doesn't have access to the SPEC benchmark). It's possible that the issue is that use of -fvirtual-function-elimination is invalid for that given test, and so the runtime failure might be expected. Having a non-SPEC reproducer would help.

wangbyby commented 1 year ago

maybe wholeprogramdevirt pass causes this.

wangbyby commented 1 year ago

I have a min case

run commands: using llvm15.0.4

clang++ -c test.cpp  -o test.o -std=c++03  -flto=full -O3 -mcpu=native -fvirtual-function-elimination -fvisibility=hidden
clang++ -std=c++03   -flto=full -fuse-ld=lld -O3 -mcpu=native test.o
#include <string>
#include <stdio.h>
#include <iostream>

// run commands:
// 
// clang++ -c test.cpp  -o test.o -std=c++03  -flto=full -O3 -mcpu=native -fvirtual-function-elimination -fvisibility=hidden
// clang++ -std=c++03   -flto=full -fuse-ld=lld -O3 -mcpu=native test.o
// # then run the bin file...

class KeyValue
{
public:
    virtual const char *getKey() = 0;
    virtual const char *getValue() = 0;
};

class KeyValue1 : public KeyValue
{
public:
    std::string key;
    std::string value;
    const char *getKey() 
    {
        return key.c_str();
    }
    const char *getValue() 
    {
        return value.c_str();
    }
};

class KeyValue2 : public KeyValue1
{
public:
    std::string useless;
};

class NullEntry : public KeyValue
{
public:
    const char *getKey() 
    {
        return NULL;
    }
    const char *getValue() 
    {
        return NULL;
    }
};

const KeyValue& __attribute__ ((noinline))  getKeyValue(KeyValue2 &k2, NullEntry &nullentry){
    if(k2.key == "key")
        return k2;
    return nullentry;

}

const char* __attribute__ ((noinline)) getValue(KeyValue2 &k2, NullEntry &nullentry  ){
    KeyValue2 & res = (KeyValue2&) getKeyValue(k2, nullentry);
    if(res.getKey() == NULL) // this line would cause error!
        return NULL;
    return res.getValue();
}

int main(){
    KeyValue2 entry2;
    std::string tmp;
    std::cin>>tmp;

    entry2.key = tmp;
    entry2.value = "value";

    NullEntry n;
    const char *s = getValue(entry2, n);
    printf("res = %s\n", s);

    return 0;
}

Two place to cause the Segmentfault.

  1. wholeprogramdevirt would devirtual getKey which is not correct. using -Wl,-mllvm,-wholeprogramdevirt-skip=*getKey* to skip this.
  2. globaldce would delete vtable elements of type NullEntry(global variable of IR) I don't know how to fix it...

The llvm version I use is 15.0.4

wangbyby commented 1 year ago

I don't know SPECCPU2017 using which version of omnetpp. But high version omnetpp not contains this error. If cpp source code contains type conversion between subclasses, then devirtual(remove virtual call and delete vtable element) seems dangerous. Maybe this case is UB? i am not sure

SouraVX commented 1 year ago

I'm observing this on cpu2017-1.1.8 version

wangbyby commented 1 year ago

I'm observing this on cpu2017-1.1.8 version

Maybe the src code in cpu2017 contains UB.

wangbyby commented 1 year ago

We need a reproducer for the issue (community Clang doesn't have access to the SPEC benchmark). It's possible that the issue is that use of -fvirtual-function-elimination is invalid for that given test, and so the runtime failure might be expected. Having a non-SPEC reproducer would help.

Can we remove the incomplete label?

AaronBallman commented 1 year ago

There is undefined behavior here:

const KeyValue& __attribute__ ((noinline))  getKeyValue(KeyValue2 &k2, NullEntry &nullentry){
    if(k2.key == "key")
        return k2;
    return nullentry; // #1

}

const char* __attribute__ ((noinline)) getValue(KeyValue2 &k2, NullEntry &nullentry  ){
    KeyValue2 & res = (KeyValue2&) getKeyValue(k2, nullentry); // #2
    if(res.getKey() == NULL) // this line would cause error!
        return NULL;
    return res.getValue();
}

If k2.key is something other than "key", then we return nullentry at #1, but cast that to KeyValue2& at #2. NullEntry does not derive from KeyValue2, so that cast is UB.

Does the issue go away for you if you change #2 to be KeyValue & res = (KeyValue&) getKeyValue(k2, nullentry); instead?

wangbyby commented 1 year ago

@AaronBallman

Does the issue go away for you if you change #2 to be KeyValue & res = (KeyValue&) getKeyValue(k2, nullentry); instead?

Yes. So, I think maybe the runtime error is caused by not good enough src code in cpu2017. And no need to change llvm src code?

SouraVX commented 1 year ago

Yes. So, I think maybe the runtime error is caused by not good enough src code in cpu2017.

But this error does not trigger on Linux. If it's a source level bug, it should trigger on Linux as well ?

wangbyby commented 1 year ago

But this error does not trigger on Linux. If it's a source level bug, it should trigger on Linux as well ?

I run cpu2017 on openEuler 22.03 LTS has this fail.

AaronBallman commented 1 year ago

@AaronBallman

Does the issue go away for you if you change #2 to be KeyValue & res = (KeyValue&) getKeyValue(k2, nullentry); instead?

Yes. So, I think maybe the runtime error is caused by not good enough src code in cpu2017. And no need to change llvm src code?

Correct, I think this is not a bug in the compiler.

But this error does not trigger on Linux. If it's a source level bug, it should trigger on Linux as well ?

UB can have different behaviors on different targets (or even different runs of the process on the same target).