llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.59k stars 11.82k forks source link

Target x86_64 RTL --> Target library loading error: undefined symbol with global constexpr T[] #45875

Open 766e2662-641f-4db9-99f5-d4a4dc8c4704 opened 4 years ago

766e2662-641f-4db9-99f5-d4a4dc8c4704 commented 4 years ago
Bugzilla Link 46530
Version unspecified
OS Linux
Attachments Preprocessed code, gz
CC @alexey-bataev,@jdoerfert

Extended Description

Problem description:

When a global constexpr array is accessed through member operator/function under some condition which are not yet clear to me, a symbol for the global array is reported missing although no such symbol should generated.

Example

In the attached code the following key point happen:

Expected result

Running the code, the last line on stdout should be: -1.000000 -1.000000 (there will be one of these per thread and some junk before)

Actual result

exit(1) is called with the error message:

Libomptarget fatal error 1: failure of target construct while offloading is mandatory

using libomptarget build with debug enabled and setting LIBOMPTARGET_DEBUG=1 yields the following error message:

Target x86_64 RTL --> Target library loading error: /tmp/tmpfile_SsIv8d: undefined symbol: _ZN27pmacc_static_const_storage031pmacc_static_const_vector_host0L29DriftParamIons_direction_dataE

The missing symbol reads

pmacc_static_const_storage0::pmacc_static_const_vector_host0::DriftParamIons_direction_data

which is the constexpr defined at line 101181 .

Example Code

Attached is the preprocessed code described above, I could not find a small reproducer by only implementing the properties outlined above. Running clang++ with -E appears to have duplicated code, which I guess is for offload? The code in lines 101181ff reappears at 202434ff.

The attached code can be compiled with:

clang++  -fopenmp -fopenmp=libomp -fopenmp-targets=x86_64-pc-linux-gnu    -fopenmp=libomp helloWorld.ii -lomp -o helloWorld

and ran using

./helloWorld

.

Additional Observations

Applying the following diff:

101201c101201
<             , stor[2]
---
>             , -1.
202454c202454
<             , stor[2]
---
>             , -1.

removes the error and produces the expected result. Note, that this only removes the indirect access to DriftParamIons_direction_data but keeps the direct access, so the global constexpr array is not the issue.

jdoerfert commented 3 years ago

Arguably our diagnostic is not great. We should either warn/error or do the right thing.

alexey-bataev commented 3 years ago

Declaring it as declare target works.

Not sure whether this explicitness is mandated by the standard or if this is still a bug.

I don't remember a special requirement about this in the standard. Of course, we can handle it in a different way but currently the standard does not distinguish constexpr/regular global vars, AFAIK

766e2662-641f-4db9-99f5-d4a4dc8c4704 commented 3 years ago

Declaring it as declare target works.

Not sure whether this explicitness is mandated by the standard or if this is still a bug.

alexey-bataev commented 3 years ago

Try to mark this constexpr variable as declare target. Compiler not always optimizes such variables in the frontend and it may try to emit it. But since this variable is not marked as declare target, it won't be emitted for the device and you might end up with the unresolved link.

766e2662-641f-4db9-99f5-d4a4dc8c4704 commented 3 years ago

Where is the stor data defined? It is DriftParamIons_direction_data itself, as used in https://github.com/jkelling/alpaka/blob/clang46530/example/helloWorld/src/helloWorld.cpp#L55 . This is created by the macro invocation in https://github.com/jkelling/alpaka/blob/clang46530/example/helloWorld/src/helloWorld.cpp#L35 .

The way I see this, constexpr ConstArrayStorage<double, 3> stor; only defines a "view" and not an array. The operator[] of that view accesses some array that has to be defined elsewhere (via return PMACC_JOIN(Name,_data)[idx];).

CONST_VECTOR( float_X, 3, DriftParamIons_direction, 0.0, 0.0, -1.0 ); defines the vector for DriftParamIons_direction, which doesn't cause a problem apparently. Are you missing something like CONST_VECTOR( double, 3, stor, 0.0, 0.0, -1.0 ); No. stor is just a local name, but it is correct, that this is just a view. The type of stor is ConstArrayStorage<double, 3>, which is actually pmacc_static_const_storage0::ConstArrayStorage<double, 3>. The the macro invoked in helloWorld.cpp:35 creates this namespace and the view type as well as the constexpr _data array inside it.

Note, that there is a using namespace ... in https://github.com/jkelling/alpaka/blob/clang46530/example/helloWorld/src/include/pmacc/math/ConstVector.hpp#L88 which makes it so that helloWorld.cpp:51 (and helloWorld.cpp:54) even compile. If there were more than one invocation of CONST_VECTOR, the declaration constexpr ConstArrayStorage<double, 3> stor; would be ambiguous.

I pushed a change to give the fully qualified namespaces in https://github.com/jkelling/alpaka/blob/clang46530/example/helloWorld/src/helloWorld.cpp#L51 and https://github.com/jkelling/alpaka/blob/clang46530/example/helloWorld/src/helloWorld.cpp#L54 to hopefully remove the confusion.

Both line 54 and 55 access the same constant. Only, line 55 is going through an inline constexpr operator[] .

I hope this helps.

jdoerfert commented 4 years ago

I don't get this.

Where is the stor data defined? The way I see this, constexpr ConstArrayStorage<double, 3> stor; only defines a "view" and not an array. The operator[] of that view accesses some array that has to be defined elsewhere (via return PMACC_JOIN(Name,_data)[idx];).

CONST_VECTOR( float_X, 3, DriftParamIons_direction, 0.0, 0.0, -1.0 ); defines the vector for DriftParamIons_direction, which doesn't cause a problem apparently. Are you missing something like CONST_VECTOR( double, 3, stor, 0.0, 0.0, -1.0 );

?

766e2662-641f-4db9-99f5-d4a4dc8c4704 commented 4 years ago

The access to the global consexpr which works is just a bit further down, here: https://github.com/jkelling/alpaka/blob/clang46530/example/helloWorld/src/helloWorld.cpp#L54 The next line is the access to the same element which triggers the error: https://github.com/jkelling/alpaka/blob/clang46530/example/helloWorld/src/helloWorld.cpp#L55

766e2662-641f-4db9-99f5-d4a4dc8c4704 commented 4 years ago

The full code is here: https://github.com/jkelling/alpaka/tree/clang46530

(if you clone this note, that you need to checkout the branch clang46530)

You can configure the build using cmake like this:

mkdir build
cd build
cmake ..  -DOpenMP_CXX_VERSION=5 \
        -DALPAKA_ACC_ANY_BT_OMP5_ENABLE=on \
        -DALPAKA_ACC_CPU_B_OMP2_T_SEQ_ENABLE=off \
        -DALPAKA_ACC_CPU_B_SEQ_T_OMP2_ENABLE=off \
        -DALPAKA_ACC_CPU_B_SEQ_T_FIBERS_ENABLE=off \
        -DALPAKA_ACC_CPU_B_SEQ_T_SEQ_ENABLE=on \
        -DALPAKA_ACC_CPU_B_SEQ_T_THREADS_ENABLE=off \
        -DALPAKA_ACC_CPU_B_TBB_T_SEQ_ENABLE=off \
        -DALPAKA_OFFLOAD_MAX_BLOCK_SIZE=1 \
        -DALPAKA_ACC_GPU_CUDA_ENABLE=off \
        -DALPAKA_ACC_GPU_HIP_ENABLE=off \
        -DBUILD_TESTING=off -Dalpaka_BUILD_EXAMPLES=on \
        -DCMAKE_CXX_FLAGS="-fopenmp -fopenmp=libomp -fopenmp-targets=x86_64-pc-linux-gnu" \
        -DCMAKE_EXE_LINKER_FLAGS="-fopenmp"

On my system, I also have to add a -L[path to by cland install]/lib to -DCMAKE_EXE_LINKER_FLAGS otherwise it will link the wrong libomp which makes the binary crash before main(), I guess the cmake code needs to be fixed to do the OMP5 build correctly.

The target with the problem is "helloWorld". https://github.com/jkelling/alpaka/blob/clang46530/example/helloWorld/src/helloWorld.cpp

The constexpr is "declared" here: https://github.com/jkelling/alpaka/blob/clang46530/example/helloWorld/src/helloWorld.cpp#L35 but this is behind a few layers of macros.

jdoerfert commented 4 years ago

Could you provide the non-preprocessed code too? Even if it is not all of it, the declaration and the illigel use of the constexpr global would be sufficient.