intel / llvm

Intel staging area for llvm.org contribution. Home for Intel LLVM-based projects.

Unable to build with CUDA backend for NVIDIA Jetson #3325

Open GeekOffTheStreet opened 3 years ago

GeekOffTheStreet commented 3 years ago

Describe the bug

I am attempting to build DPC++ with the CUDA backend for a Jetson TX2 running Ubuntu 18.04. The CPUs are AArch64, but I'm only interested in the CUDA backend.

Eventually, the build will fail with: error: unable to create target: 'No available targets are compatible with triple "aarch64-unknown-linux-gnu"'

To Reproduce

Please describe the steps to reproduce the behavior:

Run the following on a Jetson TX2:

python $DPCPP_HOME/llvm/buildbot/configure.py --cuda --no-werror
python $DPCPP_HOME/llvm/buildbot/compile.py
  1. Include code snippet as short as possible: N/A
  2. Specify the command which should be used to compile the program: See above
  3. Specify the command which should be used to launch the program: N/A
  4. Indicate what is wrong and what was expected

The build eventually fails with:

$ python $DPCPP_HOME/llvm/buildbot/compile.py
args:Namespace(base_branch=None, branch=None, build_number=None, build_parallelism=None, builder_dir=None, obj_dir=None, pr_number=None, src_dir=None)
[Cmake Command]: cmake --build /home/user/sycl_workspace/llvm/build -- deploy-sycl-toolchain deploy-opencl-aot -j 4
[1/649] Generating ../../lib/libsycl-crt.o
FAILED: lib/libsycl-crt.o
cd /home/user/sycl_workspace/llvm/build/tools/libdevice && /home/user/sycl_workspace/llvm/build/bin/clang-13 -fsycl -c -Wno-sycl-strict -Wno-undefined-internal -sycl-std=2017 -fsycl-targets=spir64_x86_64-unknown-unknown-sycldevice,spir64_gen-unknown-unknown-sycldevice,spir64_fpga-unknown-unknown-sycldevice,spir64-unknown-unknown-sycldevice /home/user/sycl_workspace/llvm/libdevice/crt_wrapper.cpp -o /home/user/sycl_workspace/llvm/build/lib/libsycl-crt.o
error: unable to create target: 'No available targets are compatible with triple "aarch64-unknown-linux-gnu"'
1 error generated.
[2/649] Generating ../../lib/libsycl-complex.o
FAILED: lib/libsycl-complex.o
cd /home/user/sycl_workspace/llvm/build/tools/libdevice && /home/user/sycl_workspace/llvm/build/bin/clang-13 -fsycl -c -Wno-sycl-strict -Wno-undefined-internal -sycl-std=2017 -fsycl-targets=spir64_x86_64-unknown-unknown-sycldevice,spir64_gen-unknown-unknown-sycldevice,spir64_fpga-unknown-unknown-sycldevice,spir64-unknown-unknown-sycldevice /home/user/sycl_workspace/llvm/libdevice/complex_wrapper.cpp -o /home/user/sycl_workspace/llvm/build/lib/libsycl-complex.o
error: unable to create target: 'No available targets are compatible with triple "aarch64-unknown-linux-gnu"'
1 error generated.
[3/649] Generating ../../lib/libsycl-complex-fp64.o
FAILED: lib/libsycl-complex-fp64.o
cd /home/user/sycl_workspace/llvm/build/tools/libdevice && /home/user/sycl_workspace/llvm/build/bin/clang-13 -fsycl -c -Wno-sycl-strict -Wno-undefined-internal -sycl-std=2017 -fsycl-targets=spir64_x86_64-unknown-unknown-sycldevice,spir64_gen-unknown-unknown-sycldevice,spir64_fpga-unknown-unknown-sycldevice,spir64-unknown-unknown-sycldevice /home/user/sycl_workspace/llvm/libdevice/complex_wrapper_fp64.cpp -o /home/user/sycl_workspace/llvm/build/lib/libsycl-complex-fp64.o
error: unable to create target: 'No available targets are compatible with triple "aarch64-unknown-linux-gnu"'
1 error generated.
[4/649] Generating ../../lib/libsycl-cmath.o
FAILED: lib/libsycl-cmath.o
cd /home/user/sycl_workspace/llvm/build/tools/libdevice && /home/user/sycl_workspace/llvm/build/bin/clang-13 -fsycl -c -Wno-sycl-strict -Wno-undefined-internal -sycl-std=2017 -fsycl-targets=spir64_x86_64-unknown-unknown-sycldevice,spir64_gen-unknown-unknown-sycldevice,spir64_fpga-unknown-unknown-sycldevice,spir64-unknown-unknown-sycldevice /home/user/sycl_workspace/llvm/libdevice/cmath_wrapper.cpp -o /home/user/sycl_workspace/llvm/build/lib/libsycl-cmath.o
error: unable to create target: 'No available targets are compatible with triple "aarch64-unknown-linux-gnu"'
1 error generated.

Environment (please complete the following information):

Additional context

HipSYCL does build on this target using CUDA, but is missing functionality.
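
The same diagnostic repeats for every libdevice object in the log above, and the triple it names is the host triple. As a rough sketch (the helper name `unsupported_triples` and the sample string are illustrative assumptions, not part of the DPC++ buildbot scripts), a saved build log could be scanned for the triples clang failed to create a target for:

```python
import re
from collections import Counter

# Matches the clang diagnostic quoted in the log above and captures the triple.
TRIPLE_ERROR = re.compile(
    r"No available targets are compatible with triple \"([^\"]+)\""
)

def unsupported_triples(log_text):
    """Count how often each triple appears in 'unable to create target' errors."""
    return Counter(TRIPLE_ERROR.findall(log_text))

# Sample text copied from the failure above; a real log would be read from a file.
sample = (
    "error: unable to create target: 'No available targets are "
    "compatible with triple \"aarch64-unknown-linux-gnu\"'\n"
    "1 error generated.\n"
)
print(unsupported_triples(sample))
```

If only the host triple shows up, that would hint that the host backend is missing from the freshly built clang, rather than a problem specific to one source file.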

alexbatashev commented 3 years ago

@GeekOffTheStreet if you aren't going to use this extension, I can suggest a quick workaround:

1) Remove this line: https://github.com/intel/llvm/blob/297166fb63eb45ea898fe654ef99414a59a4b6b7/sycl/CMakeLists.txt#L360
2) Add export SYCL_DEVICELIB_NO_FALLBACK=1 to your ~/.bashrc.

GeekOffTheStreet commented 3 years ago

Thanks for the suggestion. I gave it a try, but am still receiving similar compile errors:

~/sycl_workspace/llvm$ git diff
diff --git a/sycl/CMakeLists.txt b/sycl/CMakeLists.txt
index 53bec90a35bb..b570c2ed5a70 100644
--- a/sycl/CMakeLists.txt
+++ b/sycl/CMakeLists.txt
@@ -357,7 +357,6 @@ set( SYCL_TOOLCHAIN_DEPLOY_COMPONENTS
      sycl
      pi_opencl
      pi_level_zero
-     libsycldevice
      ${XPTIFW_LIBS}
 )
~/sycl_workspace/llvm$ env | grep SYCL_DEVICELIB_NO_FALLBACK
SYCL_DEVICELIB_NO_FALLBACK=1

After reconfigure/build:

~/sycl_workspace$ python $DPCPP_HOME/llvm/buildbot/compile.py
args:Namespace(base_branch=None, branch=None, build_number=None, build_parallelism=None, builder_dir=None, obj_dir=None, pr_number=None, src_dir=None)
[Cmake Command]: cmake --build /home/user/sycl_workspace/llvm/build -- deploy-sycl-toolchain deploy-opencl-aot -j 4
[2/645] Generating ../../lib/libsycl-crt.o
FAILED: lib/libsycl-crt.o
cd /home/user/sycl_workspace/llvm/build/tools/libdevice && /home/user/sycl_workspace/llvm/build/bin/clang-13 -fsycl -c -Wno-sycl-strict -Wno-undefined-internal -sycl-std=2017 -fsycl-targets=spir64_x86_64-unknown-unknown-sycldevice,spir64_gen-unknown-unknown-sycldevice,spir64_fpga-unknown-unknown-sycldevice,spir64-unknown-unknown-sycldevice /home/user/sycl_workspace/llvm/libdevice/crt_wrapper.cpp -o /home/user/sycl_workspace/llvm/build/lib/libsycl-crt.o
error: unable to create target: 'No available targets are compatible with triple "aarch64-unknown-linux-gnu"'
1 error generated.
[3/645] Generating ../../lib/libsycl-complex-fp64.o
FAILED: lib/libsycl-complex-fp64.o
cd /home/user/sycl_workspace/llvm/build/tools/libdevice && /home/user/sycl_workspace/llvm/build/bin/clang-13 -fsycl -c -Wno-sycl-strict -Wno-undefined-internal -sycl-std=2017 -fsycl-targets=spir64_x86_64-unknown-unknown-sycldevice,spir64_gen-unknown-unknown-sycldevice,spir64_fpga-unknown-unknown-sycldevice,spir64-unknown-unknown-sycldevice /home/user/sycl_workspace/llvm/libdevice/complex_wrapper_fp64.cpp -o /home/user/sycl_workspace/llvm/build/lib/libsycl-complex-fp64.o
error: unable to create target: 'No available targets are compatible with triple "aarch64-unknown-linux-gnu"'
1 error generated.
[4/645] Generating ../../lib/libsycl-cmath-fp64.o
FAILED: lib/libsycl-cmath-fp64.o
cd /home/user/sycl_workspace/llvm/build/tools/libdevice && /home/user/sycl_workspace/llvm/build/bin/clang-13 -fsycl -c -Wno-sycl-strict -Wno-undefined-internal -sycl-std=2017 -fsycl-targets=spir64_x86_64-unknown-unknown-sycldevice,spir64_gen-unknown-unknown-sycldevice,spir64_fpga-unknown-unknown-sycldevice,spir64-unknown-unknown-sycldevice /home/user/sycl_workspace/llvm/libdevice/cmath_wrapper_fp64.cpp -o /home/user/sycl_workspace/llvm/build/lib/libsycl-cmath-fp64.o
error: unable to create target: 'No available targets are compatible with triple "aarch64-unknown-linux-gnu"'
1 error generated.
[5/645] Generating sycldevice-binding-nvptx64--/sycldevice-binding.bc
clang-13: warning: Unknown CUDA version. version.txt: 10.2.89. Assuming the latest supported version 10.1 [-Wunknown-cuda-version]
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/home/user/sycl_workspace/llvm/buildbot/compile.py", line 67, in <module>
    ret = main()
  File "/home/user/sycl_workspace/llvm/buildbot/compile.py", line 63, in main
    return do_compile(args)
  File "/home/user/sycl_workspace/llvm/buildbot/compile.py", line 40, in do_compile
    subprocess.check_call(cmake_cmd, cwd=abs_obj_dir)
  File "/usr/lib/python2.7/subprocess.py", line 190, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '--build', '/home/user/sycl_workspace/llvm/build', '--', 'deploy-sycl-toolchain', 'deploy-opencl-aot', '-j', '4']' returned non-zero exit status 1

alexbatashev commented 3 years ago

@GeekOffTheStreet I realized that the ARM architecture is actually not enabled by default. You have to pass the --arm flag to the configure.py script along with the other flags. I am, however, a bit skeptical that the build will finish successfully. I happen to own a Jetson Nano, and I'm getting a GCC ICE while building clangSema, but I haven't had time to dig in, since building LLVM on such a low-performance machine is a nightmare.
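
A minimal sketch of how the flag list for configure.py might be chosen per host. The helper `configure_args` is hypothetical, and it assumes that only hosts reporting `aarch64` or `arm64` need `--arm`. The flags themselves (`--cuda`, `--arm`, `--no-werror`) are the ones mentioned in this thread.

```python
import platform

def configure_args(machine=None, cuda=True):
    """Return an argument list for buildbot/configure.py (hypothetical helper)."""
    machine = machine or platform.machine()
    args = []
    if cuda:
        args.append("--cuda")
    # Assumption: 64-bit ARM hosts are the ones that need the --arm flag.
    if machine.lower() in ("aarch64", "arm64"):
        args.append("--arm")
    args.append("--no-werror")
    return args

# Print the command line for the current host instead of running it.
print(" ".join(["python", "$DPCPP_HOME/llvm/buildbot/configure.py", *configure_args()]))
```

On a Jetson this would print the original configure command with `--arm` added. After reconfiguring, compile.py has to be rerun.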

GeekOffTheStreet commented 3 years ago

Hey, thanks again for the suggestion. I was able to get it to build and to link simple programs. However, there seems to be an issue with clang recognizing NEON SIMD. I've tried various architecture flags, but am still receiving errors similar to:

/home/user/sycl_workspace/llvm/build/lib/clang/13.0.0/include/arm_neon.h:71:24: error: 'neon_vector_type' attribute is not supported on targets missing 'neon' or 'mve'; specify an appropriate -march= or -mcpu=
typedef __attribute__((neon_vector_type(4))) uint32_t uint32x4_t;

A couple of the combinations I've tried are:

-march=armv8-a -mcpu=cortex-a57 -O2 -ftree-vectorize
-march=armv8-a+crypto -mcpu=cortex-a57+crypto

Is there something else I need to pass into configure?

GeekOffTheStreet commented 3 years ago

For what it's worth, when I look at the verbose output of clang, it shows neon in the target options ( -target-feature +neon):

$ clang++ -v -c blah.cpp
clang version 13.0.0 (https://github.com/intel/llvm 78a0b199e1f334938da9d7d1fabcf94937571482)
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/user/sycl_workspace/llvm/build/bin
Found candidate GCC installation: /usr/lib/gcc/aarch64-linux-gnu/7
Found candidate GCC installation: /usr/lib/gcc/aarch64-linux-gnu/7.5.0
Found candidate GCC installation: /usr/lib/gcc/aarch64-linux-gnu/8
Found candidate GCC installation: /usr/lib/gcc/aarch64-linux-gnu/9
Selected GCC installation: /usr/lib/gcc/aarch64-linux-gnu/9
Candidate multilib: .;@m64
Selected multilib: .;@m64
Found CUDA installation: /usr/local/cuda-10.2, version 10.2
 (in-process)
 "/home/user/sycl_workspace/llvm/build/bin/clang-13" -cc1 -triple aarch64-unknown-linux-gnu -emit-obj -mrelax-all --mrelax-relocations -disable-free -main-file-name blah.cpp -mrelocation-model static -mframe-pointer=non-leaf -fmath-errno -fno-rounding-math -mconstructor-aliases -munwind-tables -target-cpu generic -target-feature +neon -target-abi aapcs -fallow-half-arguments-and-returns -fno-split-dwarf-inlining -debugger-tuning=gdb -v -resource-dir /home/user/sycl_workspace/llvm/build/lib/clang/13.0.0 -internal-isystem /usr/lib/gcc/aarch64-linux-gnu/9/../../../../include/c++/9 -internal-isystem /usr/lib/gcc/aarch64-linux-gnu/9/../../../../include/aarch64-linux-gnu/c++/9 -internal-isystem /usr/lib/gcc/aarch64-linux-gnu/9/../../../../include/aarch64-linux-gnu/c++/9 -internal-isystem /usr/lib/gcc/aarch64-linux-gnu/9/../../../../include/c++/9/backward -internal-isystem /usr/local/include -internal-isystem /home/user/sycl_workspace/llvm/build/lib/clang/13.0.0/include -internal-externc-isystem /usr/include/aarch64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -fdeprecated-macro -fdebug-compilation-dir /home/user -ferror-limit 19 -fno-signed-char -fgnuc-version=4.2.1 -fcxx-exceptions -fexceptions -fcolor-diagnostics -faddrsig -o blah.o -x c++ blah.cpp
clang -cc1 version 13.0.0 based upon LLVM 13.0.0git default target aarch64-unknown-linux-gnu
ignoring nonexistent directory "/include"
ignoring duplicate directory "/usr/lib/gcc/aarch64-linux-gnu/9/../../../../include/aarch64-linux-gnu/c++/9"
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/gcc/aarch64-linux-gnu/9/../../../../include/c++/9
 /usr/lib/gcc/aarch64-linux-gnu/9/../../../../include/aarch64-linux-gnu/c++/9
 /usr/lib/gcc/aarch64-linux-gnu/9/../../../../include/c++/9/backward
 /usr/local/include
 /home/user/sycl_workspace/llvm/build/lib/clang/13.0.0/include
 /usr/include/aarch64-linux-gnu
 /usr/include
End of search list.

GeekOffTheStreet commented 3 years ago

One other note: the issue appears to be directly related to the -fsycl option. I created a dummy source file:

$ cat blah.cpp
#include <arm_neon.h>
void blah() {}

If I compile with clang++ -c blah.cpp, it compiles OK. If I switch to clang++ -fsycl -c blah.cpp, it fails with the errors above.
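
The comparison above could be scripted roughly as follows. This is only a sketch: the helper names `compile_commands` and `run_repro` are made up for illustration, and it assumes the freshly built `clang++` is on `PATH`. It writes the two-line dummy file to a temporary directory and compiles it with and without -fsycl.

```python
import os
import shutil
import subprocess
import tempfile

# The dummy source file from the comment above.
SOURCE = "#include <arm_neon.h>\nvoid blah() {}\n"

def compile_commands(src="blah.cpp", compiler="clang++"):
    """Return the two command lines being compared: without and with -fsycl."""
    return [compiler, "-c", src], [compiler, "-fsycl", "-c", src]

def run_repro(compiler="clang++"):
    """Compile both ways; return (plain_rc, sycl_rc), or None if no compiler is found."""
    if shutil.which(compiler) is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "blah.cpp"), "w") as f:
            f.write(SOURCE)
        return tuple(
            subprocess.run(cmd, cwd=tmp, capture_output=True).returncode
            for cmd in compile_commands(compiler=compiler)
        )

print(run_repro())
```

A zero exit code for the first command and a non-zero one for the second would match the behavior described above.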

jbrodman commented 3 years ago

Hi - you might look here to see if there's anything of use: https://github.com/jeffhammond/intel-llvm/tree/agx-works

bader commented 3 years ago

I'm not sure we can classify this issue as a bug. We might need to make it more explicit in our documentation, but x86 is the only tested host architecture supported by DPC++. There seem to be some users compiling DPC++ on ARM hosts, but the CI system doesn't validate this configuration regularly, so it might be broken from time to time. If anyone is interested in making ARM host support official and has the resources to maintain it, please let me know.

I suggest we replace "bug" label with "enhancement".

@pvchupin, @AerialMantis, FYI.

AerialMantis commented 3 years ago

I agree. We are not currently maintaining support for CUDA with an ARM host, and since this isn't covered by the CI, it makes more sense for this to be an enhancement. I'll make this switch and also add the documentation label.