madgraph5 / madgraph4gpu

GPU development for the Madgraph5_aMC@NLO event generator software package

Document build instructions for kokkos and sycl #492

Open · valassi opened 2 years ago

valassi commented 2 years ago

Hi @jtchilders @nscottnichols I am starting to have a look at kokkos and sycl.

As a first step I would like to build the code. Is this documented somewhere? If not, could you please write it up?

On a specific issue, I just tried a simple 'make' from madgraph4gpuX/epochX/kokkos/ee_mumu.auto/SubProcesses/P1_Sigma_sm_epem_mupmum. This fails as follows:

[avalassi@itscrd70 gcc10.2/cvmfs] /data/avalassi/GPU2020/madgraph4gpuX/epochX/kokkos/ee_mumu.auto/SubProcesses/P1_Sigma_sm_epem_mupmum> make
make -C ../../src cuda
make[1]: Entering directory `/data/avalassi/GPU2020/madgraph4gpuX/epochX/kokkos/ee_mumu.auto/src'
/home/jchilders/git/kokkos/install_v100/bin/nvcc_wrapper -O3 -ffast-math --std=c++17 -I/home/jchilders/git/kokkos/install_v100/include -arch=compute_70 --expt-extended-lambda --expt-relaxed-constexpr -use_fast_math --openmp -lineinfo -c Parameters_sm.cc -o Parameters_sm.cuda.o
make[1]: /home/jchilders/git/kokkos/install_v100/bin/nvcc_wrapper: Command not found
make[1]: *** [Parameters_sm.cuda.o] Error 127
make[1]: Leaving directory `/data/avalassi/GPU2020/madgraph4gpuX/epochX/kokkos/ee_mumu.auto/src'
make: *** [cuda] Error 2

I see that the hardcoded jchilders paths are only used if the KOKKOS env variables are not set. So I guess I just need to install kokkos somehow and then set those env variables?
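For example, something like the following (just a guess at what is needed; the variable name KOKKOS_HOME and the install prefix below are placeholders, the real name must match whatever the makefile tests):

export KOKKOS_HOME=/data/avalassi/GPU2020/KOKKOS/kokkos-install  # hypothetical install prefix
export PATH=$KOKKOS_HOME/bin:$PATH                               # so that nvcc_wrapper is picked up
make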

Sorry, you might have already documented this somewhere and I may have missed it... thanks!

Andrea

valassi commented 2 years ago

Maybe one initial question for both sycl and kokkos: can I build a single kokkos and a single sycl installation and use it everywhere, or do you recommend a separate installation for each of nvidia, amd, intel, etc.? Thanks Andrea

valassi commented 2 years ago

Note that ninja is required to build sycl (see #384). At CERN I am using

export PATH=/cvmfs/sft.cern.ch/lcg/releases/ninja/1.9.0-5d651/x86_64-centos7-gcc9-opt/bin:$PATH
valassi commented 2 years ago

To build sycl I tried a simple

  python $DPCPP_HOME/llvm/buildbot/configure.py --cuda

but I get

[1/9] Creating directories for 'vc-intrinsics-populate'
[1/9] Performing download step (git clone) for 'vc-intrinsics-populate'
Cloning into 'vc-intrinsics-src'...
warning: templates not found in /workspace/install/git/2.29.2/x86_64-centos7-gcc8-opt/share/git-core/templates
Note: switching to 'abce9184b7a3a7fe1b02289b9285610d9dc45465'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at abce918 Format InstructionSimplify names
/cvmfs/sft.cern.ch/lcg/releases/git/2.29.2-e475b/x86_64-centos7-gcc8-opt/libexec/git-core/git-sh-setup: line 46: /workspace/install/git/2.29.2/x86_64-centos7-gcc8-opt/libexec/git-core/git-sh-i18n: No such file or directory
CMake Error at vc-intrinsics-subbuild/vc-intrinsics-populate-prefix/tmp/vc-intrinsics-populate-gitclone.cmake:52 (message):
  Failed to update submodules in:
  '/data/avalassi/GPU2020/SYCL/workspace/llvm/build/_deps/vc-intrinsics-src'

FAILED: vc-intrinsics-populate-prefix/src/vc-intrinsics-populate-stamp/vc-intrinsics-populate-download 
cd /data/avalassi/GPU2020/SYCL/workspace/llvm/build/_deps && /cvmfs/sft.cern.ch/lcg/releases/CMake/3.18.4-2ffec/x86_64-centos7-gcc8-opt/bin/cmake -P /data/avalassi/GPU2020/SYCL/workspace/llvm/build/_deps/vc-intrinsics-subbuild/vc-intrinsics-populate-prefix/tmp/vc-intrinsics-populate-gitclone.cmake && /cvmfs/sft.cern.ch/lcg/releases/CMake/3.18.4-2ffec/x86_64-centos7-gcc8-opt/bin/cmake -E touch /data/avalassi/GPU2020/SYCL/workspace/llvm/build/_deps/vc-intrinsics-subbuild/vc-intrinsics-populate-prefix/src/vc-intrinsics-populate-stamp/vc-intrinsics-populate-download
ninja: build stopped: subcommand failed.

CMake Error at /cvmfs/sft.cern.ch/lcg/releases/CMake/3.18.4-2ffec/x86_64-centos7-gcc8-opt/share/cmake-3.18/Modules/FetchContent.cmake:987 (message):
  Build step for vc-intrinsics failed: 1
Call Stack (most recent call first):
  /cvmfs/sft.cern.ch/lcg/releases/CMake/3.18.4-2ffec/x86_64-centos7-gcc8-opt/share/cmake-3.18/Modules/FetchContent.cmake:1082:EVAL:2 (__FetchContent_directPopulate)
  /cvmfs/sft.cern.ch/lcg/releases/CMake/3.18.4-2ffec/x86_64-centos7-gcc8-opt/share/cmake-3.18/Modules/FetchContent.cmake:1082 (cmake_language)
  /cvmfs/sft.cern.ch/lcg/releases/CMake/3.18.4-2ffec/x86_64-centos7-gcc8-opt/share/cmake-3.18/Modules/FetchContent.cmake:1125 (FetchContent_Populate)
  lib/SYCLLowerIR/CMakeLists.txt:27 (FetchContent_MakeAvailable)

Is this a problem with the versions of ninja or cmake or gcc?
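(Looking at the log again, the failure actually seems to come from git rather than from ninja, cmake, or gcc: the cvmfs git build is looking for its support files under /workspace/install, i.e. under the prefix where it was originally built, and git-sh-i18n is not found there. A possible workaround, assuming a working system git exists under /usr/bin, would be to put it first on PATH and rerun the configure step:)

export PATH=/usr/bin:$PATH   # prefer the system git over the relocated cvmfs one (assumes /usr/bin/git exists)
which git                    # sanity check: should now print /usr/bin/git
python $DPCPP_HOME/llvm/buildbot/configure.py --cuda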

jtchilders commented 2 years ago

> Maybe one initial question for both sycl and kokkos: can I build a single kokkos and a single sycl installation and use it everywhere, or do you recommend a separate installation for each of nvidia, amd, intel, etc.? Thanks Andrea

Andrea, for Kokkos I git clone the repo, and since it uses CMake to build, I simply make subfolders named build_v100 and install_v100 to hold the build tree and the installed binaries for each architecture. So yes, you need a separate build of Kokkos for each architecture.
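For concreteness, the configure step looks roughly like this (a sketch rather than my exact options; Kokkos_ARCH_VOLTA70 selects the V100 architecture, and Kokkos_ENABLE_CUDA_LAMBDA makes KOKKOS_LAMBDA expand to a __host__ __device__ lambda, which device-side parallel_for calls need):

git clone https://github.com/kokkos/kokkos.git
cd kokkos && mkdir build_v100 && cd build_v100
cmake .. \
  -DCMAKE_CXX_COMPILER=$PWD/../bin/nvcc_wrapper \
  -DCMAKE_INSTALL_PREFIX=$PWD/../install_v100 \
  -DKokkos_ENABLE_CUDA=ON \
  -DKokkos_ENABLE_OPENMP=ON \
  -DKokkos_ENABLE_CUDA_LAMBDA=ON \
  -DKokkos_ARCH_VOLTA70=ON
make -j install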

valassi commented 2 years ago

Hi @jtchilders thanks I will give it a try on kokkos.

For sycl, as my previous build attempt failed, I instead tried the oneAPI build provided by CERN openlab, see https://openlab-systems.web.cern.ch/intel/oneapi/

I then tried to build eemumu in sycl. Apparently there is some level of CUDA support in this cvmfs build, because I got a warning that my CUDA 11.6 was not supported:

dpcpp: warning: CUDA version is newer than the latest supported version 11.5 [-Wunknown-cuda-version]

So I went back to CUDA 11.5 and tried again. (By the way, which CUDA version are you using with sycl?)

Then, however, I got two more issues (I was getting them with CUDA 11.6 too):

ccache dpcpp -O3 -march=native -std=c++20 -I. -I../../src -I../../../../../tools -Wall -Wshadow -Wextra -ffast-math  -DMGONGPU_FPTYPE_DOUBLE -DMGONGPU_NTPBMAX=1024 -fsycl -fsycl-targets=nvptx64-nvidia-cuda -target-backend '--cuda-gpu-arch=sm_70' -fgpu-rdc --cuda-path=/usr/local/cuda-11.6 -o check.exe check_sa.cc CPPProcess.cc -pthread -L../../lib -lmg5amc_common -lstdc++fs
dpcpp: error: unknown argument: '-target-backend'
dpcpp: warning: CUDA version is newer than the latest supported version 11.5 [-Wunknown-cuda-version]
dpcpp: error: cannot find 'remangled-l64-signed_char.libspirv-nvptx64--nvidiacl.bc'; provide path to libspirv library via '-fsycl-libspirv-path', or pass '-fno-sycl-libspirv' to build without linking with libspirv
dpcpp: error: cannot find 'remangled-l64-signed_char.libspirv-nvptx64--nvidiacl.bc'; provide path to libspirv library via '-fsycl-libspirv-path', or pass '-fno-sycl-libspirv' to build without linking with libspirv
dpcpp: error: cannot find 'remangled-l64-signed_char.libspirv-nvptx64--nvidiacl.bc'; provide path to libspirv library via '-fsycl-libspirv-path', or pass '-fno-sycl-libspirv' to build without linking with libspirv
make: *** [check.exe] Error 1

I can try to remove the libspirv linking, but the '-target-backend' argument seems more problematic. My setup is essentially:

env | egrep '(CUDA|CXX|CC=|SYCL)'
CUDA_HOME=/usr/local/cuda-11.5
CXX=dpcpp
SYCLFLAGS=-fsycl -fsycl-targets=nvptx64-nvidia-cuda -target-backend '--cuda-gpu-arch=sm_70' -fgpu-rdc --cuda-path=/usr/local/cuda-11.5
CC=dpcpp

which gcc
/cvmfs/sft.cern.ch/lcg/releases/gcc/10.2.0-c44b3/x86_64-centos7/bin/gcc

which dpcpp
/cvmfs/projects.cern.ch/intelsw/oneAPI/linux/x86_64/2022/compiler/2022.1.0/linux/bin/dpcpp
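One guess: '-target-backend' may be an option of newer or open-source intel/llvm compilers only, and not of this oneAPI 2022.1.0 dpcpp. The spelling I have seen documented for the open-source compiler is '-Xsycl-target-backend', roughly (untested sketch):

dpcpp -fsycl -fsycl-targets=nvptx64-nvidia-cuda \
      -Xsycl-target-backend --cuda-gpu-arch=sm_70 \
      -o check.exe check_sa.cc CPPProcess.cc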

Any suggestion? Thanks Andrea

valassi commented 2 years ago

Next question on sycl. I removed libspirv and replaced '-target-backend' with '-Xs'; this progresses further, but fails with a different error:

ccache dpcpp -O3 -march=native -std=c++20 -I. -I../../src -I../../../../../tools -Wall -Wshadow -Wextra -ffast-math  -DMGONGPU_FPTYPE_DOUBLE -DMGONGPU_NTPBMAX=1024 -fsycl -fsycl-targets=nvptx64-nvidia-cuda -Xs '--cuda-gpu-arch=sm_70' -fgpu-rdc --cuda-path=/usr/local/cuda-11.5 -o check.exe check_sa.cc CPPProcess.cc -pthread -L../../lib -lmg5amc_common -lstdc++fs -fno-sycl-libspirv
check_sa.cc:679:10: warning: variable 'sqswtim' set but not used [-Wunused-but-set-variable]
  double sqswtim = 0;
         ^
check_sa.cc:667:10: warning: variable 'sqsrtim' set but not used [-Wunused-but-set-variable]
  double sqsrtim = 0;
         ^
check_sa.cc:655:10: warning: variable 'sqsgtim' set but not used [-Wunused-but-set-variable]
  double sqsgtim = 0;
         ^
check_sa.cc:693:10: warning: variable 'sqsw3atim' set but not used [-Wunused-but-set-variable]
  double sqsw3atim = 0;
         ^
check_sa.cc:205:8: warning: variable 'bridge' set but not used [-Wunused-but-set-variable]
  bool bridge = false;
       ^
check_sa.cc:260:27: warning: comparison of integers of different signs: 'int' and 'std::vector::size_type' (aka 'unsigned long') [-Wsign-compare]
          for (int i=0; i < devices.size(); i++)
                        ~ ^ ~~~~~~~~~~~~~~
check_sa.cc:310:23: warning: comparison of integers of different signs: 'int' and 'std::vector::size_type' (aka 'unsigned long') [-Wsign-compare]
      for (int i=0; i < devices.size(); i++) {
                    ~ ^ ~~~~~~~~~~~~~~
7 warnings generated.
error: unable to create target: 'No available targets are compatible with triple "nvptx64-nvidia-cuda"'
1 error generated.
llvm-foreach: 
error: unable to create target: 'No available targets are compatible with triple "nvptx64-nvidia-cuda"'
1 error generated.
llvm-foreach: 

Any suggestion on this one?

By the way, I am using

dpcpp --version
Intel(R) oneAPI DPC++/C++ Compiler 2022.1.0 (2022.1.0.20220316)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /cvmfs/projects.cern.ch/intelsw/oneAPI/linux/x86_64/2022/compiler/2022.1.0/linux/bin-llvm

Is this the same version you are using for sycl, or newer, or older?

Thanks Andrea

valassi commented 2 years ago

Hm, this seems to suggest that my sycl on cvmfs was built without CUDA support (https://stackoverflow.com/a/71584326). I will ask the openlab guys.
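One way to check which backends a given compiler build actually contains (assuming dpcpp inherits clang's -print-targets driver flag, which lists the LLVM targets registered in the build):

dpcpp -print-targets | grep -i nvptx   # empty output would mean no NVPTX (cuda) backend was compiled in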

valassi commented 2 years ago

I asked the openlab team about their sycl. And just in case, I pinged the SPI team about whether they also plan to provide one, following up on https://sft.its.cern.ch/jira/browse/SPI-1630.

Similarly, I asked the SPI team about a central kokkos installation https://sft.its.cern.ch/jira/browse/SPI-2174

valassi commented 2 years ago

About kokkos instead: I managed to build Kokkos (with CUDA and OpenMP support), but I get these errors:

make ccheck.exe
make[1]: Entering directory `/data/avalassi/GPU2020/madgraph4gpuX/epochX/kokkos/ee_mumu.auto/SubProcesses/P1_Sigma_sm_epem_mupmum'
/data/avalassi/GPU2020/KOKKOS/kokkos-install/bin/nvcc_wrapper -O3 -ffast-math -I../../src --std=c++17 -I/data/avalassi/GPU2020/KOKKOS/kokkos-install/include -arch=compute_70 --expt-extended-lambda --expt-relaxed-constexpr -use_fast_math --openmp -lineinfo -c check.cpp -o check.cuda.o
/data/avalassi/GPU2020/KOKKOS/kokkos-install/include/Cuda/Kokkos_Cuda_KernelLaunch.hpp(320): error: The closure type for a lambda ("lambda [](member_type)->void", defined at ../../src/rambo.h:22) cannot be used in the template argument type of a __global__ function template instantiation, unless the lambda is defined within a __device__ or __global__ function, or the flag '-extended-lambda' is specified and the lambda is an extended lambda (a __device__ or __host__ __device__ lambda defined within a __host__ or __host__ __device__ function)
          detected during:
            instantiation of "Kokkos::Impl::cuda_parallel_launch_local_memory" based on template argument <Kokkos::Impl::ParallelFor<lambda [](member_type)->void, Kokkos::TeamPolicy<Kokkos::DefaultExecutionSpace>, Kokkos::Cuda>> 
(320): here
            instantiation of class "Kokkos::Impl::CudaParallelLaunchKernelFunc<DriverType, Kokkos::LaunchBounds<0U, 0U>, Kokkos::Impl::Experimental::CudaLaunchMechanism::LocalMemory> [with DriverType=Kokkos::Impl::ParallelFor<lambda [](member_type)->void, Kokkos::TeamPolicy<Kokkos::DefaultExecutionSpace>, Kokkos::Cuda>]" 
(331): here
            instantiation of class "Kokkos::Impl::CudaParallelLaunchKernelInvoker<DriverType, LaunchBounds, Kokkos::Impl::Experimental::CudaLaunchMechanism::LocalMemory> [with DriverType=Kokkos::Impl::ParallelFor<lambda [](member_type)->void, Kokkos::TeamPolicy<Kokkos::DefaultExecutionSpace>, Kokkos::Cuda>, LaunchBounds=Kokkos::LaunchBounds<0U, 0U>]" 
(607): here
            instantiation of class "Kokkos::Impl::CudaParallelLaunchImpl<DriverType, Kokkos::LaunchBounds<MaxThreadsPerBlock, MinBlocksPerSM>, LaunchMechanism> [with DriverType=Kokkos::Impl::ParallelFor<lambda [](member_type)->void, Kokkos::TeamPolicy<Kokkos::DefaultExecutionSpace>, Kokkos::Cuda>, MaxThreadsPerBlock=0U, MinBlocksPerSM=0U, LaunchMechanism=Kokkos::Impl::Experimental::CudaLaunchMechanism::LocalMemory]" 
(689): here
            instantiation of class "Kokkos::Impl::CudaParallelLaunch<DriverType, LaunchBounds, LaunchMechanism, false> [with DriverType=Kokkos::Impl::ParallelFor<lambda [](member_type)->void, Kokkos::TeamPolicy<Kokkos::DefaultExecutionSpace>, Kokkos::Cuda>, LaunchBounds=Kokkos::Impl::LaunchBoundsTrait::base_traits::launch_bounds, LaunchMechanism=Kokkos::Impl::Experimental::CudaLaunchMechanism::LocalMemory]" 
/data/avalassi/GPU2020/KOKKOS/kokkos-install/include/Cuda/Kokkos_Cuda_Parallel.hpp(802): here
            instantiation of "Kokkos::Impl::ParallelFor<FunctorType, Kokkos::TeamPolicy<Properties...>, Kokkos::Cuda>::ParallelFor(const FunctorType &, const Kokkos::Impl::ParallelFor<FunctorType, Kokkos::TeamPolicy<Properties...>, Kokkos::Cuda>::Policy &) [with FunctorType=lambda [](member_type)->void, Properties=<Kokkos::DefaultExecutionSpace>]" 
/data/avalassi/GPU2020/KOKKOS/kokkos-install/include/Kokkos_Parallel.hpp(166): here
            instantiation of "void Kokkos::parallel_for(const ExecPolicy &, const FunctorType &, const std::string &, std::enable_if<Kokkos::is_execution_policy<ExecPolicy>::value, void>::type *) [with ExecPolicy=Kokkos::TeamPolicy<Kokkos::DefaultExecutionSpace>, FunctorType=lambda [](member_type)->void]" 
/data/avalassi/GPU2020/KOKKOS/kokkos-install/include/Kokkos_Parallel.hpp(200): here
            instantiation of "void Kokkos::parallel_for(const std::string &, const ExecPolicy &, const FunctorType &) [with ExecPolicy=Kokkos::TeamPolicy<Kokkos::DefaultExecutionSpace>, FunctorType=lambda [](member_type)->void]" 
../../src/rambo.h(38): here
            instantiation of "void get_initial_momenta(Kokkos::View<double ***, ExecSpace>, int, double, const Kokkos::View<double *, ExecSpace> &, const int &, const int &) [with ExecSpace=Kokkos::DefaultExecutionSpace]" 
check.cpp(353): here

I imagine I should open tickets for these individually?

I am using CUDA 11.6 and gcc 10.2. Thanks Andrea

PS I tried using -extended-lambda instead of --expt-extended-lambda, but this also failed:

/data/avalassi/GPU2020/KOKKOS/kokkos-install/bin/nvcc_wrapper -O3 -ffast-math -I../../src --std=c++17 -I/data/avalassi/GPU2020/KOKKOS/kokkos-install/include -arch=compute_70 -extended-lambda --expt-relaxed-constexpr -use_fast_math --openmp -lineinfo -c check.cpp -o check.cuda.o
/data/avalassi/GPU2020/KOKKOS/kokkos-install/include/Cuda/Kokkos_Cuda_KernelLaunch.hpp(320): error: The closure type for a lambda ("lambda [](member_type)->void", defined at ../../src/rambo.h:22) cannot be used in the template argument type of a __global__ function template instantiation, unless the lambda is defined within a __device__ or __global__ function, or the flag '-extended-lambda' is specified and the lambda is an extended lambda (a __device__ or __host__ __device__ lambda defined within a __host__ or __host__ __device__ function)
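PS2 One more guess (untested): the complaint that the rambo.h lambda is not a __host__ __device__ lambda may mean this must be fixed in the Kokkos build itself rather than in our makefile. Kokkos has a CMake option Kokkos_ENABLE_CUDA_LAMBDA which, when ON, makes KOKKOS_LAMBDA expand to an extended (__host__ __device__) lambda. A reconfigure sketch, assuming my install prefix above (the build directory path here is hypothetical):

cd /data/avalassi/GPU2020/KOKKOS/kokkos/build   # hypothetical build directory
cmake .. \
  -DKokkos_ENABLE_CUDA=ON \
  -DKokkos_ENABLE_OPENMP=ON \
  -DKokkos_ENABLE_CUDA_LAMBDA=ON \
  -DCMAKE_INSTALL_PREFIX=/data/avalassi/GPU2020/KOKKOS/kokkos-install
make -j install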
lfield commented 2 years ago

> Hm, this seems to suggest that my sycl on cvmfs was built without CUDA support (https://stackoverflow.com/a/71584326). I will ask the openlab guys.

CUDA support is not built into the oneAPI version of SYCL due to various licensing reasons. It is one of the issues I have given feedback on to our Intel contacts. I deployed a version supporting CUDA on itscrd02 or itscrd03, but I don't seem to have access anymore.
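As a quick check of what a given deployment exposes at runtime, the sycl-ls tool shipped with oneAPI lists the SYCL platforms and devices visible to the runtime; a CUDA-enabled build run on a machine with an NVIDIA GPU should show a CUDA backend entry:

sycl-ls   # lists the SYCL backends/devices visible to the runtime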