valassi opened this issue 2 years ago
Maybe one initial question for both SYCL and Kokkos is the following: can I install and build a single Kokkos and a single SYCL installation, or do you recommend a separate installation for each of NVIDIA, AMD, Intel, etc.? Thanks Andrea
Note that ninja is required to build SYCL (see #384). At CERN I am using:
export PATH=/cvmfs/sft.cern.ch/lcg/releases/ninja/1.9.0-5d651/x86_64-centos7-gcc9-opt/bin:$PATH
To build SYCL I tried a simple
python $DPCPP_HOME/llvm/buildbot/configure.py --cuda
but I get
[1/9] Creating directories for 'vc-intrinsics-populate'
[1/9] Performing download step (git clone) for 'vc-intrinsics-populate'
Cloning into 'vc-intrinsics-src'...
warning: templates not found in /workspace/install/git/2.29.2/x86_64-centos7-gcc8-opt/share/git-core/templates
Note: switching to 'abce9184b7a3a7fe1b02289b9285610d9dc45465'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:
git switch -c <new-branch-name>
Or undo this operation with:
git switch -
Turn off this advice by setting config variable advice.detachedHead to false
HEAD is now at abce918 Format InstructionSimplify names
/cvmfs/sft.cern.ch/lcg/releases/git/2.29.2-e475b/x86_64-centos7-gcc8-opt/libexec/git-core/git-sh-setup: line 46: /workspace/install/git/2.29.2/x86_64-centos7-gcc8-opt/libexec/git-core/git-sh-i18n: No such file or directory
CMake Error at vc-intrinsics-subbuild/vc-intrinsics-populate-prefix/tmp/vc-intrinsics-populate-gitclone.cmake:52 (message):
Failed to update submodules in:
'/data/avalassi/GPU2020/SYCL/workspace/llvm/build/_deps/vc-intrinsics-src'
FAILED: vc-intrinsics-populate-prefix/src/vc-intrinsics-populate-stamp/vc-intrinsics-populate-download
cd /data/avalassi/GPU2020/SYCL/workspace/llvm/build/_deps && /cvmfs/sft.cern.ch/lcg/releases/CMake/3.18.4-2ffec/x86_64-centos7-gcc8-opt/bin/cmake -P /data/avalassi/GPU2020/SYCL/workspace/llvm/build/_deps/vc-intrinsics-subbuild/vc-intrinsics-populate-prefix/tmp/vc-intrinsics-populate-gitclone.cmake && /cvmfs/sft.cern.ch/lcg/releases/CMake/3.18.4-2ffec/x86_64-centos7-gcc8-opt/bin/cmake -E touch /data/avalassi/GPU2020/SYCL/workspace/llvm/build/_deps/vc-intrinsics-subbuild/vc-intrinsics-populate-prefix/src/vc-intrinsics-populate-stamp/vc-intrinsics-populate-download
ninja: build stopped: subcommand failed.
CMake Error at /cvmfs/sft.cern.ch/lcg/releases/CMake/3.18.4-2ffec/x86_64-centos7-gcc8-opt/share/cmake-3.18/Modules/FetchContent.cmake:987 (message):
Build step for vc-intrinsics failed: 1
Call Stack (most recent call first):
/cvmfs/sft.cern.ch/lcg/releases/CMake/3.18.4-2ffec/x86_64-centos7-gcc8-opt/share/cmake-3.18/Modules/FetchContent.cmake:1082:EVAL:2 (__FetchContent_directPopulate)
/cvmfs/sft.cern.ch/lcg/releases/CMake/3.18.4-2ffec/x86_64-centos7-gcc8-opt/share/cmake-3.18/Modules/FetchContent.cmake:1082 (cmake_language)
/cvmfs/sft.cern.ch/lcg/releases/CMake/3.18.4-2ffec/x86_64-centos7-gcc8-opt/share/cmake-3.18/Modules/FetchContent.cmake:1125 (FetchContent_Populate)
lib/SYCLLowerIR/CMakeLists.txt:27 (FetchContent_MakeAvailable)
Is this a problem with the versions of ninja, cmake, or gcc?
> Maybe one initial question for both SYCL and Kokkos is the following: can I install and build a single Kokkos and a single SYCL installation, or do you recommend a separate installation for each of NVIDIA, AMD, Intel, etc.? Thanks Andrea
Andrea, for Kokkos I git clone the repo, and since it uses CMake to build, I simply make subfolders named build_v100 and install_v100 to hold the build and install trees for each architecture. So yes, you need a separate build of Kokkos for each architecture.
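A per-architecture configure along these lines might work (a sketch, not the exact commands used here; the Kokkos_ARCH flag and directory names are placeholders for your setup, with VOLTA70 targeting a V100):

```shell
# Sketch: one build tree and one install tree per target architecture.
# Kokkos_ARCH_VOLTA70 targets NVIDIA V100; adjust for other GPUs.
git clone https://github.com/kokkos/kokkos.git
cmake -S kokkos -B build_v100 \
  -DCMAKE_CXX_COMPILER=$PWD/kokkos/bin/nvcc_wrapper \
  -DCMAKE_INSTALL_PREFIX=$PWD/install_v100 \
  -DKokkos_ENABLE_CUDA=ON \
  -DKokkos_ENABLE_OPENMP=ON \
  -DKokkos_ARCH_VOLTA70=ON
cmake --build build_v100 -j
cmake --install build_v100
```

Repeating the configure step with a different Kokkos_ARCH option and a different build/install prefix gives one self-contained installation per device.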
Hi @jtchilders, thanks, I will give Kokkos a try.
For SYCL, as my previous build attempt failed, I tried using the build provided by Openlab in oneAPI, see https://openlab-systems.web.cern.ch/intel/oneapi/
I then tried to build eemumu in SYCL. Apparently there is some level of CUDA support in this cvmfs build, because I got a warning that my CUDA 11.6 was not supported:
dpcpp: warning: CUDA version is newer than the latest supported version 11.5 [-Wunknown-cuda-version]
So I went back to CUDA 11.5 and tried again. (By the way, which CUDA version are you using with SYCL?)
Then, however, I got two more issues (I was also getting them with CUDA 11.6 anyway):
ccache dpcpp -O3 -march=native -std=c++20 -I. -I../../src -I../../../../../tools -Wall -Wshadow -Wextra -ffast-math -DMGONGPU_FPTYPE_DOUBLE -DMGONGPU_NTPBMAX=1024 -fsycl -fsycl-targets=nvptx64-nvidia-cuda -target-backend '--cuda-gpu-arch=sm_70' -fgpu-rdc --cuda-path=/usr/local/cuda-11.6 -o check.exe check_sa.cc CPPProcess.cc -pthread -L../../lib -lmg5amc_common -lstdc++fs
dpcpp: error: unknown argument: '-target-backend'
dpcpp: warning: CUDA version is newer than the latest supported version 11.5 [-Wunknown-cuda-version]
dpcpp: error: cannot find 'remangled-l64-signed_char.libspirv-nvptx64--nvidiacl.bc'; provide path to libspirv library via '-fsycl-libspirv-path', or pass '-fno-sycl-libspirv' to build without linking with libspirv
dpcpp: error: cannot find 'remangled-l64-signed_char.libspirv-nvptx64--nvidiacl.bc'; provide path to libspirv library via '-fsycl-libspirv-path', or pass '-fno-sycl-libspirv' to build without linking with libspirv
dpcpp: error: cannot find 'remangled-l64-signed_char.libspirv-nvptx64--nvidiacl.bc'; provide path to libspirv library via '-fsycl-libspirv-path', or pass '-fno-sycl-libspirv' to build without linking with libspirv
make: *** [check.exe] Error 1
I can try to remove libspirv, but the '-target-backend' error seems more problematic. My environment is essentially:
env | egrep '(CUDA|CXX|CC=|SYCL)'
CUDA_HOME=/usr/local/cuda-11.5
CXX=dpcpp
SYCLFLAGS=-fsycl -fsycl-targets=nvptx64-nvidia-cuda -target-backend '--cuda-gpu-arch=sm_70' -fgpu-rdc --cuda-path=/usr/local/cuda-11.5
CC=dpcpp
which gcc
/cvmfs/sft.cern.ch/lcg/releases/gcc/10.2.0-c44b3/x86_64-centos7/bin/gcc
which dpcpp
/cvmfs/projects.cern.ch/intelsw/oneAPI/linux/x86_64/2022/compiler/2022.1.0/linux/bin/dpcpp
Any suggestion? Thanks Andrea
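For comparison, with the open-source intel/llvm DPC++ (the one built via configure.py --cuda), the CUDA-backend flags are usually spelled with '-Xsycl-target-backend' rather than '-target-backend'; a sketch, assuming an sm_70 device and a CUDA 11.5 installation in the usual path:

```shell
# Sketch: compile a SYCL source for the CUDA backend with a
# CUDA-enabled DPC++ build; the '-Xsycl-target-backend' option
# forwards the arch flag to the NVPTX backend.
clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda \
  -Xsycl-target-backend --cuda-gpu-arch=sm_70 \
  --cuda-path=/usr/local/cuda-11.5 \
  -o check.exe check_sa.cc
```

Whether the packaged oneAPI dpcpp accepts the same spelling is a separate question, since its CUDA support differs from the open-source toolchain.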
Next question on SYCL. I removed libspirv and replaced '-target-backend' with '-Xs'; this seems to progress further, but fails with a different error:
ccache dpcpp -O3 -march=native -std=c++20 -I. -I../../src -I../../../../../tools -Wall -Wshadow -Wextra -ffast-math -DMGONGPU_FPTYPE_DOUBLE -DMGONGPU_NTPBMAX=1024 -fsycl -fsycl-targets=nvptx64-nvidia-cuda -Xs '--cuda-gpu-arch=sm_70' -fgpu-rdc --cuda-path=/usr/local/cuda-11.5 -o check.exe check_sa.cc CPPProcess.cc -pthread -L../../lib -lmg5amc_common -lstdc++fs -fno-sycl-libspirv
check_sa.cc:679:10: warning: variable 'sqswtim' set but not used [-Wunused-but-set-variable]
double sqswtim = 0;
^
check_sa.cc:667:10: warning: variable 'sqsrtim' set but not used [-Wunused-but-set-variable]
double sqsrtim = 0;
^
check_sa.cc:655:10: warning: variable 'sqsgtim' set but not used [-Wunused-but-set-variable]
double sqsgtim = 0;
^
check_sa.cc:693:10: warning: variable 'sqsw3atim' set but not used [-Wunused-but-set-variable]
double sqsw3atim = 0;
^
check_sa.cc:205:8: warning: variable 'bridge' set but not used [-Wunused-but-set-variable]
bool bridge = false;
^
check_sa.cc:260:27: warning: comparison of integers of different signs: 'int' and 'std::vector::size_type' (aka 'unsigned long') [-Wsign-compare]
for (int i=0; i < devices.size(); i++)
~ ^ ~~~~~~~~~~~~~~
check_sa.cc:310:23: warning: comparison of integers of different signs: 'int' and 'std::vector::size_type' (aka 'unsigned long') [-Wsign-compare]
for (int i=0; i < devices.size(); i++) {
~ ^ ~~~~~~~~~~~~~~
7 warnings generated.
error: unable to create target: 'No available targets are compatible with triple "nvptx64-nvidia-cuda"'
1 error generated.
llvm-foreach:
error: unable to create target: 'No available targets are compatible with triple "nvptx64-nvidia-cuda"'
1 error generated.
llvm-foreach:
Any suggestion on this one?
By the way, I am using
dpcpp --version
Intel(R) oneAPI DPC++/C++ Compiler 2022.1.0 (2022.1.0.20220316)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /cvmfs/projects.cern.ch/intelsw/oneAPI/linux/x86_64/2022/compiler/2022.1.0/linux/bin-llvm
Is this the same version you are using for SYCL, or a newer or older one?
Thanks Andrea
Hmm, this seems to suggest that my SYCL on cvmfs was built without CUDA support: https://stackoverflow.com/a/71584326. I will ask the Openlab team.
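One quick way to check whether a clang-based dpcpp was built with the NVPTX backend is to list its registered targets; a sketch, assuming the oneAPI wrapper forwards the generic clang '-print-targets' option:

```shell
# If nvptx64 is missing from the registered targets, the compiler
# cannot emit code for -fsycl-targets=nvptx64-nvidia-cuda.
dpcpp -print-targets | grep nvptx \
  || echo "no NVPTX backend in this build"
```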
I asked the Openlab team about their SYCL build. And just in case, I pinged the SPI team about whether they also plan to provide one, following up on https://sft.its.cern.ch/jira/browse/SPI-1630.
Similarly, I asked the SPI team about a central Kokkos installation: https://sft.its.cern.ch/jira/browse/SPI-2174
Back to Kokkos: I managed to build it (with CUDA and OpenMP support), but I get these errors:
make ccheck.exe
make[1]: Entering directory `/data/avalassi/GPU2020/madgraph4gpuX/epochX/kokkos/ee_mumu.auto/SubProcesses/P1_Sigma_sm_epem_mupmum'
/data/avalassi/GPU2020/KOKKOS/kokkos-install/bin/nvcc_wrapper -O3 -ffast-math -I../../src --std=c++17 -I/data/avalassi/GPU2020/KOKKOS/kokkos-install/include -arch=compute_70 --expt-extended-lambda --expt-relaxed-constexpr -use_fast_math --openmp -lineinfo -c check.cpp -o check.cuda.o
/data/avalassi/GPU2020/KOKKOS/kokkos-install/include/Cuda/Kokkos_Cuda_KernelLaunch.hpp(320): error: The closure type for a lambda ("lambda [](member_type)->void", defined at ../../src/rambo.h:22) cannot be used in the template argument type of a __global__ function template instantiation, unless the lambda is defined within a __device__ or __global__ function, or the flag '-extended-lambda' is specified and the lambda is an extended lambda (a __device__ or __host__ __device__ lambda defined within a __host__ or __host__ __device__ function)
detected during:
instantiation of "Kokkos::Impl::cuda_parallel_launch_local_memory" based on template argument <Kokkos::Impl::ParallelFor<lambda [](member_type)->void, Kokkos::TeamPolicy<Kokkos::DefaultExecutionSpace>, Kokkos::Cuda>>
(320): here
instantiation of class "Kokkos::Impl::CudaParallelLaunchKernelFunc<DriverType, Kokkos::LaunchBounds<0U, 0U>, Kokkos::Impl::Experimental::CudaLaunchMechanism::LocalMemory> [with DriverType=Kokkos::Impl::ParallelFor<lambda [](member_type)->void, Kokkos::TeamPolicy<Kokkos::DefaultExecutionSpace>, Kokkos::Cuda>]"
(331): here
instantiation of class "Kokkos::Impl::CudaParallelLaunchKernelInvoker<DriverType, LaunchBounds, Kokkos::Impl::Experimental::CudaLaunchMechanism::LocalMemory> [with DriverType=Kokkos::Impl::ParallelFor<lambda [](member_type)->void, Kokkos::TeamPolicy<Kokkos::DefaultExecutionSpace>, Kokkos::Cuda>, LaunchBounds=Kokkos::LaunchBounds<0U, 0U>]"
(607): here
instantiation of class "Kokkos::Impl::CudaParallelLaunchImpl<DriverType, Kokkos::LaunchBounds<MaxThreadsPerBlock, MinBlocksPerSM>, LaunchMechanism> [with DriverType=Kokkos::Impl::ParallelFor<lambda [](member_type)->void, Kokkos::TeamPolicy<Kokkos::DefaultExecutionSpace>, Kokkos::Cuda>, MaxThreadsPerBlock=0U, MinBlocksPerSM=0U, LaunchMechanism=Kokkos::Impl::Experimental::CudaLaunchMechanism::LocalMemory]"
(689): here
instantiation of class "Kokkos::Impl::CudaParallelLaunch<DriverType, LaunchBounds, LaunchMechanism, false> [with DriverType=Kokkos::Impl::ParallelFor<lambda [](member_type)->void, Kokkos::TeamPolicy<Kokkos::DefaultExecutionSpace>, Kokkos::Cuda>, LaunchBounds=Kokkos::Impl::LaunchBoundsTrait::base_traits::launch_bounds, LaunchMechanism=Kokkos::Impl::Experimental::CudaLaunchMechanism::LocalMemory]"
/data/avalassi/GPU2020/KOKKOS/kokkos-install/include/Cuda/Kokkos_Cuda_Parallel.hpp(802): here
instantiation of "Kokkos::Impl::ParallelFor<FunctorType, Kokkos::TeamPolicy<Properties...>, Kokkos::Cuda>::ParallelFor(const FunctorType &, const Kokkos::Impl::ParallelFor<FunctorType, Kokkos::TeamPolicy<Properties...>, Kokkos::Cuda>::Policy &) [with FunctorType=lambda [](member_type)->void, Properties=<Kokkos::DefaultExecutionSpace>]"
/data/avalassi/GPU2020/KOKKOS/kokkos-install/include/Kokkos_Parallel.hpp(166): here
instantiation of "void Kokkos::parallel_for(const ExecPolicy &, const FunctorType &, const std::string &, std::enable_if<Kokkos::is_execution_policy<ExecPolicy>::value, void>::type *) [with ExecPolicy=Kokkos::TeamPolicy<Kokkos::DefaultExecutionSpace>, FunctorType=lambda [](member_type)->void]"
/data/avalassi/GPU2020/KOKKOS/kokkos-install/include/Kokkos_Parallel.hpp(200): here
instantiation of "void Kokkos::parallel_for(const std::string &, const ExecPolicy &, const FunctorType &) [with ExecPolicy=Kokkos::TeamPolicy<Kokkos::DefaultExecutionSpace>, FunctorType=lambda [](member_type)->void]"
../../src/rambo.h(38): here
instantiation of "void get_initial_momenta(Kokkos::View<double ***, ExecSpace>, int, double, const Kokkos::View<double *, ExecSpace> &, const int &, const int &) [with ExecSpace=Kokkos::DefaultExecutionSpace]"
check.cpp(353): here
I imagine I should open individual tickets for these?
I am using CUDA 11.6 and gcc 10.2. Thanks Andrea
PS: I tried using '-extended-lambda' instead of '--expt-extended-lambda', but this also failed:
/data/avalassi/GPU2020/KOKKOS/kokkos-install/bin/nvcc_wrapper -O3 -ffast-math -I../../src --std=c++17 -I/data/avalassi/GPU2020/KOKKOS/kokkos-install/include -arch=compute_70 -extended-lambda --expt-relaxed-constexpr -use_fast_math --openmp -lineinfo -c check.cpp -o check.cuda.o
/data/avalassi/GPU2020/KOKKOS/kokkos-install/include/Cuda/Kokkos_Cuda_KernelLaunch.hpp(320): error: The closure type for a lambda ("lambda [](member_type)->void", defined at ../../src/rambo.h:22) cannot be used in the template argument type of a __global__ function template instantiation, unless the lambda is defined within a __device__ or __global__ function, or the flag '-extended-lambda' is specified and the lambda is an extended lambda (a __device__ or __host__ __device__ lambda defined within a __host__ or __host__ __device__ function)
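For what it's worth, in standalone Kokkos code this class of nvcc error typically goes away when the kernel body is declared with the KOKKOS_LAMBDA macro (which expands to an extended __host__ __device__ lambda under CUDA) instead of a plain C++ lambda. A minimal sketch, purely illustrative and not the madgraph4gpu code, with made-up sizes:

```cpp
// Sketch: team-based parallel_for using KOKKOS_LAMBDA, so the
// closure is device-capable and accepted by nvcc's kernel launch.
#include <Kokkos_Core.hpp>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    using policy_t = Kokkos::TeamPolicy<Kokkos::DefaultExecutionSpace>;
    using member_t = policy_t::member_type;
    Kokkos::View<double*> data("data", 1024);  // 8 teams x 128 elements
    Kokkos::parallel_for(
        "fill", policy_t(8, Kokkos::AUTO),
        KOKKOS_LAMBDA(const member_t& team) {
          const int base = team.league_rank() * 128;
          // Nested plain lambdas inside a device lambda are fine.
          Kokkos::parallel_for(Kokkos::TeamThreadRange(team, 128),
                               [&](const int i) { data(base + i) = 1.0; });
        });
  }
  Kokkos::finalize();
  return 0;
}
```

If the failing lambda in rambo.h is already a KOKKOS_LAMBDA, the error would instead point at a flag mismatch between how Kokkos was configured and how the application is being compiled.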
> Hm this seems to suggest that my sycl on cvmfs was built without cuda support, https://stackoverflow.com/a/71584326 I will ask the openlab guys

CUDA support is not built into the oneAPI version of SYCL due to various licensing reasons. It is one of the issues I have given feedback on to our Intel contacts. I deployed a version supporting CUDA on itscrd02 or itscrd03, but I don't seem to have access anymore.
Hi @jtchilders @nscottnichols, I am starting to have a look at Kokkos and SYCL.
As a first step I would like to build the code. Is this documented somewhere? If not, could you please write it up?
On a specific issue: I just tried a simple 'make' from madgraph4gpuX/epochX/kokkos/ee_mumu.auto/SubProcesses/P1_Sigma_sm_epem_mupmum. This fails as follows:
I see that the hardcoded jchilders paths are only used if the KOKKOS environment variables are not set. So I guess I just need to install Kokkos somehow and then set those variables?
Sorry, you might have already documented this somewhere and I may have missed it... thanks!
Andrea