Closed CharlelieLrt closed 10 months ago
@CharlelieLrt can you share the full output from the command, and also conda --version? Can you also try with mamba?
The missing packages are not under linux-ppc64le, but they are under noarch, so that should be sufficient for conda to find and use them, even if you're on a PowerPC platform. @m3vaz any idea what might have happened here?
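One way to check the noarch claim locally is to group the search results by subdir. The sketch below assumes that `conda search --json` returns a mapping from package name to a list of records carrying `subdir` and `version` fields; that layout, and the helper names `versions_by_subdir` and `search`, are assumptions for illustration and not something confirmed in this thread.

```python
import json
import subprocess
from collections import defaultdict


def versions_by_subdir(search_json: dict, package: str) -> dict:
    """Group the versions found for `package` by the subdir hosting them."""
    grouped = defaultdict(set)
    for record in search_json.get(package, []):
        # Assumed record layout: {"subdir": "noarch", "version": "11.8", ...}
        grouped[record.get("subdir", "unknown")].add(record["version"])
    return {subdir: sorted(versions) for subdir, versions in grouped.items()}


def search(package: str, channels: list) -> dict:
    """Run `conda search --json` for `package` and return the parsed payload."""
    cmd = ["conda", "search", package, "--json"]
    for channel in channels:
        cmd += ["-c", channel]
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return json.loads(result.stdout)
```

Calling `versions_by_subdir(search("cuda-version", ["nvidia", "conda-forge"]), "cuda-version")` would then show whether any records come back at all, and from which subdir. An empty result means the solver cannot see the package from that client.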
Version is conda 4.6.14
I switched to mamba and it could find the cuda-version package. I could then install dependencies (except the ones for the docs, but I won't need them).
I am now trying to install legate with:
./install.py --cuda --arch volta --network gasnet1 --max-dim 5 --openmp --hdf5 --build-tests --build-examples --conduit ibv
but I get an error telling me that the version of cmake I am using is incompatible:
CMake Error at CMakeLists.txt:17 (cmake_minimum_required):
CMake 3.22.1 or higher is required. You are running version 3.17.5
My PATH is:
/g/g92/laurent3/miniforge3/envs/legate_base/bin:...
So, I looked at the cmake I have there, and /g/g92/laurent3/miniforge3/envs/legate_base/bin/cmake --version shows cmake version 3.27.9. In contrast, the command cmake3 --version shows cmake3 version 3.17.5, which is installed in /usr/bin/cmake3. So, I assume that install.py is trying to use this system-wide install of cmake instead of the one in my conda environment. I tried providing an extra argument --with-cmake /g/g92/laurent3/miniforge3/envs/legate_base/bin/cmake to install.py, but it did not change anything.
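The manual check described here can be scripted. The following is a minimal sketch that resolves both `cmake` and `cmake3` on the current PATH and compares their versions against the 3.22.1 minimum from the error message. The helper names are hypothetical, and the sketch only reports what is on PATH; it does not reproduce how install.py or its build backend picks an executable.

```python
import re
import shutil
import subprocess

MINIMUM = (3, 22, 1)  # taken from the cmake_minimum_required error


def parse_cmake_version(text):
    """Extract (major, minor, patch) from `cmake --version` output, or None."""
    match = re.search(r"version\s+(\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(part) for part in match.groups()) if match else None


def report(names=("cmake", "cmake3")):
    """Return (name, resolved path, version) for each candidate executable."""
    rows = []
    for name in names:
        path = shutil.which(name)
        if path is None:
            rows.append((name, None, None))
            continue
        out = subprocess.run(
            [path, "--version"], capture_output=True, text=True
        ).stdout
        rows.append((name, path, parse_cmake_version(out)))
    return rows


def too_old(version):
    """True when a parsed version is below the required minimum."""
    return version is not None and version < MINIMUM
```

Running `report()` inside the activated conda environment should list one row per name, which makes a mismatch between `cmake` and `cmake3` visible at a glance.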
I believe this was mentioned in #837
I pushed a fix here, could you please try that? https://github.com/nv-legate/legate.core/pull/908
It did not solve the problem. Now I see:
[...]
conduit: ibv
gasnet_system: None
nccl_dir: None
cmake_exe: /g/g92/laurent3/miniforge3/envs/legate_base/bin/cmake
cmake_generator: Ninja
[...]
But later on:
Configuring Project
Working directory:
/usr/WS1/laurent3/Codes/LEGATE/legate.core/_skbuild/linux-ppc64le-3.10/cmake-build
Command:
/usr/bin/cmake3 /usr/WS1/laurent3/Codes/LEGATE/legate.core -G Ninja [...]
So it's still trying to use the system's cmake3.
Could it be because pip --global-option is deprecated? (https://github.com/pypa/pip/issues/11859)
As a temporary workaround I have defined a symlink for cmake3 to the right cmake.
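A workaround of this kind could look roughly like the sketch below. The thread does not say where the symlink was placed, so the shim directory is an assumption: the idea is to create a `cmake3` link in a private directory and put that directory first on PATH, leaving /usr/bin untouched. Both function names are hypothetical.

```python
import os
from pathlib import Path


def make_cmake3_shim(real_cmake: str, shim_dir: str) -> Path:
    """Create shim_dir/cmake3 as a symlink pointing at real_cmake."""
    target = Path(real_cmake)
    if not target.exists():
        raise FileNotFoundError(target)
    shim_root = Path(shim_dir)
    shim_root.mkdir(parents=True, exist_ok=True)
    link = shim_root / "cmake3"
    # Replace a stale link from an earlier attempt, if any.
    if link.is_symlink() or link.exists():
        link.unlink()
    link.symlink_to(target)
    return link


def prepend_to_path(directory: str, env=None) -> dict:
    """Return a copy of the environment with `directory` first on PATH."""
    env = dict(os.environ if env is None else env)
    parts = [directory, env.get("PATH", "")]
    env["PATH"] = os.pathsep.join(part for part in parts if part)
    return env
```

The returned environment can be passed as `env=` to `subprocess.run` when launching the build, so the shim only affects that one invocation.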
I am now running into a cuda compilation error:
Finished release [optimized] target(s) in 2m 44s
[98/261] /usr/tce/packages/cuda/cuda-12.0.0/bin/nvcc -forward-unknown-to-host-compiler -DLEGATE_USE_CUDA -DLEGATE_USE_NETWORK -DLEGATE_USE_OPENMP -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA -DTHRUST_HOST_SYSTEM=THRUST_HOST_SYSTEM_CPP -DUSE_CUDA -DUSE_HDF -Dlegate_core_EXPORTS -I/usr/WS1/laurent3/Codes/LEGATE/legate.core/src -I/usr/WS1/laurent3/Codes/LEGATE/legate.core/_skbuild/linux-ppc64le-3.10/cmake-build/_deps/legion-src/runtime -I/usr/WS1/laurent3/Codes/LEGATE/legate.core/_skbuild/linux-ppc64le-3.10/cmake-build/_deps/legion-src/runtime/mappers -I/usr/WS1/laurent3/Codes/LEGATE/legate.core/_skbuild/linux-ppc64le-3.10/cmake-build/_deps/legion-build/runtime -I/usr/WS1/laurent3/Codes/LEGATE/legate.core/_skbuild/linux-ppc64le-3.10/cmake-build/_deps/thrust-src -I/usr/WS1/laurent3/Codes/LEGATE/legate.core/_skbuild/linux-ppc64le-3.10/cmake-build/_deps/thrust-src/dependencies/cub -isystem /g/g92/laurent3/miniforge3/envs/legate_base/include -isystem /usr/tce/packages/cuda/cuda-12.0.0/nvidia/include -isystem /usr/tce/packages/spectrum-mpi/ibm/spectrum-mpi-2020.08.19/include -isystem /usr/tce/packages/spectrum-mpi/ibm/spectrum-mpi-rolling-release/include -O2 -std=c++17 "--generate-code=arch=compute_70,code=[sm_70]" -Xcompiler=-fPIC -Xfatbin=-compress-all --expt-extended-lambda --expt-relaxed-constexpr -Wno-deprecated-gpu-targets -Xcompiler -pthread -MD -MT legate-core-cpp/CMakeFiles/legate_core.dir/src/core/cuda/stream_pool.cu.o -MF legate-core-cpp/CMakeFiles/legate_core.dir/src/core/cuda/stream_pool.cu.o.d -x cu -c /usr/WS1/laurent3/Codes/LEGATE/legate.core/src/core/cuda/stream_pool.cu -o legate-core-cpp/CMakeFiles/legate_core.dir/src/core/cuda/stream_pool.cu.o
FAILED: legate-core-cpp/CMakeFiles/legate_core.dir/src/core/cuda/stream_pool.cu.o
/usr/tce/packages/cuda/cuda-12.0.0/bin/nvcc -forward-unknown-to-host-compiler -DLEGATE_USE_CUDA -DLEGATE_USE_NETWORK -DLEGATE_USE_OPENMP -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA -DTHRUST_HOST_SYSTEM=THRUST_HOST_SYSTEM_CPP -DUSE_CUDA -DUSE_HDF -Dlegate_core_EXPORTS -I/usr/WS1/laurent3/Codes/LEGATE/legate.core/src -I/usr/WS1/laurent3/Codes/LEGATE/legate.core/_skbuild/linux-ppc64le-3.10/cmake-build/_deps/legion-src/runtime -I/usr/WS1/laurent3/Codes/LEGATE/legate.core/_skbuild/linux-ppc64le-3.10/cmake-build/_deps/legion-src/runtime/mappers -I/usr/WS1/laurent3/Codes/LEGATE/legate.core/_skbuild/linux-ppc64le-3.10/cmake-build/_deps/legion-build/runtime -I/usr/WS1/laurent3/Codes/LEGATE/legate.core/_skbuild/linux-ppc64le-3.10/cmake-build/_deps/thrust-src -I/usr/WS1/laurent3/Codes/LEGATE/legate.core/_skbuild/linux-ppc64le-3.10/cmake-build/_deps/thrust-src/dependencies/cub -isystem /g/g92/laurent3/miniforge3/envs/legate_base/include -isystem /usr/tce/packages/cuda/cuda-12.0.0/nvidia/include -isystem /usr/tce/packages/spectrum-mpi/ibm/spectrum-mpi-2020.08.19/include -isystem /usr/tce/packages/spectrum-mpi/ibm/spectrum-mpi-rolling-release/include -O2 -std=c++17 "--generate-code=arch=compute_70,code=[sm_70]" -Xcompiler=-fPIC -Xfatbin=-compress-all --expt-extended-lambda --expt-relaxed-constexpr -Wno-deprecated-gpu-targets -Xcompiler -pthread -MD -MT legate-core-cpp/CMakeFiles/legate_core.dir/src/core/cuda/stream_pool.cu.o -MF legate-core-cpp/CMakeFiles/legate_core.dir/src/core/cuda/stream_pool.cu.o.d -x cu -c /usr/WS1/laurent3/Codes/LEGATE/legate.core/src/core/cuda/stream_pool.cu -o legate-core-cpp/CMakeFiles/legate_core.dir/src/core/cuda/stream_pool.cu.o
/usr/include/sys/platform/ppc.h(31): error: identifier "__builtin_ppc_get_timebase" is undefined
I am loading cuda 12.0.0 with module load cuda/12.0.0 and my conda environment was generated with --ctk 12.0.
> So it's still trying to use the system's cmake3. Could it be because pip --global-option is deprecated? (https://github.com/pypa/pip/issues/11859)
I posted some follow-up comments on #908. This falls beyond my (very limited) knowledge around python packaging.
> error: identifier "__builtin_ppc_get_timebase" is undefined
What host compiler are you using? If you try compiling an empty file with /usr/tce/packages/cuda/cuda-12.0.0/bin/nvcc --verbose empty.cu you should be able to see what's getting called. E.g. on my local machine I see:
#$ gcc -D__CUDA_ARCH_LIST__=520 -E -x c++ -D__CUDACC__ -D__NVCC__ "-I/usr/local/cuda/bin/../targets/x86_64-linux/include" -D__CUDACC_VER_MAJOR__=12 -D__CUDACC_VER_MINOR__=3 -D__CUDACC_VER_BUILD__=103 -D__CUDA_API_VER_MAJOR__=12 -D__CUDA_API_VER_MINOR__=3 -D__NVCC_DIAG_PRAGMA_SUPPORT__=1 -include "cuda_runtime.h" -m64 "a.cu" -o "/tmp/tmpxft_0023d560_00000000-5_a.cpp4.ii"
Are the pure-C++ files compiling correctly? What compiler are they using?
Trying to compile empty.cu, I get:
#$ gcc -D__NV_NO_HOST_COMPILER_CHECK=1 -std=c++14 -D__CUDA_ARCH_LIST__=520 -E -x c++ -D__CUDACC__ -D__NVCC__ "-I/usr/tce/packages/cuda/cuda-12.0.0/nvidia/bin/../targets/ppc64le-linux/include" -D__CUDACC_VER_MAJOR__=12 -D__CUDACC_VER_MINOR__=0 -D__CUDACC_VER_BUILD__=76 -D__CUDA_API_VER_MAJOR__=12 -D__CUDA_API_VER_MINOR__=0 -D__NVCC_DIAG_PRAGMA_SUPPORT__=1 -include "cuda_runtime.h" "empty.cu" -o "/var/tmp/laurent3/tmpxft_00003738_00000000-5_empty.cpp4.ii"
The pure C++ files seem to be compiled correctly. They use /usr/tce/packages/gcc/gcc-8.3.1/bin/c++ (c++ (GCC) 8.3.1 20190311 (Red Hat 8.3.1-3)).
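To compare the host compiler that nvcc launches with the one CMake uses for plain C++ files, the `#$` lines of the verbose output can be parsed. This is a rough sketch: it assumes each command line starts with `#$ ` as in the excerpts above, and that lines whose first token looks like `NAME=value` are environment assignments to skip. The function names are hypothetical, and both stdout and stderr are captured because the stream used may vary.

```python
import re
import subprocess

ASSIGNMENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*=")


def invoked_programs(verbose_output: str) -> list:
    """List the distinct programs named on `#$ ` lines, in order of appearance."""
    programs = []
    for line in verbose_output.splitlines():
        if not line.startswith("#$ "):
            continue
        parts = line[3:].split()
        # Skip empty lines and environment assignments such as `PATH=...`.
        if not parts or ASSIGNMENT.match(parts[0]):
            continue
        name = parts[0].strip('"')
        if name not in programs:
            programs.append(name)
    return programs


def nvcc_verbose(nvcc: str, source: str) -> str:
    """Run `nvcc --verbose` on `source` and return everything it printed."""
    result = subprocess.run(
        [nvcc, "--verbose", source], capture_output=True, text=True
    )
    return result.stdout + result.stderr
```

A bare `gcc` in the result is resolved through PATH at build time, so it may or may not be the same binary as /usr/tce/packages/gcc/gcc-8.3.1/bin/c++; `shutil.which("gcc")` shows which one it is.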
I reached out to compiler experts inside Nvidia and on the Legion Zulip for guidance. Unfortunately I don't have easy access to ppc64le machines to try and personally reproduce.
So, given the comments on the Legion Zulip, I switched to a newer commit of Legion (d7121f886127e41773a283cbbaa51c452cd01054) that includes the fix for the __builtin_ppc_get_timebase error.
I now have a number of failed compilations, such as:
FAILED: legate-core-cpp/CMakeFiles/legate_core.dir/src/core/task/variant_options.cc.o
/usr/tce/packages/gcc/gcc-8.3.1/bin/c++ -DLEGATE_USE_COLLECTIVE -DLEGATE_USE_CUDA -DLEGATE_USE_NETWORK -DLEGATE_USE_OPENMP -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA -DTHRUST_HOST_SYSTEM=THRUST_HOST_SYSTEM_CPP -DUSE_CUDA -DUSE_HDF -Dlegate_core_EXPORTS -I/usr/WS1/laurent3/Codes/LEGATE/legate.core/src -I/usr/WS1/laurent3/Codes/LEGATE/legate.core/_skbuild/linux-ppc64le-3.10/cmake-build/_deps/legion-src/runtime -I/usr/WS1/laurent3/Codes/LEGATE/legate.core/_skbuild/linux-ppc64le-3.10/cmake-build/_deps/legion-src/runtime/mappers -I/usr/WS1/laurent3/Codes/LEGATE/legate.core/_skbuild/linux-ppc64le-3.10/cmake-build/_deps/legion-build/runtime -I/usr/WS1/laurent3/Codes/LEGATE/legate.core/_skbuild/linux-ppc64le-3.10/cmake-build/_deps/thrust-src -I/usr/WS1/laurent3/Codes/LEGATE/legate.core/_skbuild/linux-ppc64le-3.10/cmake-build/_deps/thrust-src/dependencies/cub -isystem /g/g92/laurent3/miniforge3/envs/legate_base/include -isystem /usr/tce/packages/cuda/cuda-12.0.0/nvidia/include -isystem /usr/tce/packages/spectrum-mpi/ibm/spectrum-mpi-2020.08.19/include -isystem /usr/tce/packages/spectrum-mpi/ibm/spectrum-mpi-rolling-release/include -O2 -std=gnu++17 -fPIC -mcpu=native -maltivec -mabi=altivec -mvsx -UTHRUST_DEVICE_SYSTEM -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_OMP -pthread -MD -MT legate-core-cpp/CMakeFiles/legate_core.dir/src/core/task/variant_options.cc.o -MF legate-core-cpp/CMakeFiles/legate_core.dir/src/core/task/variant_options.cc.o.d -o legate-core-cpp/CMakeFiles/legate_core.dir/src/core/task/variant_options.cc.o -c /usr/WS1/laurent3/Codes/LEGATE/legate.core/src/core/task/variant_options.cc
/usr/WS1/laurent3/Codes/LEGATE/legate.core/src/core/task/variant_options.cc: In member function 'void legate::VariantOptions::populate_registrar(Legion::TaskVariantRegistrar&)':
/usr/WS1/laurent3/Codes/LEGATE/legate.core/src/core/task/variant_options.cc:56:13: error: 'struct Legion::TaskVariantRegistrar' has no member named 'set_concurrent'; did you mean 'add_constraint'?
registrar.set_concurrent(concurrent);
^~~~~~~~~~~~~~
add_constraint
Or:
FAILED: legate-core-cpp/CMakeFiles/legate_core.dir/src/core/comm/comm_nccl.cu.o
/usr/tce/packages/cuda/cuda-12.0.0/bin/nvcc -forward-unknown-to-host-compiler -DLEGATE_USE_CUDA -DLEGATE_USE_NETWORK -DLEGATE_USE_OPENMP -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA -DTHRUST_HOST_SYSTEM=THRUST_HOST_SYSTEM_CPP -DUSE_CUDA -DUSE_HDF -Dlegate_core_EXPORTS -I/usr/WS1/laurent3/Codes/LEGATE/legate.core/src -I/usr/WS1/laurent3/Codes/LEGATE/legate.core/_skbuild/linux-ppc64le-3.10/cmake-build/_deps/legion-src/runtime -I/usr/WS1/laurent3/Codes/LEGATE/legate.core/_skbuild/linux-ppc64le-3.10/cmake-build/_deps/legion-src/runtime/mappers -I/usr/WS1/laurent3/Codes/LEGATE/legate.core/_skbuild/linux-ppc64le-3.10/cmake-build/_deps/legion-build/runtime -I/usr/WS1/laurent3/Codes/LEGATE/legate.core/_skbuild/linux-ppc64le-3.10/cmake-build/_deps/thrust-src -I/usr/WS1/laurent3/Codes/LEGATE/legate.core/_skbuild/linux-ppc64le-3.10/cmake-build/_deps/thrust-src/dependencies/cub -isystem /g/g92/laurent3/miniforge3/envs/legate_base/include -isystem /usr/tce/packages/cuda/cuda-12.0.0/nvidia/include -isystem /usr/tce/packages/spectrum-mpi/ibm/spectrum-mpi-2020.08.19/include -isystem /usr/tce/packages/spectrum-mpi/ibm/spectrum-mpi-rolling-release/include -O2 -std=c++17 "--generate-code=arch=compute_70,code=[sm_70]" -Xcompiler=-fPIC -Xfatbin=-compress-all --expt-extended-lambda --expt-relaxed-constexpr -Wno-deprecated-gpu-targets -Xcompiler -pthread -MD -MT legate-core-cpp/CMakeFiles/legate_core.dir/src/core/comm/comm_nccl.cu.o -MF legate-core-cpp/CMakeFiles/legate_core.dir/src/core/comm/comm_nccl.cu.o.d -x cu -c /usr/WS1/laurent3/Codes/LEGATE/legate.core/src/core/comm/comm_nccl.cu -o legate-core-cpp/CMakeFiles/legate_core.dir/src/core/comm/comm_nccl.cu.o
/usr/WS1/laurent3/Codes/LEGATE/legate.core/src/core/data/store.h(174): error: namespace "Legion" has no member "OutputRegion"
/usr/WS1/laurent3/Codes/LEGATE/legate.core/src/core/data/store.h(205): error: namespace "Legion" has no member "OutputRegion"
/usr/WS1/laurent3/Codes/LEGATE/legate.core/src/core/utilities/deserializer.h(107): error: namespace "Legion" has no member "OutputRegion"
/usr/tce/packages/gcc/gcc-8.3.1/rh/usr/include/c++/8/bits/ptr_traits.h(114): error: static assertion failed with "pointer type defines element_type or is like SomePointer<T, Args>"
detected during:
instantiation of class "std::pointer_traits<_Ptr> [with _Ptr=<error-type> *]"
/usr/tce/packages/gcc/gcc-8.3.1/rh/usr/include/c++/8/bits/alloc_traits.h(102): here
instantiation of class "std::allocator_traits<_Alloc>::_Ptr<_Func, _Tp, <unnamed>> [with _Alloc=std::allocator<<error-type>>, _Func=std::__allocator_traits_base::__c_pointer, _Tp=const <error-type>, <unnamed>=void]"
/usr/tce/packages/gcc/gcc-8.3.1/rh/usr/include/c++/8/bits/alloc_traits.h(135): here
instantiation of class "std::allocator_traits<_Alloc> [with _Alloc=std::allocator<<error-type>>]"
/usr/tce/packages/gcc/gcc-8.3.1/rh/usr/include/c++/8/ext/alloc_traits.h(52): here
instantiation of class "__gnu_cxx::__alloc_traits<_Alloc, <unnamed>> [with _Alloc=std::allocator<<error-type>>, <unnamed>=<error-type>]"
/usr/tce/packages/gcc/gcc-8.3.1/rh/usr/include/c++/8/bits/stl_vector.h(84): here
instantiation of class "std::_Vector_base<_Tp, _Alloc> [with _Tp=<error-type>, _Alloc=std::allocator<<error-type>>]"
/usr/tce/packages/gcc/gcc-8.3.1/rh/usr/include/c++/8/bits/stl_vector.h(339): here
instantiation of class "std::vector<_Tp, _Alloc> [with _Tp=<error-type>, _Alloc=std::allocator<<error-type>>]"
/usr/WS1/laurent3/Codes/LEGATE/legate.core/src/core/utilities/deserializer.h(107): here
(and many others)
Can you try with the top-of-tree control_replication branch?
Legion commit 04ee5be1dc3b742f195348c78458450f5dd35f44 worked, and there were no further problems compiling cunumeric, so everything is good (except the few things already mentioned above).
Thanks for your help with this!
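Since the missing-member errors went away with a different Legion commit, it can help to confirm which commit the build tree actually contains before rebuilding. The sketch below assumes the fetched Legion sources under _skbuild/linux-ppc64le-3.10/cmake-build/_deps/legion-src are a git checkout, which this thread does not confirm. The helper names are hypothetical.

```python
import subprocess


def checked_out_commit(repo_dir: str) -> str:
    """Return the full hash of HEAD in the git checkout at repo_dir."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "rev-parse", "HEAD"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def same_commit(head: str, expected: str) -> bool:
    """Compare two hashes, allowing either one to be abbreviated."""
    head = head.strip().lower()
    expected = expected.strip().lower()
    if not head or not expected:
        return False
    return head.startswith(expected) or expected.startswith(head)
```

For example, `same_commit(checked_out_commit(legion_src), "04ee5be1dc3b742f195348c78458450f5dd35f44")` returning False would suggest the build is still using an older fetch of Legion.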
I am trying to build legate from source on Lassen (POWER9, OS: RHEL 7.9 Maipo) following the instructions in the quickstart.
I generate a config file for my conda environment with ./scripts/generate-conda-envs.py --python 3.10 --ctk 11.8 --os linux. The config file is:
Trying to create the environment gives the following error:
In addition, running conda search cuda-version -c nvidia -c conda-forge suggests that the cuda-version package does not exist in these channels.