Closed lukebroskop closed 2 years ago
Can you share the full spack-build-out.txt and spack-build-env.txt? That will give more context.
For PyTorch folks, note that this build was done with the Spack package manager, which builds everything from source.
Sorry, I forgot to add that! spack-build-env.txt spack-build-out.txt
I think there are two bugs here:
1. Despite the USE_ROCM environment variable being set to ON, CMake is detecting that ROCm support should be disabled.
2. Despite USE_CUDA=OFF being set both as an environment variable and in CMake, CMake is still trying to link to a CUDA installation.
I will continue to dig into 1, but 2 is out of my area of expertise.
Hmm, according to https://github.com/pytorch/pytorch/blob/v1.9.0/tools/setup_helpers/cmake.py#L264, USE_ROCM should be passed.
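As a rough illustration of the forwarding that the linked cmake.py is expected to do, here is a simplified sketch, not PyTorch's actual code. It assumes that environment variables whose names start with BUILD_, USE_, or CMAKE_ are turned into -D definitions for CMake. The function name and the example environment are made up for this sketch.

```python
# Prefixes assumed to be forwarded to CMake; modeled loosely on the linked
# cmake.py, not copied from it.
FORWARDED_PREFIXES = ("BUILD_", "USE_", "CMAKE_")


def cmake_defines_from_env(env):
    """Return sorted -D flags for every variable with a forwarded prefix."""
    return [
        f"-D{name}={value}"
        for name, value in sorted(env.items())
        if name.startswith(FORWARDED_PREFIXES)
    ]


if __name__ == "__main__":
    # Hypothetical environment matching the settings discussed in this thread.
    example_env = {"USE_ROCM": "ON", "USE_CUDA": "OFF", "HOME": "/home/user"}
    for flag in cmake_defines_from_env(example_env):
        print(flag)
```

If the forwarding works this way, USE_ROCM=ON should reach CMake as a definition, so the value being overridden later during configuration would be the more likely place to look.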
@kolamsrinivas added PyTorch + ROCm support to Spack in https://github.com/spack/spack/pull/17410, maybe they can comment on this.
I'm not very familiar with spack, but is there any way to verify that ROCm is installed in your environment as a prereq?
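One rough way to check for a ROCm install from Python is sketched below. It assumes the install keeps a version file at <ROCM_PATH>/.info/version-dev (the file that comes up later in this thread), that the file begins with a major.minor.patch string, and that /opt/rocm is a reasonable fallback location. The function name is hypothetical, and a Spack-built ROCm may not have this file at all.

```python
import os
import re
from pathlib import Path


def read_rocm_version(rocm_path=None):
    """Return (major, minor, patch) from <rocm_path>/.info/version-dev, or None."""
    # Assumption: fall back to $ROCM_PATH, then to the conventional /opt/rocm.
    root = Path(rocm_path or os.environ.get("ROCM_PATH", "/opt/rocm"))
    version_file = root / ".info" / "version-dev"
    if not version_file.is_file():
        return None
    # Assumption: the file starts with "major.minor.patch", e.g. "4.2.0-21".
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version_file.read_text().strip())
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


if __name__ == "__main__":
    version = read_rocm_version()
    print("ROCm not found" if version is None else "ROCm version: %d.%d.%d" % version)
```

A None result only means this particular file was not found; it does not prove ROCm is absent, since individual components may live under separate prefixes.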
It looks like our Spack recipe is missing a dependency on ROCm. Does anyone know exactly which ROCm components are required? Right now Spack has packages for:
$ spack list rocm
==> 14 packages.
rocm-bandwidth-test rocm-cmake rocm-debug-agent rocm-gdb rocm-openmp-extras rocm-smi-lib rocm-validation-suite
rocm-clang-ocl rocm-dbgapi rocm-device-libs rocm-opencl rocm-smi rocm-tensile rocminfo
Also pinging our Spack + ROCm experts: @haampie @srekolam @arjun-raj-kuppala
Yeah, the PyTorch recipe in Spack has not been updated for the ROCm recipes. I am trying to add all the relevant recipes one by one. There are Spack packages available with the same names as those mentioned in https://github.com/pytorch/pytorch/blob/master/cmake/public/LoadHIP.cmake:
$ spack list hip
==> 13 packages.
hip hip-rocclr hipace hipblas hipcub hipfft hipfort hipify-clang hipsparse hipsycl kahip miopen-hip r-chipseq
spack list roc will show the following: rocblas, rocfft, rocthrust, rocrand, rocprim, roctracer-dev
I think the ROCm packages below need to be added to the recipe, but I see a build failure related to {ROCM_PATH}/.info/version-dev which needs to be addressed for the Spack case:
depends_on('miopen-hip', when='+rocm')
depends_on('rccl', when='+rocm')
depends_on('rocprim', when='+rocm')
depends_on('hipcub', when='+rocm')
depends_on('rocthrust', when='+rocm')
depends_on('roctracer-dev', when='+rocm')
depends_on('rocrand', when='+rocm')
depends_on('hipsparse', when='+rocm')
depends_on('hipfft', when='+rocm')
depends_on('rocfft', when='+rocm')
depends_on('hsa-rocr-dev', when='+rocm')
depends_on('hip', when='+rocm')
The py-torch recipe is now updated with the ROCm dependencies and builds with Spack externals support for ROCm. @lukebroskop, can we close this issue?
Thanks @srekolam !
I'm receiving errors during the caffe2 build portion of pytorch from the cuda namespace:
Here's what I use to install:
py-torch@1.9.0 %gcc +rocm~cuda~mkldnn~cudnn~magma~nccl~valgrind~tensorpipe~kineto~caffe2
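For readers unfamiliar with Spack specs, the small sketch below shows how the variant part of the spec above reads: `+name` enables a variant and `~name` disables it. This is a simplified illustration using a regular expression, not Spack's own spec parser, and the function name is made up.

```python
import re


def parse_variants(spec):
    """Map each +name / ~name variant in a spec string to True / False."""
    # Simplification: only the '+' and '~' sigils are handled here.
    return {
        name: sigil == "+"
        for sigil, name in re.findall(r"([+~])([\w-]+)", spec)
    }


if __name__ == "__main__":
    spec = ("py-torch@1.9.0 %gcc +rocm~cuda~mkldnn~cudnn~magma~nccl"
            "~valgrind~tensorpipe~kineto~caffe2")
    for name, enabled in parse_variants(spec).items():
        print(f"{name}: {'on' if enabled else 'off'}")
```

Read this way, the spec requests ROCm support while turning off CUDA, cuDNN, and the other listed optional features.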
PyTorch Version (e.g., 1.0): 1.9.0
OS (e.g., Linux): SUSE Linux Enterprise Server 15 SP2
How you installed PyTorch (conda, pip, source): source
Build command you used (if compiling from source): python setup.py install
Python version: 3.8.11
CUDA/cuDNN version: ROCm 4.2.0
GPU models and configuration: MI100, gfx908
Any ideas on what to try, @adamjstewart?
cc @malfet @seemethere @walterddr @jeffdaily @sunway513 @jithunnair-amd @ROCmSupport