xuhuisheng / rocm-build

build scripts for ROCm
Apache License 2.0
181 stars · 35 forks

navi10 pytorch build fails due to missing miopen #34

Open jpentland opened 2 years ago

jpentland commented 2 years ago

Environment

Hardware description
GPU Radeon RC5700XT
CPU Ryzen
Software version
OS Archlinux
ROCm 5.2.x
Python 3.10.6

What is the expected behavior

Following the navi10 instructions for building pytorch should not require building miopen separately. However, the build fails due to a missing miopen.

What actually happens

CMake Error at cmake/public/LoadHIP.cmake:147 (find_package):
  By not providing "Findmiopen.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "miopen", but
  CMake did not find one.

Could not find a package configuration file provided by "miopen" with any of the following names:

miopenConfig.cmake
miopen-config.cmake

Add the installation prefix of "miopen" to CMAKE_PREFIX_PATH or set "miopen_DIR" to a directory containing one of the above files. If "miopen" provides a separate development package or SDK, be sure it has been installed.

Call Stack (most recent call first):
  cmake/public/LoadHIP.cmake:274 (find_package_and_print_version)
  cmake/Dependencies.cmake:1264 (include)
  CMakeLists.txt:696 (include)

How to reproduce

Follow navi10 build on archlinux.

xuhuisheng commented 2 years ago

For navi10 you just need to re-build ROCm with AMDGPU_TARGETS=gfx1010. MIOpen doesn't use this flag, so you can build miopen with the default parameters.

MIOpen is the ROCm equivalent of cuDNN; pytorch needs miopen for acceleration.
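To make the build order above concrete, here is a minimal sketch. AMDGPU_TARGETS=gfx1010 is the real flag for navi10 (RX 5700 XT); the repository URL, cmake flags, and exact invocations are assumptions — the rocm-build scripts in this repo drive the actual ROCm rebuild, and MIOpen's own README documents extra requirements the sketch omits.

```shell
# Sketch only: rebuild the ROCm math libraries for navi10, then build
# MIOpen with defaults. Exact steps depend on the rocm-build scripts.
export AMDGPU_TARGETS=gfx1010   # navi10 ISA; read by rocBLAS etc., not by MIOpen

# MIOpen itself does not use AMDGPU_TARGETS, so default parameters suffice
# (repo URL and cmake flags below are assumptions for a typical setup):
git clone https://github.com/ROCmSoftwarePlatform/MIOpen
cd MIOpen && mkdir build && cd build
cmake -DCMAKE_PREFIX_PATH=/opt/rocm ..
make -j"$(nproc)"
sudo make install
```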

jpentland commented 2 years ago

I was having trouble building miopen as well. But I realized I have to build half first, and then miopen.

xuhuisheng commented 2 years ago

half is just a header. miopen needs some dependencies to build, like boost, zlib, etc. You need to install them before building miopen.
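On Arch the dependencies mentioned above can be pulled in before building miopen. A sketch, with the caveat that the pacman package names and the half header location are assumptions for a typical Arch setup:

```shell
# Sketch: install MIOpen build dependencies on Arch Linux.
# Package names are assumptions; check pacman -Ss for exact names.
sudo pacman -S --needed boost zlib bzip2 cmake

# "half" is a header-only library, so it just needs to be copied
# somewhere the MIOpen build can find it (path is an assumption):
git clone https://github.com/ROCmSoftwarePlatform/half
sudo cp half/include/half.hpp /usr/local/include/
```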

jpentland commented 2 years ago

I managed to get miopen installed now, but I have an issue during the pytorch build. The bundled library 'fbgemm' fails to build due to some C++ warnings being treated as errors:

FAILED: third_party/fbgemm/CMakeFiles/fbgemm_avx512.dir/src/FbgemmFloat16ConvertAvx512.cc.o 
/usr/bin/c++ -DFBGEMM_STATIC -I/home/jp/builds/stablediffusion/rocm-build/pytorch/third_party/cpuinfo/include -I/home/jp/builds/stablediffusion/rocm-build/pytorch/third_party/fbgemm/third_party/asmjit/src -I/home/jp/builds/stablediffusion/rocm-build/pytorch/third_party/fbgemm/include -I/home/jp/builds/stablediffusion/rocm-build/pytorch/third_party/fbgemm -I/home/jp/builds/stablediffusion/rocm-build/pytorch/cmake/../third_party/benchmark/include -isystem /home/jp/builds/stablediffusion/rocm-build/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /home/jp/builds/stablediffusion/rocm-build/pytorch/cmake/../third_party/googletest/googletest/include -isystem /home/jp/builds/stablediffusion/rocm-build/pytorch/third_party/protobuf/src -isystem /home/jp/builds/stablediffusion/rocm-build/pytorch/third_party/gemmlowp -isystem /home/jp/builds/stablediffusion/rocm-build/pytorch/third_party/neon2sse -isystem /home/jp/builds/stablediffusion/rocm-build/pytorch/third_party/XNNPACK/include -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -Wall -Wextra -Werror -Wno-deprecated-declarations -O3 -DNDEBUG -fPIC -fvisibility=hidden -m64 -mavx2 -mfma -mavx512f -mavx512bw -mavx512dq -mavx512vl -std=c++14 -MD -MT third_party/fbgemm/CMakeFiles/fbgemm_avx512.dir/src/FbgemmFloat16ConvertAvx512.cc.o -MF third_party/fbgemm/CMakeFiles/fbgemm_avx512.dir/src/FbgemmFloat16ConvertAvx512.cc.o.d -o third_party/fbgemm/CMakeFiles/fbgemm_avx512.dir/src/FbgemmFloat16ConvertAvx512.cc.o -c /home/jp/builds/stablediffusion/rocm-build/pytorch/third_party/fbgemm/src/FbgemmFloat16ConvertAvx512.cc
In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/12.2.0/include/immintrin.h:49,
                 from /home/jp/builds/stablediffusion/rocm-build/pytorch/third_party/fbgemm/src/FbgemmFloat16ConvertAvx512.cc:7:
In function ‘__m512 _mm512_min_ps(__m512, __m512)’,
    inlined from ‘void fbgemm::{anonymous}::FloatToFloat16KernelAvx512WithClip(const float*, fbgemm::float16*)’ at /home/jp/builds/stablediffusion/rocm-build/pytorch/third_party/fbgemm/src/FbgemmFloat16ConvertAvx512.cc:29:31,
    inlined from ‘void fbgemm::FloatToFloat16_avx512(const float*, float16*, size_t, bool)’ at /home/jp/builds/stablediffusion/rocm-build/pytorch/third_party/fbgemm/src/FbgemmFloat16ConvertAvx512.cc:53:41:
/usr/lib/gcc/x86_64-pc-linux-gnu/12.2.0/include/avx512fintrin.h:13149:10: error: ‘__Y’ may be used uninitialized [-Werror=maybe-uninitialized]
13149 |   return (__m512) __builtin_ia32_minps512_mask ((__v16sf) __A,
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
13150 |                                                 (__v16sf) __B,
      |                                                 ~~~~~~~~~~~~~~
13151 |                                                 (__v16sf)
      |                                                 ~~~~~~~~~
13152 |                                                 _mm512_undefined_ps (),
      |                                                 ~~~~~~~~~~~~~~~~~~~~~~~
13153 |                                                 (__mmask16) -1,
      |                                                 ~~~~~~~~~~~~~~~
13154 |                                                 _MM_FROUND_CUR_DIRECTION);
      |                                                 ~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/lib/gcc/x86_64-pc-linux-gnu/12.2.0/include/avx512fintrin.h: In function ‘void fbgemm::FloatToFloat16_avx512(const float*, float16*, size_t, bool)’:
/usr/lib/gcc/x86_64-pc-linux-gnu/12.2.0/include/avx512fintrin.h:188:10: note: ‘__Y’ was declared here
  188 |   __m512 __Y = __Y;
      |          ^~~
In function ‘__m512 _mm512_max_ps(__m512, __m512)’,
    inlined from ‘void fbgemm::{anonymous}::FloatToFloat16KernelAvx512WithClip(const float*, fbgemm::float16*)’ at /home/jp/builds/stablediffusion/rocm-build/pytorch/third_party/fbgemm/src/FbgemmFloat16ConvertAvx512.cc:29:31,
    inlined from ‘void fbgemm::FloatToFloat16_avx512(const float*, float16*, size_t, bool)’ at /home/jp/builds/stablediffusion/rocm-build/pytorch/third_party/fbgemm/src/FbgemmFloat16ConvertAvx512.cc:53:41:
/usr/lib/gcc/x86_64-pc-linux-gnu/12.2.0/include/avx512fintrin.h:13033:10: error: ‘__Y’ may be used uninitialized [-Werror=maybe-uninitialized]
13033 |   return (__m512) __builtin_ia32_maxps512_mask ((__v16sf) __A,
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
13034 |                                                 (__v16sf) __B,
      |                                                 ~~~~~~~~~~~~~~
13035 |                                                 (__v16sf)
      |                                                 ~~~~~~~~~
13036 |                                                 _mm512_undefined_ps (),
      |                                                 ~~~~~~~~~~~~~~~~~~~~~~~
13037 |                                                 (__mmask16) -1,
      |                                                 ~~~~~~~~~~~~~~~
13038 |                                                 _MM_FROUND_CUR_DIRECTION);
      |                                                 ~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/lib/gcc/x86_64-pc-linux-gnu/12.2.0/include/avx512fintrin.h: In function ‘void fbgemm::FloatToFloat16_avx512(const float*, float16*, size_t, bool)’:
/usr/lib/gcc/x86_64-pc-linux-gnu/12.2.0/include/avx512fintrin.h:188:10: note: ‘__Y’ was declared here
  188 |   __m512 __Y = __Y;
      |          ^~~
In function ‘__m256i _mm512_cvtps_ph(__m512, int)’,
    inlined from ‘void fbgemm::{anonymous}::FloatToFloat16KernelAvx512WithClip(const float*, fbgemm::float16*)’ at /home/jp/builds/stablediffusion/rocm-build/pytorch/third_party/fbgemm/src/FbgemmFloat16ConvertAvx512.cc:32:40,
    inlined from ‘void fbgemm::FloatToFloat16_avx512(const float*, float16*, size_t, bool)’ at /home/jp/builds/stablediffusion/rocm-build/pytorch/third_party/fbgemm/src/FbgemmFloat16ConvertAvx512.cc:53:41:
/usr/lib/gcc/x86_64-pc-linux-gnu/12.2.0/include/avx512fintrin.h:8598:53: error: ‘__Y’ may be used uninitialized [-Werror=maybe-uninitialized]
 8598 |   return (__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf) __A,
      |                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
 8599 |                                                      __I,
      |                                                      ~~~~
 8600 |                                                      (__v16hi)
      |                                                      ~~~~~~~~~
 8601 |                                                      _mm256_undefined_si256 (),
      |                                                      ~~~~~~~~~~~~~~~~~~~~~~~~~~
 8602 |                                                      -1);
      |                                                      ~~~
In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/12.2.0/include/immintrin.h:43:
jpentland commented 2 years ago

Update: I added USE_FBGEMM=OFF to disable fbgemm. I now build with this command line:

USE_FBGEMM=OFF USE_CUDA=OFF MAX_JOBS=4 USE_ROCM=1 USE_NINJA=1 python3 setup.py bdist_wheel

But I still get an error in the aten build:

In file included from /usr/include/magma_copy.h:12,
                 from /usr/include/magmablas.h:12,
                 from /usr/include/magma_v2.h:22,
                 from /home/jp/builds/stablediffusion/rocm-build/pytorch/aten/src/ATen/hip/detail/HIPHooks.cpp:26:
/usr/include/magma_types.h:71:14: fatal error: cuda.h: No such file or directory
   71 |     #include <cuda.h>    // for CUDA_VERSION
      |              ^~~~~~~~
compilation terminated.
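The error above indicates that the system-wide magma in /usr/include was built against CUDA, so ATen's HIP build pulls in cuda.h through magma_types.h. One possible workaround (an assumption, not a confirmed fix — USE_MAGMA is a standard pytorch build switch, but disabling it loses magma-backed linear-algebra paths) is to turn magma off alongside fbgemm:

```shell
# Sketch: also disable magma, since the installed copy targets CUDA.
# Linear-algebra ops then fall back to non-magma implementations.
USE_MAGMA=0 USE_FBGEMM=OFF USE_CUDA=OFF MAX_JOBS=4 \
  USE_ROCM=1 USE_NINJA=1 python3 setup.py bdist_wheel
```

Alternatively, a magma build compiled for HIP/ROCm (rather than the CUDA one from the distro repos) should satisfy the include without disabling the feature.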
xuhuisheng commented 2 years ago

It might be caused by gcc-12; I haven't tested pytorch with gcc-12. There is only gcc-11 on ubuntu-22.04.