Closed grinness closed 1 year ago
I am sure this error will be a bit of a hunt to track down. Since this is a runtime error and not a build-time error, there is not much we can do in terms of packaging besides adding a patch file to whichever package needs it. What the patch is and where it should go, the ROCm team probably knows best. I would file this issue over at the MIOpen repo https://github.com/ROCmSoftwarePlatform/MIOpen and see if they can spot something.
Hi acxz,
thanks for the feedback. I have already submitted an issue in the main (ROCm MIOpen) repo (see my comment above, issue #1764).
Just a tip, you might want to make the title more concise and format the issue. As it is, it looks like an info dump, which could decrease the chances that someone spends the time to read it and eventually solve it. Presentation matters.
One option was to install miopen-opencl, but when I tried to install python-pytorch-opt-rocm, yay gave me a dependency conflict:
:: There are 2 providers available for miopen:
:: Repository AUR
1) miopen-hip 2) **miopen-opencl**
Enter a number (default=1):
==> 2
:: There are 2 providers available for **miopen-hip**:
:: Repository AUR
1) miopen-hip 2) opencl-amd-dev
Enter a number (default=1):
==> 1
:: Checking for inner conflicts...
-> Inner conflicts found:
-> miopen-hip: miopen-opencl
-> miopen-opencl: miopen-hip
-> Conflicting packages will have to be confirmed manually
is it possible to compile it with only miopen-opencl?
p.s.
:: There are 2 providers available for rocm-hip-sdk:
:: Repository AUR
1) rocm-hip-sdk 2) opencl-amd-dev
also rocm-hip-sdk is needed, which requires miopen-hip
^_^
Hi,
i have the same issue in testing with miopen-opencl, aka a packaging and dependency issue:
: miopen-opencl and miopen-hip are in conflict (miopen). Remove miopen-hip? [y/N] y
error: failed to prepare transaction (could not satisfy dependencies)
:: removing miopen-hip breaks dependency 'miopen-hip' required by rocm-hip-sdk
-> exit status 1
Trying to remove rocm-hip-sdk causes more problems with the package python-pytorch-rocm (which is hosted on the rocm-arch repos, AFAIK):
yay -R rocm-hip-sdk
checking dependencies...
error: failed to prepare transaction (could not satisfy dependencies)
:: removing rocm-hip-sdk breaks dependency 'rocm-hip-sdk' required by hipmagma
:: removing rocm-hip-sdk breaks dependency 'rocm-hip-sdk' required by python-pytorch-rocm
-> exit status 1
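The two failed removals above follow the same pattern: pacman refuses to remove a package while another installed package still depends on it. A minimal sketch of that check is below. It assumes a hand-written dependency map copied from the pacman messages in this thread, not a real pacman database, and `broken_by_removal` is a hypothetical helper name.

```python
# Toy model of pacman's reverse-dependency check.
# The edges below are copied from the error messages in this thread;
# they are NOT read from a real package database.
DEPENDS = {
    "rocm-hip-sdk": {"miopen-hip"},
    "hipmagma": {"rocm-hip-sdk"},
    "python-pytorch-rocm": {"rocm-hip-sdk"},
}


def broken_by_removal(target, depends=DEPENDS):
    """Return the packages that directly depend on `target`, sorted by name."""
    return sorted(pkg for pkg, deps in depends.items() if target in deps)


if __name__ == "__main__":
    for target in ("miopen-hip", "rocm-hip-sdk"):
        for pkg in broken_by_removal(target):
            print(f"removing {target} breaks dependency '{target}' required by {pkg}")
```

The lookup is only one level deep, which matches what pacman reports here: each removal is blocked by the direct dependents, so the chain has to be unwound package by package.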
Hi all,
for the dependencies of the meta packages we follow upstream, see the documentation. pytorch only requires the dependencies of the meta package and not the package itself (which is a single info file only), so you can safely remove the meta package with pacman -Rd and install miopen-opencl. If your AUR helper causes trouble, download the PKGBUILD from the AUR (or from this repo) and build the package by calling makepkg -ci.
@tpkessler
thanks for the info, however we still have a vicious dependency cycle
After removing miopen-hip (together with python-pytorch-rocm, torchvision-rocm, hipmagma and rocm-hip-sdk) and installing miopen-opencl, I cannot re-install python-pytorch-rocm without re-installing miopen-hip:
> yay -S hipmagma
:: Checking for conflicts...
:: Checking for inner conflicts...
-> Package conflicts found:
-> Installing miopen-hip will remove: miopen-opencl (miopen)
Yes, I can download the hipmagma PKGBUILD and remove the dependency on rocm-hip-sdk manually -- but something needs to be fixed in the rocm-arch repo for end users.
I will try the above and see if I find anything else
Thanks
Hi all,
hipmagma currently does not build because the installed rocsolver library is linked against fmt version 8 (Arch has recently updated to fmt 9):
FAILED: testing/testing_zaxpy
: && /opt/rocm/bin/hipcc -march=native -O2 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security -fstack-clash-protection -fcf-protection -Wp,-D_GLIBCXX_ASSERTIONS -fcf-protection=none -std=c++11 -fopenmp=libomp -Wall -Wno-unused-function -O3 -DNDEBUG -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -rdynamic CMakeFiles/testing_zaxpy.dir/testing/testing_zaxpy.cpp.o -o testing/testing_zaxpy -Wl,-rpath,/opt/hd02/projects/pkgbuild/hipmagma/src/magma-2.6.2/build/lib:/opt/intel/oneapi/mkl/latest/lib/intel64 lib/libtester.so lib/liblapacktest.so lib/libmagma.so /opt/rocm/lib/libamdhip64.so.5.2.21153-881bc1d -Wl,-Bstatic -lclang_rt.builtins-x86_64 -Wl,-Bdynamic /opt/rocm/lib/libhipblas.so.0.1 /opt/rocm/lib/libhipsparse.so.0.1 /opt/intel/oneapi/mkl/latest/lib/intel64/libmkl_gf_lp64.so /opt/intel/oneapi/mkl/latest/lib/intel64/libmkl_gnu_thread.so /opt/intel/oneapi/mkl/latest/lib/intel64/libmkl_core.so -lgomp -lm -ldl -lm -ldl && :
/usr/bin/ld: warning: libfmt.so.8, needed by /opt/rocm/lib/librocsolver.so.0, not found (try using -rpath or -rpath-link)
/usr/bin/ld: /opt/rocm/lib/librocsolver.so.0: undefined reference to `int fmt::v8::detail::snprintf_float<long double>(long double, int, fmt::v8::detail::float_specs, fmt::v8::detail::buffer<char>&)'
/usr/bin/ld: /opt/rocm/lib/librocsolver.so.0: undefined reference to `fmt::v8::detail::dragonbox::decimal_fp<double> fmt::v8::detail::dragonbox::to_decimal<double>(double)'
/usr/bin/ld: /opt/rocm/lib/librocsolver.so.0: undefined reference to `int fmt::v8::detail::format_float<double>(double, int, fmt::v8::detail::float_specs, fmt::v8::detail::buffer<char>&)'
/usr/bin/ld: /opt/rocm/lib/librocsolver.so.0: undefined reference to `fmt::v8::detail::error_handler::on_error(char const*)'
/usr/bin/ld: /opt/rocm/lib/librocsolver.so.0: undefined reference to `int fmt::v8::detail::format_float<long double>(long double, int, fmt::v8::detail::float_specs, fmt::v8::detail::buffer<char>&)'
/usr/bin/ld: /opt/rocm/lib/librocsolver.so.0: undefined reference to `int fmt::v8::detail::snprintf_float<double>(double, int, fmt::v8::detail::float_specs, fmt::v8::detail::buffer<char>&)'
/usr/bin/ld: /opt/rocm/lib/librocsolver.so.0: undefined reference to `fmt::v8::vformat[abi:cxx11](fmt::v8::basic_string_view<char>, fmt::v8::basic_format_args<fmt::v8::basic_format_context<fmt::v8::appender, char> >)'
/usr/bin/ld: /opt/rocm/lib/librocsolver.so.0: undefined reference to `char fmt::v8::detail::decimal_point_impl<char>(fmt::v8::detail::locale_ref)'
/usr/bin/ld: /opt/rocm/lib/librocsolver.so.0: undefined reference to `fmt::v8::detail::thousands_sep_result<char> fmt::v8::detail::thousands_sep_impl<char>(fmt::v8::detail::locale_ref)'
/usr/bin/ld: /opt/rocm/lib/librocsolver.so.0: undefined reference to `fmt::v8::detail::dragonbox::decimal_fp<float> fmt::v8::detail::dragonbox::to_decimal<float>(float)'
/usr/bin/ld: /opt/rocm/lib/librocsolver.so.0: undefined reference to `fmt::v8::detail::throw_format_error(char const*)'
clang-14: error: linker command failed with exit code 1 (use -v to see invocation)
Assuming that I could resolve the fmt 8 issue that way, I tried to re-compile rocsolver -- and am now hitting the bug below:
https://github.com/rocm-arch/rocm-arch/issues/850
... down the rabbit hole ...
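The key line in the linker output above is the `warning: libfmt.so.8, needed by ... not found` message; the undefined `fmt::v8` references that follow are a consequence of it. A minimal sketch for pulling such warnings out of a long build log is below. The regular expression is an assumption based on the message format shown in this thread, and `missing_libraries` is a hypothetical helper name.

```python
import re

# Matches GNU ld warnings of the form seen in this thread:
#   /usr/bin/ld: warning: libfmt.so.8, needed by /opt/rocm/lib/librocsolver.so.0, not found (...)
MISSING_LIB = re.compile(r"warning: (\S+), needed by (\S+), not found")


def missing_libraries(linker_output):
    """Return (missing_library, needed_by) pairs found in the linker output."""
    return [match.groups() for match in MISSING_LIB.finditer(linker_output)]


if __name__ == "__main__":
    sample = (
        "/usr/bin/ld: warning: libfmt.so.8, needed by "
        "/opt/rocm/lib/librocsolver.so.0, not found "
        "(try using -rpath or -rpath-link)\n"
    )
    for lib, needed_by in missing_libraries(sample):
        print(f"{needed_by} was linked against {lib}, which is not installed")
```

Any library reported this way was built against a shared object that is no longer on the system, so it is the package owning that library (here rocsolver) that needs a rebuild, not the package being compiled.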
Hi,
I solved the issue with rocsolver as suggested in the thread of #850, compiled hipmagma successfully, forced the installation of python-pytorch-rocm (that requires rocm-hip-sdk), installed python-torchvision-rocm ... and now even the feedforward neural networks from pytorch do not work:
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
Cell In [3], line 1
----> 1 import torch
2 import torch.nn as nn
3 import torchvision
File /usr/lib/python3.10/site-packages/torch/__init__.py:202
200 if USE_GLOBAL_DEPS:
201 _load_global_deps()
--> 202 from torch._C import * # noqa: F403
204 # Appease the type checker; ordinarily this binding is inserted by the
205 # torch._C module initialization code in C
206 if TYPE_CHECKING:
ImportError: /opt/rocm/lib/libMIOpen.so.1: version `MIOPEN_HIP_1' not found (required by /usr/lib/python3.10/site-packages/torch/lib/libtorch_python.so)
... I assume I have to re-build python-pytorch-rocm from source with miopen-opencl to see if anything works ....
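The ImportError above says that the installed libMIOpen.so.1 does not provide the symbol version tag `MIOPEN_HIP_1` that libtorch_python.so was linked against. That fits the assumption in this comment: the torch binary was built against the HIP backend while the OpenCL backend is installed. A minimal sketch for extracting the three relevant pieces from such a message is below; the regular expression follows the message format shown here, and `parse_version_error` is a hypothetical helper name.

```python
import re

# Matches dynamic-loader messages of the form:
#   <library>: version `<tag>' not found (required by <binary>)
VERSION_ERROR = re.compile(
    r"(?P<library>\S+): version `(?P<version>[^']+)' not found "
    r"\(required by (?P<required_by>[^)]+)\)"
)


def parse_version_error(message):
    """Return library, version tag and requiring binary, or None if no match."""
    match = VERSION_ERROR.search(message)
    return match.groupdict() if match else None


if __name__ == "__main__":
    try:
        import torch  # noqa: F401  (only used to trigger the loader error)
    except ImportError as exc:
        details = parse_version_error(str(exc))
        if details is None:
            print(f"import failed for another reason: {exc}")
        else:
            print(
                f"{details['required_by']} expects {details['version']} "
                f"from {details['library']}"
            )
```

If the tag names a backend that differs from the installed MIOpen package, the binary and the library were built for different backends and one of them has to be rebuilt or swapped.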
Hi,
I tested re-compiling python-pytorch-rocm with miopen-opencl installed instead of miopen-hip. The build breaks in multiple places ('no matching function' or 'no known conversion') -- not sure where to go from here:
In file included from /opt/hd02/projects/pkgbuild/python-pytorch-rocm.git/src/pytorch-1.12.1-rocm/caffe2/operators/hip/accuracy_op.hip:3:
/opt/hd02/projects/pkgbuild/python-pytorch-rocm.git/src/pytorch-1.12.1-rocm/caffe2/core/hip/context_gpu.h:137:22: error: no matching function for call to 'miopenSetStream'
MIOPEN_ENFORCE(miopenSetStream(r, hip_stream));
^~~~~~~~~~~~~~~
/opt/hd02/projects/pkgbuild/python-pytorch-rocm.git/src/pytorch-1.12.1-rocm/caffe2/core/hip/common_miopen.h:58:33: note: expanded from macro 'MIOPEN_ENFORCE'
miopenStatus_t status = condition; \
^~~~~~~~~
/opt/rocm/include/miopen/miopen.h:194:30: note: candidate function not viable: no known conversion from 'c10::hip::HIPStream' to 'miopenAcceleratorQueue_t' (aka '_cl_command_queue *') for 2nd argument
MIOPEN_EXPORT miopenStatus_t miopenSetStream(miopenHandle_t handle,
^
26 warnings and 1 error generated when compiling for gfx1030.
CMake Error at torch_hip_generated_accuracy_op.hip.o.cmake:200 (message):
Error generating file
/opt/hd02/projects/pkgbuild/python-pytorch-rocm.git/src/pytorch-1.12.1-rocm/build/caffe2/CMakeFiles/torch_hip.dir/operators/hip/./torch_hip_generated_accuracy_op.hip.o
Hi,
as flagged upstream, I tested using the docker image provided at:
https://hub.docker.com/r/rocm/pytorch/#!
The latest version of the docker image supports gfx1030 (ROCm is version 5.2.0). Convolutional neural networks and feed-forward neural networks work fine (pytorch/torchvision). Note that I use the standard Arch Zen Linux kernel with the docker image, not the rocm version, with no issues.
It must be something with the rocm-arch packages. I noticed that pytorch rocm in the arch repos requires hipmagma; when I compiled pytorch rocm from sources (on Arch Linux) for my old GPU (rx480, rocm 4.x) I did not use that (possibly not related).
Please see: https://github.com/ROCmSoftwarePlatform/MIOpen/issues/1764
Hi all,
the issue upstream has been closed with the comment "I agree it's most likely a configuration problem, you can close this issue :)". The issue does not manifest with the docker image (also tested the latest version with ROCm 5.2.3 + pytorch + torchvision), running with the very same Arch kernel as used with the rocm-arch packages.
I will re-test with ROCM 5.3 once available
Thanks
I've updated to:
comgr 5.3.0
rocm-cmake 5.3.0
rocm-llvm 5.3.0
rocm-device-libs 5.3.0-1
hsa-rocr 5.3.0
and the error is gone now.
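The thread does not establish that mixed ROCm versions caused the original error, only that it disappeared after these packages were updated. Still, a quick consistency check of the installed versions is cheap. The sketch below assumes the name/version pairs have already been collected (for example by hand from the package manager's query output); `upstream_version` and `mismatched` are hypothetical helper names, and the `hip-runtime-amd` version in the usage example is purely illustrative.

```python
def upstream_version(pkgver):
    """Drop the Arch pkgrel suffix, e.g. '5.3.0-1' -> '5.3.0'.

    Epoch prefixes such as '1:5.3.0-1' are not handled in this sketch.
    """
    return pkgver.split("-", 1)[0]


def mismatched(packages, expected):
    """Return the packages whose upstream version differs from `expected`."""
    return {
        name: version
        for name, version in packages.items()
        if upstream_version(version) != expected
    }


# Versions as listed in the comment above.
INSTALLED = {
    "comgr": "5.3.0",
    "rocm-cmake": "5.3.0",
    "rocm-llvm": "5.3.0",
    "rocm-device-libs": "5.3.0-1",
    "hsa-rocr": "5.3.0",
}

if __name__ == "__main__":
    # Hypothetical stale package added for illustration only.
    candidate = {**INSTALLED, "hip-runtime-amd": "5.2.3-1"}
    for name, version in mismatched(candidate, "5.3.0").items():
        print(f"{name} is at {version}, expected 5.3.0")
```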
Hi, I don't know if it's OK to use this issue, but the comgr build is failing for me:
/tmp/ram_drive/makepkg/comgr/src/ROCm-CompilerSupport-rocm-5.3.0/lib/comgr/src/comgr-device-libs.cpp:184:33: error: 'oclc_abi_version_500_lib' was not declared in this scope; did you mean 'oclc_isa_version_900_lib'?
  184 | oclc_abi_version_500_lib,
      | ^~~~~~~
      | oclc_isa_version_900_lib
/tmp/ram_drive/makepkg/comgr/src/ROCm-CompilerSupport-rocm-5.3.0/lib/comgr/src/comgr-device-libs.cpp:185:33: error: 'oclc_abi_version_500_lib_size' was not declared in this scope; did you mean 'oclc_isa_version_900_lib_size'?
  185 | oclc_abi_version_500_lib_size)) {
      | ^~~~~~~~~
      | oclc_isa_version_900_lib_size
/tmp/ram_drive/makepkg/comgr/src/ROCm-CompilerSupport-rocm-5.3.0/lib/comgr/src/comgr-device-libs.cpp:192:33: error: 'oclc_abi_version_400_lib' was not declared in this scope; did you mean 'oclc_isa_version_900_lib'?
  192 | oclc_abi_version_400_lib,
      | ^~~~~~~~
      | oclc_isa_version_900_lib
/tmp/ram_drive/makepkg/comgr/src/ROCm-CompilerSupport-rocm-5.3.0/lib/comgr/src/comgr-device-libs.cpp:193:33: error: 'oclc_abi_version_400_lib_size' was not declared in this scope; did you mean 'oclc_isa_version_900_lib_size'?
  193 | oclc_abi_version_400_lib_size)) {
      | ^~~~~~~~~
      | oclc_isa_version_900_lib_size
/tmp/ram_drive/makepkg/comgr/src/ROCm-CompilerSupport-rocm-5.3.0/lib/comgr/src/comgr-device-libs.cpp:201:33: error: 'oclc_abi_version_400_lib' was not declared in this scope; did you mean 'oclc_isa_version_900_lib'?
  201 | oclc_abi_version_400_lib,
      | ^~~~~~~~
      | oclc_isa_version_900_lib
/tmp/ram_drive/makepkg/comgr/src/ROCm-CompilerSupport-rocm-5.3.0/lib/comgr/src/comgr-device-libs.cpp:202:33: error: 'oclc_abi_version_400_lib_size' was not declared in this scope; did you mean 'oclc_isa_version_900_lib_size'?
  202 | oclc_abi_version_400_lib_size)) {
      | ^~~~~~~~~
      | oclc_isa_version_900_lib_size
make[2]: [CMakeFiles/amd_comgr.dir/build.make:104: CMakeFiles/amd_comgr.dir/src/comgr-device-libs.cpp.o] Error 1
make[1]: [CMakeFiles/Makefile2:311: CMakeFiles/amd_comgr.dir/all] Error 2
make: *** [Makefile:166: all] Error 2
==> ERROR: A failure occurred in build().
I'm on kernel 6.0.1, rocm-llvm 5.3.0-1, rocm-cmake, opencl-amd 21.50.50000.1376259-3.
@fractal-fumbler
can you try to replace amd-opencl with rocm-opencl-runtime (and possibly bring in rocm-opencl-sdk too)?
I am not sure that amd-opencl is compatible with rocm
It may be worth a try
Thanks
Please open a separate issue on this @fractal-fumbler.
@fractal-fumbler can you try to replace amd-opencl with rocm-opencl-runtime (and possibly bring in rocm-opencl-sdk too)?
I am not sure that amd-opencl is compatible with rocm
I think you meant @DistantThunder, because I've had rocm-opencl-runtime and rocm-opencl-sdk installed from the very beginning.
Yes, I did. @DistantThunder should open a separate issue on this.
Closed due to inactivity.
Hi,
I am not sure where to report the error, aka:
I am testing a few ML algorithms from the torchvision packages on the MNIST dataset. Standard feed-forward neural nets work fine, but convolutional neural networks do not (naive_conv.cpp) -- the same code used to work fine on an rx480 with rocm 4.x.
The Python environment (via Jupyter-Lab) reports the following errors:
Error
``` MIOpen(HIP): Error [Do] 'amd_comgr_do_action(kind, handle, in.GetHandle(), out.GetHandle())' AMD_COMGR_ACTION_COMPILE_SOURCE_TO_BC: ERROR (1) MIOpen(HIP): Error [BuildHip] comgr status = ERROR (1) MIOpen(HIP): Warning [BuildHip] In file included from /tmp/comgr-a8ec8e/input/naive_conv.cpp:1: In file included from /tmp/hip_pch.115095/hip_pch.h:1: In file included from /home/marco/.cache/yay/hip-runtime-amd/src/HIP-rocm-5.2.3/include/hip/hip_runtime.h:54: In file included from /usr/lib64/cmake/llvm/../../../bin/../lib64/gcc/x86_64-pc-linux-gnu/12.2.0/../../../../include/c++/12.2.0/thread:44: In file included from /usr/lib64/cmake/llvm/../../../bin/../lib64/gcc/x86_64-pc-linux-gnu/12.2.0/../../../../include/c++/12.2.0/bits/this_thread_sleep.h:36: /usr/lib64/cmake/llvm/../../../bin/../lib64/gcc/x86_64-pc-linux-gnu/12.2.0/../../../../include/c++/12.2.0/bits/chrono.h:650:36: error: no matching conversion for functional-style cast from 'const duration