xuhuisheng / rocm-gfx803


Pytorch binaries not working on arch4edu ROCm #26

Closed FranGamer1892 closed 1 year ago

FranGamer1892 commented 1 year ago

Hello, I installed the ROCm stack from arch4edu and it seems to be working (rocminfo detects my RX 580). However, after installing torch from the wheels provided here, importing it fails with this error:

Traceback (most recent call last):
  File "pytest.py", line 4, in <module>
    import torch
  File "/home/fran/.local/lib/python3.8/site-packages/torch/__init__.py", line 199, in <module>
    from torch._C import *  # noqa: F403
ImportError: /home/fran/.local/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so: undefined symbol: zgetrs_

I tried building torch myself, but it didn't go so well, haha. I attempted to follow this, but I can't find the Arch equivalents of the packages installed by apt. Proceeding to build pytorch anyway results in a bunch of errors, and I couldn't really tell what the problem was.

xuhuisheng commented 1 year ago

Which version of ROCm do you use? I haven't used Arch before, but I can test the matching ROCm version with pytorch this weekend.

The log says the zgetrs_ function cannot be found, which may be caused by an incompatible API.
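
For anyone hitting the same ImportError, here is a minimal Python sketch for narrowing it down. It pulls the library path and the missing symbol out of the loader message, then checks whether a given shared library exports that symbol. `zgetrs_` is a LAPACK routine, so the system LAPACK library is a natural first thing to check. The helper names (`parse_undefined_symbol`, `library_exports`) and the choice of looking up `"lapack"` via `ctypes.util.find_library` are assumptions for illustration, not something taken from this repo or from the wheels.

```python
import ctypes
import ctypes.util
import re
from typing import Optional, Tuple


def parse_undefined_symbol(message: str) -> Optional[Tuple[str, str]]:
    """Return (library_path, symbol) from a loader error message, or None."""
    match = re.search(r"(\S+\.so[.\d]*): undefined symbol: (\S+)", message)
    if match is None:
        return None
    return match.group(1), match.group(2)


def library_exports(library: str, symbol: str) -> Optional[bool]:
    """Check whether a shared library exports a symbol.

    Returns True or False if the library could be loaded, and None if it
    could not be found or loaded at all.
    """
    # Accept either a short name ("lapack") or a full path to a .so file.
    path = ctypes.util.find_library(library) or library
    try:
        handle = ctypes.CDLL(path)
    except OSError:
        return None
    # ctypes raises AttributeError for symbols the library does not export.
    return hasattr(handle, symbol)


if __name__ == "__main__":
    error = (
        "ImportError: /home/fran/.local/lib/python3.8/site-packages/torch/lib/"
        "libtorch_cpu.so: undefined symbol: zgetrs_"
    )
    parsed = parse_undefined_symbol(error)
    if parsed is not None:
        lib_path, symbol = parsed
        print("library with the unresolved reference:", lib_path)
        print("missing symbol:", symbol)
        # Assumption: the system LAPACK is the library expected to provide it.
        print("system LAPACK exports it:", library_exports("lapack", symbol))
```

If the last line prints `None`, no LAPACK library was found on the loader path. `False` would mean a library was found but does not export the symbol. Either result would fit the idea that the wheel was built against a different LAPACK/BLAS setup than the one installed, but this is only a diagnostic hint and not a confirmed cause.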

FranGamer1892 commented 1 year ago

I am not sure where to check, so I'll just send you the output from various commands, sorry haha

Thanks!

https://paste.debian.net/1269416/ https://paste.debian.net/1269417/ https://paste.debian.net/1269418/ https://paste.debian.net/1269419/

xuhuisheng commented 1 year ago

Looks like the version is 5.4.0-1 (Versión : 5.4.0-1).

I will give it a try this weekend.

FranGamer1892 commented 1 year ago

Thank you!


FranGamer1892 commented 1 year ago

Hello, I got around to building torch by myself and when I try testing it, this error pops up:

/usr/include/c++/12.2.0/bits/stl_vector.h:1123: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](size_type) [with _Tp = const char*; _Alloc = std::allocator<const char*>; reference = const char*&; size_type = long unsigned int]: Assertion '__n < this->size()' failed.
Aborted

This is the same error I got with the python-pytorch-rocm package from my distro's repositories. Unfortunately, I can't find anything about it online. I built torch from release/1.12 using Python 3.8. I had to set BUILD_TEST=OFF, otherwise the build failed, and I had to change many things in the source code because my GCC version is too new.

PS: This is the script I'm testing torch with, I think I got it from you haha
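
The script linked above is not reproduced here. As a separate, hypothetical sketch of what a minimal torch smoke test could look like: the function name `torch_smoke_test` and the fields it reports are my own choices, and it assumes a ROCm build of torch, where the GPU is exposed through the `torch.cuda` API.

```python
def torch_smoke_test() -> dict:
    """Collect basic facts about the installed torch build."""
    report = {"import_ok": False}
    try:
        import torch
    except ImportError as exc:
        # Covers both a missing package and loader errors such as
        # "undefined symbol: ...".
        report["error"] = str(exc)
        return report

    report["import_ok"] = True
    report["torch_version"] = torch.__version__
    # torch.version.hip is set on ROCm builds and None otherwise.
    report["hip_version"] = getattr(torch.version, "hip", None)
    report["gpu_available"] = torch.cuda.is_available()

    if report["gpu_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
        # A small matrix multiply on the GPU, checked for NaN/Inf.
        x = torch.rand(64, 64, device="cuda")
        report["matmul_finite"] = bool(torch.isfinite(x @ x).all().item())
    return report


if __name__ == "__main__":
    for key, value in torch_smoke_test().items():
        print(f"{key}: {value}")
```

A native abort like the `std::vector` assertion above kills the interpreter, so Python cannot catch it. With a script like this, the last line printed before the abort at least shows which step triggered it.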

FranGamer1892 commented 1 year ago

Closing this, as there is now a gfx803-compatible pytorch package in the Arch repos; last time I checked, it was in [community-testing]. Using a pytorch package built from source is also possible, but it's trickier because of gcc/g++ 12 and other Arch-specific issues.