Closed snackfart closed 3 years ago
The official release of pytorch-1.8.1 doesn't support gfx803, so we have to compile pytorch ourselves. For now, you can refer to the navi10 documents: https://github.com/xuhuisheng/rocm-build/tree/master/navi10
I think I can add a pytorch build script for gfx803 later.
Which combination of rocm and pytorch does work with a 480 officially?
@snackfart Unfortunately, pytorch-1.8.0 is the first official release (even though it is beta) for ROCm.
The only way to run pytorch on gfx803 is to compile it ourselves.
I added scripts for building pytorch from sources. https://github.com/xuhuisheng/rocm-build/tree/master/gfx803#pytorch-181-crashed-on-gfx803
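For readers who want to see the shape of such a build, here is a minimal sketch. It only assembles the command quoted later in this thread (`USE_ROCM=1 USE_NINJA=1 python3 setup.py bdist_wheel`) and does not run it. `PYTORCH_ROCM_ARCH` does not appear in this thread; it is added on the assumption that PyTorch's ROCm build reads it to pick the GPU target, so check the linked scripts for the exact steps. The helper name `rocm_build_invocation` is made up for illustration.

```python
import os


def rocm_build_invocation(arch: str = "gfx803"):
    """Return (env, argv) for a ROCm source build of PyTorch. Nothing is executed."""
    env = dict(os.environ)
    env.update({
        "USE_ROCM": "1",            # from the command quoted in this thread
        "USE_NINJA": "1",           # from the command quoted in this thread
        "PYTORCH_ROCM_ARCH": arch,  # assumption: not mentioned in the thread
    })
    argv = ["python3", "setup.py", "bdist_wheel"]
    return env, argv


env, argv = rocm_build_invocation()
print(" ".join(argv), "with PYTORCH_ROCM_ARCH =", env["PYTORCH_ROCM_ARCH"])

# To actually build, run this from the PyTorch source tree after the
# preparation steps in the linked scripts:
#   import subprocess
#   subprocess.run(argv, env=env, check=True)
```

Keeping the environment in a dict makes it easy to print and compare against what the build scripts export before starting a build that takes hours.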
many thanks, which version of rocm is preferable for this. 3.5 or 4.1.1?
I am using ROCm-4.1.1 now. So far I have only run pytorch and tensorflow with ROCm-4.1.1 on some small models, like mnist. I haven't persuaded my colleagues to use ROCm in a bigger environment yet.
Building pytorch takes a lot of time. Maybe I can try building pytorch on ROCm-3.5.1 later.
Okay, thanks. Can you upload your build of pyTorch for the 803?
the build process fails at:
USE_ROCM=1 USE_NINJA=1 python3 setup.py bdist_wheel
```
Building wheel torch-1.8.0a0+56b43f4
-- Building version 1.8.0a0+56b43f4
cmake -GNinja -DBUILD_PYTHON=True -DBUILD_TEST=True -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/home/buran/pytorch/torch -DCMAKE_PREFIX_PATH=/usr/lib/python3/dist-packages -DNUMPY_INCLUDE_DIR=/home/buran/.local/lib/python3.8/site-packages/numpy/core/include -DPYTHON_EXECUTABLE=/usr/bin/python3 -DPYTHON_INCLUDE_DIR=/usr/include/python3.8 -DPYTHON_LIBRARY=/usr/lib/libpython3.8.so.1.0 -DTORCH_BUILD_VERSION=1.8.0a0+56b43f4 -DUSE_NINJA=1 -DUSE_NUMPY=True -DUSE_ROCM=1 /home/buran/pytorch
-- std::exception_ptr is supported.
-- Turning off deprecation warning due to glog.
-- Current compiler supports avx2 extension. Will build perfkernels.
-- Current compiler supports avx512f extension. Will build fbgemm.
-- Building using own protobuf under third_party per request.
-- Use custom protobuf build.
--
-- 3.11.4.0
-- Caffe2 protobuf include directory: $
```
My pytorch package is built for python-3.8; you are using python-3.7, so it is not compatible.
And before compiling pytorch, you have to install rocm-dkms: `sudo apt install rocm-dkms rocm-libs`
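The Python version a wheel targets is encoded in its filename, so a mismatch like this can be spotted before running pip. Below is a minimal sketch that compares the tag of the torch wheel named later in this thread with the running interpreter. The helper names are made up for illustration, and it assumes a standard `name-version-pythontag-abitag-platform.whl` filename with a CPython tag.

```python
import sys


def wheel_python_tag(filename: str) -> str:
    """Extract the python tag (e.g. 'cp38') from a wheel filename."""
    stem = filename[: -len(".whl")]
    # The last three dash-separated fields are python tag, ABI tag, platform.
    return stem.split("-")[-3]


def interpreter_tag(version_info=sys.version_info) -> str:
    """Build the CPython tag (e.g. 'cp38') for the given version tuple."""
    return f"cp{version_info[0]}{version_info[1]}"


wheel = "torch-1.8.0a0+56b43f4-cp38-cp38-linux_x86_64.whl"
if wheel_python_tag(wheel) == interpreter_tag():
    print("wheel matches this interpreter")
else:
    print(f"wheel wants {wheel_python_tag(wheel)}, interpreter is {interpreter_tag()}")
```

In practice pip performs this check itself and refuses a wheel whose tags do not match; the sketch only makes the reason visible.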
I uploaded torch, torchvision, rocblas and rocrand to Baidu cloud disk; please give them a try.
url https://pan.baidu.com/s/1zV5j9RPehMvKjqIFaHs0jw
code 5jw8
OS | Python | ROCm | GPU |
---|---|---|---|
Ubuntu-20.04.2 | 3.8 | 4.1.1 | RX580 |
can you upload your files somewhere else, i have to download a baidu program to download your files. e.g. https://easyupload.io/
I find I cannot access easyupload, Google Drive or Dropbox. :cry:
or upload your files in this repo, e.g. under a folder like builds
or when your files are <25MB you can upload the files with your comment in github
@snackfart Try this https://github.com/xuhuisheng/rocm-gfx803
works thx.
I cannot open Google Drive. :sob:
And I moved the archives from git to the release page. That feels better now. https://github.com/xuhuisheng/rocm-gfx803
very nice, thanks again
@xuhuisheng can you explain this behavior? Maybe I have to reinstall the OS, ROCm and pytorch to get it working correctly
Python 3.8.5 (default, Jan 27 2021, 15:41:15)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch as tp
>>> tp.add(1,2)
tensor(3)
>>> exit()
"hipErrorNoBinaryForGpu: Unable to find code object for all current devices!"
Aborted (core dumped)
Adding `AMD_LOG_LEVEL=6` shows the debug log, like this:
AMD_LOG_LEVEL=6 python3 main.py
:1:hip_code_object.cpp :451 : 3970231024154 us: hipErrorNoBinaryForGpu: Unable to find code object for all current devices!
:1:hip_code_object.cpp :453 : 3970231024169 us: Devices:
:1:hip_code_object.cpp :455 : 3970231024175 us: amdgcn-amd-amdhsa--gfx803 - [Not Found]
:1:hip_code_object.cpp :460 : 3970231024180 us: Bundled Code Objects:
:1:hip_code_object.cpp :477 : 3970231024185 us: host-x86_64-unknown-linux - [Unsupported]
:1:hip_code_object.cpp :474 : 3970231024195 us: hipv4-amdgcn-amd-amdhsa--gfx803:xnack- - [code object v4 is amdgcn-amd-amdhsa--gfx803:xnack-]
/home/work/ROCm/HIP/rocclr/hip_code_object.cpp:481: "hipErrorNoBinaryForGpu: Unable to find code object for all current devices!"
Make sure rocrand and pytorch have been overwritten, like this:
sudo apt install rocm-dkms rocm-libs
sudo dpkg -i rocblas_2.36.0-93c82939_amd64.deb
sudo dpkg -i rocrand_2.10.7-c73b16d_amd64.deb
pip3 install torch-1.8.0a0+56b43f4-cp38-cp38-linux_x86_64.whl
pip3 install torchvision-0.9.0a0+8fb5838-cp38-cp38-linux_x86_64.whl
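The debug log above is easier to read once it is split into the devices HIP found and the code objects bundled in the library. The sketch below does that for the lines quoted in this thread. The helper name `summarize` is made up for illustration, and the parsing assumes the `... us: <name> - [<status>]` layout seen here, which may differ in other ROCm versions.

```python
import re

# Log lines copied from the AMD_LOG_LEVEL=6 output in this thread.
LOG = """\
:1:hip_code_object.cpp :453 : 3970231024169 us: Devices:
:1:hip_code_object.cpp :455 : 3970231024175 us: amdgcn-amd-amdhsa--gfx803 - [Not Found]
:1:hip_code_object.cpp :460 : 3970231024180 us: Bundled Code Objects:
:1:hip_code_object.cpp :477 : 3970231024185 us: host-x86_64-unknown-linux - [Unsupported]
:1:hip_code_object.cpp :474 : 3970231024195 us: hipv4-amdgcn-amd-amdhsa--gfx803:xnack- - [code object v4 is amdgcn-amd-amdhsa--gfx803:xnack-]
"""

ENTRY = re.compile(r"us: (?P<name>\S+) - \[(?P<status>[^\]]*)\]")


def summarize(log: str) -> dict:
    """Group '<name> - [<status>]' entries under their section header."""
    sections = {"Devices": [], "Bundled Code Objects": []}
    current = None
    for line in log.splitlines():
        message = line.split("us: ", 1)[-1].strip()
        if message.endswith(":") and message[:-1] in sections:
            current = message[:-1]  # a section header such as "Devices:"
            continue
        match = ENTRY.search(line)
        if match and current:
            sections[current].append((match["name"], match["status"]))
    return sections


for section, entries in summarize(LOG).items():
    print(section)
    for name, status in entries:
        print(f"  {name}: {status}")
```

Here the device target `amdgcn-amd-amdhsa--gfx803` is reported as not found among the bundled code objects, which is the situation the reinstall commands above are meant to fix.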
HaHa~ I tested pytorch-1.7.0 on ROCm-3.5.1 and gfx803. The mnist example runs properly. https://github.com/xuhuisheng/rocm-gfx803
after installing all your debs it works i guess, no more code object error
i cant test it today, but i think it should now. the only error i get in my bigger project is this:
2021-04-15 10:08:11.231034: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-04-15 10:08:11.231051: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
but i guess this should be normal.
how can i specify an amd gpu in torch? or will ROCm replace "cpu" with the corresponding amd device?
self.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
Make sure you install tensorflow-rocm, not tensorflow. It looks like your tensorflow still uses cuda.
Yes, feel free to use cuda alongside cpu in pytorch. Here is my test script for pytorch. https://github.com/xuhuisheng/rocm-build/blob/master/check/test-pytorch-device.py
And I uploaded pytorch-1.7.0 to https://github.com/xuhuisheng/rocm-gfx803; if you are interested, please give it a try.
> Make sure you install tensorflow-rocm, not tensorflow. It looks like your tensorflow still uses cuda.

will do.

> Yes, feel free to use cuda alongside cpu in pytorch.

what is the syntax to specify an amd gpu in torch.device(), not a cuda gpu or a cpu?

> Here is my test script for pytorch. https://github.com/xuhuisheng/rocm-build/blob/master/check/test-pytorch-device.py

will do, but tomorrow i guess

> And I uploaded pytorch-1.7.0 to https://github.com/xuhuisheng/rocm-gfx803; if you are interested, please give it a try.

you are a machine, my dude
Emm~, I mean just use `device = torch.device("cuda")`.
ROCm aims to fully replace cuda, so application code shouldn't need to change. I guess most code can run directly.
https://github.com/xuhuisheng/rocm-build/blob/master/check/test-pytorch-fc.py#L22
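As a minimal sketch of that advice: the device string stays `"cuda"` on a ROCm build, and the usual availability check decides between GPU and CPU. The helper name `pick_device_name` is made up for illustration, and the torch part only runs if torch is installed. Reading `torch.version.hip` to tell a ROCm build from a CUDA build is based on my understanding of PyTorch and is not taken from this thread.

```python
import importlib.util


def pick_device_name(gpu_available: bool) -> str:
    """Same choice as the one-liner in the question: GPU if present, else CPU."""
    return "cuda:0" if gpu_available else "cpu"


if importlib.util.find_spec("torch") is not None:
    import torch

    # On a ROCm build the AMD GPU is reached through the "cuda" device name.
    device = torch.device(pick_device_name(torch.cuda.is_available()))
    # Assumption: torch.version.hip is set on ROCm builds and None otherwise.
    hip_version = getattr(torch.version, "hip", None)
    print("device:", device, "| HIP version:", hip_version)

    x = torch.ones(2, 2, device=device)
    print((x + x).sum().item())
else:
    print("torch is not installed; would use", pick_device_name(False))
```

So the line from the question, `torch.device("cuda:0" if torch.cuda.is_available() else "cpu")`, can stay exactly as it is.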
okay thx, this was a big mystery for me, but the drop in replacement makes sense
Environment
What is the expected behavior
The given fix didn't fix the problem: https://github.com/RadeonOpenCompute/ROCm/issues/1454 Maybe I did something wrong?
What actually happens
-
How to reproduce
-