riaqn closed this issue 2 years ago.
I wasn't aware that tensorflow-rocm-2.5.0 had been released. I will test it and verify whether it supports gfx803.
I also see you used Linux kernel 5.13 and Python 3.9. You didn't test under Ubuntu 20.04, right?
Update:
Verified: tensorflow-rocm-2.5.0 does not support gfx803 for now. Sad news.
The workaround is to use tensorflow-rocm-2.4.3: pip3 install tensorflow-rocm==2.4.3
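To confirm which build is actually installed after applying the workaround, a minimal Python sketch like this one can compare the installed distribution against the 2.4.3 pin. It relies only on the standard library's importlib.metadata (Python 3.8+); the names PINNED_VERSION, installed_version and matches_pin are made up for illustration.

```python
# Sketch: compare the installed tensorflow-rocm version with the 2.4.3 pin
# suggested in this thread as the gfx803 workaround. Helper names are hypothetical.
from importlib import metadata
from typing import Optional

PINNED_VERSION = "2.4.3"  # version suggested above as the gfx803 workaround


def installed_version(dist: str = "tensorflow-rocm") -> Optional[str]:
    """Return the installed version of `dist`, or None if it is not installed."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(version: Optional[str], pinned: str = PINNED_VERSION) -> bool:
    """True only when the installed version is exactly the pinned one."""
    return version == pinned


if __name__ == "__main__":
    v = installed_version()
    if v is None:
        print("tensorflow-rocm is not installed")
    elif matches_pin(v):
        print(f"tensorflow-rocm {v} matches the gfx803 workaround pin")
    else:
        print(f"tensorflow-rocm {v} found; to pin: pip3 install tensorflow-rocm=={PINNED_VERSION}")
```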
I will try to find a way to recompile tensorflow-rocm-2.5.0.
Thanks for the quick reply! I used Arch Linux. Does the Linux distribution matter?
Unfortunately, only 2.5.0 is available from PyPI as a binary package.
I'm now trying to recompile tensorflow-rocm 2.5.0 using this AUR build script, which supports gfx803: https://github.com/rocm-arch/tensorflow-rocm
but I encountered an issue: https://github.com/rocm-arch/tensorflow-rocm/issues/31
I recompiled https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/tree/r2.5-rocm-enhanced and didn't hit your compile error.
The MNIST example ran properly.
I guess it may be caused by gcc-10; the gcc shipped with Ubuntu 20.04.2 is gcc-9.
BTW, I just installed bazel-3.7.2, executed build_rocm_python3, and waited about 3 hours; tensorflow-2.5.0-cp38-cp38-linux_x86_64.whl was built successfully.
I used the ubuntu:20.04 Docker image; just remember to install the dependency packages.
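The cp38-cp38 tags in that wheel name mean it targets CPython 3.8, so it will not install under Python 3.9. A small sketch for checking a wheel filename against the running interpreter, assuming the standard name-version-pythontag-abitag-platform.whl layout from PEP 427; the helper names are hypothetical.

```python
# Sketch: does a wheel's python tag match the running CPython interpreter?
# Wheel filenames follow {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl
# (PEP 427). Compressed tags such as "py2.py3" are returned unsplit and simply
# will not match the "cpXY" interpreter tag.
import sys


def wheel_python_tag(filename: str) -> str:
    """Return the python tag (e.g. 'cp38') from a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename layout: {filename}")
    # The python tag is third from the end, with or without a build tag.
    return parts[-3]


def interpreter_tag() -> str:
    """CPython-style tag for the running interpreter, e.g. 'cp38' on Python 3.8."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


def wheel_matches_interpreter(filename: str) -> bool:
    return wheel_python_tag(filename) == interpreter_tag()


if __name__ == "__main__":
    name = "tensorflow-2.5.0-cp38-cp38-linux_x86_64.whl"
    print(name, "->", wheel_python_tag(name), "| interpreter:", interpreter_tag())
```

For example, wheel_matches_interpreter("tensorflow-2.5.0-cp38-cp38-linux_x86_64.whl") is True only when running under CPython 3.8.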
I'm not yet sure whether tensorflow-rocm picks up the local GPU configuration, or whether we need to set something like AMDGPU_TARGETS=gfx803.
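Before experimenting with something like AMDGPU_TARGETS=gfx803, it may help to confirm which architecture name ROCm reports for the card. This is a sketch assuming rocminfo is on PATH and that its agent listing contains names such as gfx803; it only detects the name and does not settle whether the TensorFlow build uses that variable.

```python
# Sketch: list the gfx architecture names reported by ROCm's rocminfo tool.
# Assumption: rocminfo is installed and on PATH, and its output names GPU agents
# like "gfx803". This does not tell you whether TensorFlow honours AMDGPU_TARGETS.
import re
import shutil
import subprocess
from typing import List

GFX_RE = re.compile(r"\bgfx[0-9a-f]+\b")


def parse_gfx_targets(rocminfo_output: str) -> List[str]:
    """Return unique gfx names in order of first appearance."""
    seen: List[str] = []
    for name in GFX_RE.findall(rocminfo_output):
        if name not in seen:
            seen.append(name)
    return seen


def detect_gfx_targets() -> List[str]:
    """Run rocminfo if available; return [] when it is missing."""
    if shutil.which("rocminfo") is None:
        return []
    result = subprocess.run(["rocminfo"], capture_output=True, text=True, check=False)
    return parse_gfx_targets(result.stdout)


if __name__ == "__main__":
    targets = detect_gfx_targets()
    print("detected GPU architectures:", ", ".join(targets) or "none")
```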
I'm actually using gcc-11. Let me try gcc-9.
It seems only tensorflow-rocm-2.5.0 provides a Python 3.9 wheel. Maybe you can try Python 3.8 with tensorflow-rocm-2.4.3: https://pypi.org/project/tensorflow-rocm/2.4.3/#files
OK, after some research: the problem is that TF-2.5.0 references an outdated version of ruy. The issue was fixed in a later ruy commit, and TF-2.6.0 references a later version of ruy, which is fine.
It seems that TF doesn't backport fixes, so our options are to wait for tensorflow-rocm 2.6.0, make a patch based on this issue, or compile with GCC-10 (but I'm having some problems with that too).
@xuhuisheng I just realized that you mentioned patching rocBLAS (removing library/src/blas3/Tensile/Logic/asm_full/r9nano_*.yaml). Do we still need to do that for ROCm 4.3.0?
@riaqn Please see the documentation for what this patch does: https://github.com/xuhuisheng/rocm-build/tree/master/gfx803#rocm-37-broke-on-gfx803
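The rocBLAS tweak discussed here (deleting the r9nano_*.yaml Tensile logic files) can be scripted before a build. This is a minimal sketch: the script name remove_r9nano.py and the command-line layout are hypothetical, only the library/src/blas3/Tensile/Logic/asm_full path comes from this thread, and whether ROCm 4.3.0 still needs the patch is exactly what the linked notes should decide.

```python
# Sketch: remove rocBLAS's r9nano_*.yaml Tensile logic files before building.
# Hypothetical script name: remove_r9nano.py; the rocBLAS checkout path is
# passed on the command line. Check the linked gfx803 notes before using it.
import sys
from pathlib import Path
from typing import List

LOGIC_SUBDIR = Path("library/src/blas3/Tensile/Logic/asm_full")


def remove_r9nano_logic(rocblas_src: Path, dry_run: bool = False) -> List[Path]:
    """Delete r9nano_*.yaml under the asm_full logic dir; only list them if dry_run."""
    targets = sorted((rocblas_src / LOGIC_SUBDIR).glob("r9nano_*.yaml"))
    if not dry_run:
        for path in targets:
            path.unlink()
    return targets


def main(argv: List[str]) -> int:
    if not argv:
        print("usage: remove_r9nano.py ROCBLAS_SRC [--dry-run]")
        return 2
    dry_run = "--dry-run" in argv[1:]
    for path in remove_r9nano_logic(Path(argv[0]), dry_run=dry_run):
        print(("would remove " if dry_run else "removed ") + str(path))
    return 0


if __name__ == "__main__":
    main(sys.argv[1:])
```

Running it with --dry-run first lists the matching files without deleting anything.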
It has been 3 months since the last post, so I will close this issue. Please reopen it if there are any updates.
Environment
What is the expected behavior
- TensorFlow should run correctly.
What actually happens
How to reproduce
Do I have to recompile TensorFlow?