xuhuisheng / rocm-build

build scripts for ROCm
Apache License 2.0

tensorflow-rocm needs hipErrorNoBinaryForGpu (does it need recompile?) #13

Closed ianferreira closed 3 years ago

ianferreira commented 3 years ago

Environment

Hardware description
GPU - Navi21 RX6800
CPU - AMD
Software version
OS - 20.04.2
ROCm - 4.2
Python -

What is the expected behavior

tensorflow-rocm works after completing the install in this repo

What actually happens

ian@xxxx~/Documents/Src$ /home/ian/.envs/py3tf2/bin/python /home/ian/Documents/Src/test.py
2021-05-27 12:35:24.176982: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-05-27 12:35:24.177308: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libamdhip64.so
/home/ian/Documents/rocm-build/ROCm/HIP/rocclr/hip_code_object.cpp:486: "hipErrorNoBinaryForGpu: Unable to find code object for all current devices!"
Aborted (core dumped)

How to reproduce

pip install tensorflow-rocm
... ...
Successfully installed tensorflow-rocm-2.4.3

python

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, Flatten, Dropout

model = Sequential()

xuhuisheng commented 3 years ago

Yes, you need to recompile TensorFlow for gfx1030, but there is no guarantee that ROCm will run properly on gfx1030.

The good news is that ROCm said they will support Navi in 2021, so my suggestion is to wait for official support.
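Before deciding whether to recompile, it can help to confirm which gfx target the installed GPU actually reports. The following is a minimal sketch, not part of this repo: it assumes the `rocminfo` tool that ships with ROCm is on `PATH` and that its output contains names such as `gfx1030`. The helper names `gfx_targets` and `detect_gfx_targets` are hypothetical.

```python
import re
import shutil
import subprocess


def gfx_targets(rocminfo_text):
    """Return the sorted, de-duplicated gfx names found in rocminfo-style text."""
    return sorted(set(re.findall(r"\bgfx[0-9a-f]+\b", rocminfo_text)))


def detect_gfx_targets():
    """Run rocminfo if it is on PATH; return [] when it is missing or fails.

    Assumption: rocminfo prints the agent name (for example gfx1030)
    somewhere in its standard output.
    """
    exe = shutil.which("rocminfo")
    if exe is None:
        return []
    result = subprocess.run([exe], capture_output=True, text=True)
    if result.returncode != 0:
        return []
    return gfx_targets(result.stdout)


if __name__ == "__main__":
    print(detect_gfx_targets())
```

If the reported target is one the prebuilt `tensorflow-rocm` wheel was not compiled for, that would be consistent with the `hipErrorNoBinaryForGpu` abort shown above.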

xuhuisheng commented 3 years ago

I guess gfx1030 will be supported on ROCm-4.3. https://github.com/ROCmSoftwarePlatform/rocBLAS/blob/release/rocm-rel-4.3/CMakeLists.txt#L169

ROCm-4.3 may be released around June or July 2021, I hope.

ianferreira commented 3 years ago

Indeed. Thanks for this project.

Flock1 commented 2 years ago

Hi.

I have ROCm 4.3 with gfx1030. I installed tensorflow-rocm and it imports fine. But when I check whether it's using the GPU, I get this:

tf.config.list_physical_devices('GPU')

2021-09-11 09:15:31.830208: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libamdhip64.so
"hipErrorNoBinaryForGpu: Unable to find code object for all current devices!"
Aborted (core dumped)

Any suggestions?
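Because the failure is an abort rather than a Python exception, it cannot be caught with `try`/`except` and takes the whole interpreter down. One way to probe for it safely is to run the check in a child process and inspect the exit status. This is only a sketch: the helper names `run_isolated` and `describe` are hypothetical, and `GPU_CHECK` simply reuses the `tf.config.list_physical_devices('GPU')` call from the comment above.

```python
import subprocess
import sys

# The same GPU check as above, written as a one-line snippet for a child process.
GPU_CHECK = "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"


def run_isolated(snippet, timeout=120):
    """Run a Python snippet in a child interpreter and report how it ended."""
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout, result.stderr


def describe(returncode):
    """Turn a subprocess return code into a short human-readable status."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative value means the child was killed by that signal.
        return f"killed by signal {-returncode}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    code, out, err = run_isolated(GPU_CHECK)
    print(describe(code))
    print(out or err)
```

On Linux, an abort like the one in the log would be expected to show up as a signal-terminated child, while the parent script keeps running and can report the problem.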

ianferreira commented 2 years ago

Sadly, ROCm still doesn't support Navi. I was also hoping 4.3 would. Some of the libraries did add support.

Flock1 commented 2 years ago

Which deep learning frameworks can I use then?

xuhuisheng commented 2 years ago

@Flock1 TensorFlow takes more time to build than PyTorch. If I had a gfx1030, I would recompile PyTorch to test.

Here are some scripts for gfx1030: https://github.com/xuhuisheng/rocm-build/tree/master/navi21