xuhuisheng / rocm-gfx803

How to build patched tensorflow package #18

Open TheJKM opened 1 year ago

TheJKM commented 1 year ago

Environment

Hardware description
GPU RX 570
CPU Ryzen 5 2600
Software version
OS Ubuntu 20.04.5
ROCm 5.3.0 gfx803 (from this repo)
Python 3.8

Hi, for my application I need tensorflow 2.7, so I'd like to build it. From the available resources it is not clear to me how the provided tensorflow package is patched or if it is even patched at all to run on gfx803. Could you provide an insight on how you build the tensorflow package please?

xuhuisheng commented 1 year ago

First, install the matching Bazel version. Each TensorFlow release requires a specific Bazel version; tensorflow-2.7 should depend on bazel-3.7.2, as pinned in https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/blob/r2.7-rocm-enhanced/.bazelversion
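Once the repository from the next step is checked out, the pinned version can be compared with the Bazel on your PATH. This is only a minimal sketch: `required_bazel_version` and `check_bazel` are hypothetical helper names, and it assumes `.bazelversion` contains nothing but a version string.

```shell
# Hypothetical helper: print the version pinned in <repo>/.bazelversion.
# Assumes the file holds a single version string such as 3.7.2.
required_bazel_version() {
    tr -d '[:space:]' < "$1/.bazelversion"
}

# Hypothetical helper: compare the pinned version with the bazel on PATH.
check_bazel() {
    want="$(required_bazel_version "$1")"
    if command -v bazel >/dev/null 2>&1; then
        # `bazel --version` prints a line such as "bazel 3.7.2"
        have="$(bazel --version | awk '{print $2}')"
    else
        have="none"
    fi
    if [ "$want" = "$have" ]; then
        echo "OK: bazel $have"
    else
        echo "Mismatch: checkout wants $want, found $have"
        return 1
    fi
}
```

For example, `check_bazel ~/tensorflow-upstream` should report OK before you start the build; a mismatch suggests installing the pinned version first.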

Second, clone the tensorflow-upstream repository and check out the matching branch.

git clone https://github.com/ROCmSoftwareplatform/tensorflow-upstream.git
cd tensorflow-upstream
git checkout r2.7-rocm-enhanced

export ROCM_PATH=/opt/rocm
export HIP_PATH=$ROCM_PATH/hip
export PATH=$HIP_PATH/bin:$PATH
export ROCM_TOOLKIT_PATH=$ROCM_PATH

sudo apt-get update && sudo apt-get install -y \
    python3-numpy \
    python3-dev \
    python3-wheel \
    python3-mock \
    python3-future \
    python3-pip \
    python3-yaml \
    python3-setuptools && \
    sudo apt-get clean

# sudo apt install -y python-is-python3 curl git

sudo apt install -y curl git

echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -
sudo apt-get update && sudo apt-get install -y openjdk-8-jdk openjdk-8-jre unzip && sudo apt-get clean && sudo rm -rf /var/lib/apt/lists/*

bash build_rocm_python3

At the moment, the build_rocm_python3 script uses /opt/rocm-5.2.0; you can change this to the version you have installed. My environment is Ubuntu 20.04.5. Give it a try.
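One possible way to see which ROCm path the script refers to, and to point it at a different version, is sketched below. `rocm_paths_in` and `set_rocm_version` are hypothetical helper names, and the sketch assumes the script spells the path literally as /opt/rocm-<version>; check the script by hand if the search finds nothing.

```shell
# Hypothetical helper: list the versioned ROCm paths a file mentions.
rocm_paths_in() {
    grep -oE '/opt/rocm-[0-9]+(\.[0-9]+)*' "$1" | sort -u
}

# Hypothetical helper: rewrite every versioned ROCm path in a file to the
# given version. A backup of the original is kept next to it as <file>.bak.
set_rocm_version() {
    sed -i.bak -E "s#/opt/rocm-[0-9]+(\.[0-9]+)*#/opt/rocm-$2#g" "$1"
}
```

For example, `rocm_paths_in build_rocm_python3` shows what the script currently uses, and `set_rocm_version build_rocm_python3 5.3.0` would switch it to /opt/rocm-5.3.0.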

TheJKM commented 1 year ago

Thank you very much for your tutorial! So, am I right in concluding that tensorflow just has to be self-compiled to have gfx803 enabled, and no patching is required?

Unfortunately, your commands are not working on my Ubuntu 20.04.5 machine. It might be because you are checking out r2.9-rocm-enhanced? I used r2.7-rocm-enhanced as I need 2.7. The step bash build_rocm_python3 fails with the following output, do you have an idea? Thank you in advance!

johannes@JKMs-ROCm:~/tensorflow-upstream$ bash build_rocm_python3                     
You have bazel 3.7.2 installed.
Found possible Python library paths:
  /usr/lib/python3/dist-packages
  /usr/local/lib/python3.8/dist-packages
Please input the desired Python library path to use.  Default is [/usr/lib/python3/dist-packages]
Do you wish to build TensorFlow with CUDA support? [y/N]: No CUDA support will be enabled for TensorFlow.

Do you wish to download a fresh release of clang? (Experimental) [y/N]: Clang will not be downloaded.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -Wno-sign-compare]: 

Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
        --config=mkl            # Build with MKL support.
        --config=mkl_aarch64    # Build with oneDNN and Compute Library for the Arm Architecture (ACL).
        --config=monolithic     # Config for mostly static monolithic build.
        --config=numa           # Build with NUMA support.
        --config=dynamic_kernels        # (Experimental) Build kernels into separate shared objects.
        --config=v1             # Build with TensorFlow 1 API instead of TF 2 API.
Preconfigured Bazel build configs to DISABLE default on features:
        --config=nogcp          # Disable GCP support.
        --config=nonccl         # Disable NVIDIA NCCL support.
Configuration finished
WARNING: The following configs were expanded more than once: [rocm]. For repeatable flags, repeats are counted twice and may lead to unexpected behavior.
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=1 --terminal_columns=179
INFO: Reading rc options for 'build' from /home/johannes/tensorflow-upstream/.bazelrc:
  Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'build' from /home/johannes/tensorflow-upstream/.bazelrc:
  'build' options: --define framework_shared_object=true --java_toolchain=@tf_toolchains//toolchains/java:tf_java_toolchain --host_java_toolchain=@tf_toolchains//toolchains/java:tf_java_toolchain --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --enable_platform_specific_config --define=with_xla_support=true --config=short_logs --config=v2 --define=no_aws_support=true --define=no_hdfs_support=true
INFO: Reading rc options for 'build' from /home/johannes/tensorflow-upstream/.tf_configure.bazelrc:
  'build' options: --action_env PYTHON_BIN_PATH=/usr/bin/python3 --action_env PYTHON_LIB_PATH=/usr/lib/python3/dist-packages --python_path=/usr/bin/python3 --config=rocm --action_env ROCM_PATH=/opt/rocm-5.3.0
INFO: Reading rc options for 'build' from /home/johannes/tensorflow-upstream/.bazelrc:
  'build' options: --deleted_packages=tensorflow/compiler/mlir/tfrt,tensorflow/compiler/mlir/tfrt/benchmarks,tensorflow/compiler/mlir/tfrt/jit/python_binding,tensorflow/compiler/mlir/tfrt/jit/transforms,tensorflow/compiler/mlir/tfrt/python_tests,tensorflow/compiler/mlir/tfrt/tests,tensorflow/compiler/mlir/tfrt/tests/saved_model,tensorflow/compiler/mlir/tfrt/transforms/lhlo_gpu_to_tfrt_gpu,tensorflow/core/runtime_fallback,tensorflow/core/runtime_fallback/conversion,tensorflow/core/runtime_fallback/kernel,tensorflow/core/runtime_fallback/opdefs,tensorflow/core/runtime_fallback/runtime,tensorflow/core/runtime_fallback/util,tensorflow/core/tfrt/common,tensorflow/core/tfrt/eager,tensorflow/core/tfrt/eager/backends/cpu,tensorflow/core/tfrt/eager/backends/gpu,tensorflow/core/tfrt/eager/core_runtime,tensorflow/core/tfrt/eager/cpp_tests/core_runtime,tensorflow/core/tfrt/fallback,tensorflow/core/tfrt/gpu,tensorflow/core/tfrt/run_handler_thread_pool,tensorflow/core/tfrt/runtime,tensorflow/core/tfrt/saved_model,tensorflow/core/tfrt/saved_model/tests,tensorflow/core/tfrt/tpu,tensorflow/core/tfrt/utils
INFO: Found applicable config definition build:short_logs in file /home/johannes/tensorflow-upstream/.bazelrc: --output_filter=DONT_MATCH_ANYTHING
INFO: Found applicable config definition build:v2 in file /home/johannes/tensorflow-upstream/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
INFO: Found applicable config definition build:rocm in file /home/johannes/tensorflow-upstream/.bazelrc: --crosstool_top=@local_config_rocm//crosstool:toolchain --define=using_rocm_hipcc=true --define=tensorflow_mkldnn_contraction_kernel=0 --repo_env TF_NEED_ROCM=1
INFO: Found applicable config definition build:opt in file /home/johannes/tensorflow-upstream/.tf_configure.bazelrc: --copt=-Wno-sign-compare --host_copt=-Wno-sign-compare
INFO: Found applicable config definition build:rocm in file /home/johannes/tensorflow-upstream/.bazelrc: --crosstool_top=@local_config_rocm//crosstool:toolchain --define=using_rocm_hipcc=true --define=tensorflow_mkldnn_contraction_kernel=0 --repo_env TF_NEED_ROCM=1
INFO: Found applicable config definition build:linux in file /home/johannes/tensorflow-upstream/.bazelrc: --copt=-w --host_copt=-w --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --define=PROTOBUF_INCLUDE_PATH=$(PREFIX)/include --cxxopt=-std=c++14 --host_cxxopt=-std=c++14 --config=dynamic_kernels --distinct_host_configuration=false --experimental_guard_against_concurrent_changes
INFO: Found applicable config definition build:dynamic_kernels in file /home/johannes/tensorflow-upstream/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS
INFO: Repository local_config_rocm instantiated at:
  /home/johannes/tensorflow-upstream/WORKSPACE:15:14: in <toplevel>
  /home/johannes/tensorflow-upstream/tensorflow/workspace2.bzl:1079:19: in workspace
  /home/johannes/tensorflow-upstream/tensorflow/workspace2.bzl:100:19: in _tf_toolchains
Repository rule rocm_configure defined at:
  /home/johannes/tensorflow-upstream/third_party/gpus/rocm_configure.bzl:831:33: in <toplevel>
WARNING: Download from https://storage.googleapis.com/mirror.tensorflow.org/github.com/llvm/llvm-project/archive/43d6991c2a4cc2ac374e68c029634f2b59ffdfdf.tar.gz failed: class com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException GET returned 404 Not Found
WARNING: Download from http://mirror.tensorflow.org/github.com/tensorflow/runtime/archive/64c92c8013b557087351c91b5423b6046d10f206.tar.gz failed: class com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException GET returned 404 Not Found
ERROR: An error occurred during the fetch of repository 'local_config_rocm':
   Traceback (most recent call last):
        File "/home/johannes/tensorflow-upstream/third_party/gpus/rocm_configure.bzl", line 812, column 38, in _rocm_autoconf_impl
                _create_local_rocm_repository(repository_ctx)
        File "/home/johannes/tensorflow-upstream/third_party/gpus/rocm_configure.bzl", line 551, column 35, in _create_local_rocm_repository
                rocm_config = _get_rocm_config(repository_ctx, bash_bin, find_rocm_config_script)
        File "/home/johannes/tensorflow-upstream/third_party/gpus/rocm_configure.bzl", line 401, column 30, in _get_rocm_config
                config = find_rocm_config(repository_ctx, find_rocm_config_script)
        File "/home/johannes/tensorflow-upstream/third_party/gpus/rocm_configure.bzl", line 379, column 41, in find_rocm_config
                exec_result = _exec_find_rocm_config(repository_ctx, script_path)
        File "/home/johannes/tensorflow-upstream/third_party/gpus/rocm_configure.bzl", line 375, column 19, in _exec_find_rocm_config
                return execute(repository_ctx, [python_bin, "-c", decompress_and_execute_cmd])
        File "/home/johannes/tensorflow-upstream/third_party/remote_config/common.bzl", line 230, column 13, in execute
                fail(
Error in fail: Repository command failed
ERROR: MIOpen version file "None" not found
ERROR: Skipping '//tensorflow/tools/pip_package:build_pip_package': no such package '@local_config_rocm//rocm': Repository command failed
ERROR: MIOpen version file "None" not found
WARNING: Target pattern parsing failed.
ERROR: no such package '@local_config_rocm//rocm': Repository command failed
ERROR: MIOpen version file "None" not found
INFO: Elapsed time: 0.202s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
    currently loading: tensorflow/tools/pip_package
johannes@JKMs-ROCm:~/tensorflow-upstream$

xuhuisheng commented 1 year ago

That was a typo; I have changed it to 2.7.

Please also check whether ROCm was installed successfully. As mentioned before, tf-2.7 uses /opt/rocm-5.2.0 by default. I haven't tested it on rocm-5.3, so 5.2.0 may be safer. Give it a try.
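Since the build stopped at `MIOpen version file "None" not found`, a quick check is whether a MIOpen version header exists under the ROCm directory the build is pointed at. The sketch below is an assumption-laden helper, not part of ROCm or TensorFlow: `find_miopen_header` is a hypothetical name, and the two relative paths it tries are guesses at common ROCm layouts.

```shell
# Hypothetical helper: print the first miopen/version.h found under a
# ROCm root directory, or return non-zero if none is found.
# The candidate locations below are assumptions about the ROCm layout.
find_miopen_header() {
    for rel in include/miopen/version.h miopen/include/miopen/version.h; do
        if [ -f "$1/$rel" ]; then
            echo "$1/$rel"
            return 0
        fi
    done
    echo "no miopen/version.h under $1" >&2
    return 1
}
```

For example, `find_miopen_header /opt/rocm-5.2.0` should print a path on a working install. If it fails, or if rocminfo does not run, the ROCm installation itself probably needs fixing before TensorFlow can be built.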

TheJKM commented 1 year ago

Just saw that rocminfo is also not working, guess I'll do a fresh start.