tensorflow / tensorflow

An Open Source Machine Learning Framework for Everyone
https://tensorflow.org
Apache License 2.0

Does TensorFlow2.13.0 support RISC-V #72479

Open 6eanut opened 4 months ago

6eanut commented 4 months ago

Issue type

Support

Have you reproduced the bug with TensorFlow Nightly?

No

Source

source

TensorFlow version

2.13.0

Custom code

Yes

OS platform and distribution

Linux openeuler-riscv-4-2 6.6.0

Mobile device

No response

Python version

3.11.6

Bazel version

5.3.0

GCC/compiler version

12.3.1

CUDA/cuDNN version

no

GPU model and memory

no

Current behavior?

I recently tried to build TensorFlow 2.13.0 with Bazel 5.3.0 on RISC-V, but the build failed with the following error:

ERROR: /home/tf2130/.cache/bazel/_bazel_tf2130/4d8a15755e0d938e330a7b941554a2cb/external/mkl_dnn_v1/BUILD.bazel:146:11: Compiling src/cpu/x64/rnn/brgemm_cell_common_bwd.cpp failed: (Exit 1): gcc failed: error executing command /usr/lib64/ccache/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 67 arguments skipped)
external/mkl_dnn_v1/src/cpu/x64/rnn/brgemm_cell_common_bwd.cpp: In member function 'void dnnl::impl::cpu::x64::brgemm_diff_src_layer_iter_t<weights_t, scratch_t, gemm_acc_t>::execute() const':
external/mkl_dnn_v1/src/cpu/x64/rnn/brgemm_cell_common_bwd.cpp:102:37: error: 'const struct dnnl::impl::cpu::rnn_utils::diff_src_brgemm_conf_t' has no member named 'isa'
  102 |             && rnn_.diff_src_brgemm.isa == x64::avx512_core_bf16_amx_bf16) {
      |                                     ^~~
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 69717.150s, Critical Path: 1675.59s
INFO: 9452 processes: 1564 internal, 7888 local.
FAILED: Build did NOT complete successfully
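
The failing file sits under `src/cpu/x64/` and the failing expression compares against `x64::avx512_core_bf16_amx_bf16`, which suggests that x86-64-specific oneDNN sources are being compiled on a riscv64 host. As a minimal sketch of that reading, the snippet below pulls the source path out of a Bazel "Compiling ... failed" line and flags it when it lives in an `x64` directory. The helper names `failing_source` and `is_x64_only` are hypothetical, the sample line is shortened from the log above, and the path check is only a heuristic.

```python
import re

# Matches the "Compiling <path> failed" part of a Bazel error line.
COMPILE_FAILURE = re.compile(r"Compiling (\S+) failed")


def failing_source(log_line):
    """Return the source path named in a 'Compiling ... failed' line, or None."""
    match = COMPILE_FAILURE.search(log_line)
    return match.group(1) if match else None


def is_x64_only(path):
    """Heuristic: True if the path contains a cpu/x64 directory component."""
    return "/cpu/x64/" in "/" + path


# Shortened copy of the error line from the build log (cache path elided).
log_line = (
    "ERROR: external/mkl_dnn_v1/BUILD.bazel:146:11: "
    "Compiling src/cpu/x64/rnn/brgemm_cell_common_bwd.cpp failed: (Exit 1): gcc failed"
)

path = failing_source(log_line)
if path is not None and is_x64_only(path):
    print(path, "looks x86-64 specific; check why it is built on this host")
```

This only diagnoses the log line; it does not change which sources the build selects.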

Standalone code to reproduce the issue

git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow
git checkout tags/v2.13.0
bazel build //tensorflow/tools/pip_package:build_pip_package --local_ram_resources=1024 --jobs=6

Relevant log output

No response

6eanut commented 4 months ago

I am using Bazel 5.3.0. Environment:

$ uname -a
Linux openeuler-riscv-4-2 6.6.0 #1 SMP Tue Jul  2 11:21:06 CST 2024 riscv64 riscv64 riscv64 GNU/Linux
$ cat /etc/os-release
NAME="openEuler"
VERSION="24.03 (LTS)"
ID="openEuler"
VERSION_ID="24.03"
PRETTY_NAME="openEuler 24.03 (LTS)"
ANSI_COLOR="0;31"

Venkat6871 commented 4 months ago

Hi @6eanut,

Thank you!

6eanut commented 4 months ago

@Venkat6871 Thanks for helping! Compiling TensorFlow 2.17.0 requires Bazel 6.5.0, so I chose to compile 2.13.0, which only requires Bazel 5.3.0. Bazel does not currently support RISC-V, and the latest version I have is 5.3.0. I will try to compile Bazel 6.5.0 for RISC-V.
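
The version pairing mentioned in this comment can be checked before starting a long build. The sketch below is only an illustration: the `REQUIRED_BAZEL` table holds just the two pairs stated in this thread, and the helper names are hypothetical. It extracts the first `X.Y.Z` triple from a version string, so it should also cope with locally built Bazel binaries whose version string carries extra text.

```python
import re

# TensorFlow release -> Bazel release, as stated in this thread only.
REQUIRED_BAZEL = {"2.13.0": "5.3.0", "2.17.0": "6.5.0"}

VERSION = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def parse_version(text):
    """Extract the first X.Y.Z triple from text such as 'bazel 6.5.0'."""
    match = VERSION.search(text)
    if match is None:
        raise ValueError("no X.Y.Z version found in {!r}".format(text))
    return tuple(int(group) for group in match.groups())


def bazel_matches(tf_version, installed_bazel):
    """True if the installed Bazel equals the release listed for tf_version."""
    required = REQUIRED_BAZEL[tf_version]
    return parse_version(installed_bazel) == parse_version(required)


print(bazel_matches("2.13.0", "bazel 5.3.0"))
print(bazel_matches("2.17.0", "bazel 5.3.0"))
```

In practice the installed string could come from `bazel --version`, and the required one from the `.bazelversion` file in the checkout, assuming the tag you are building ships one.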

6eanut commented 3 months ago

@Venkat6871 I recently built Bazel 6.5.0 on RISC-V:

$ bazel version
Build label: 6.5.0
Build target: bazel-out/riscv64-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Tue Jul 30 08:20:13 2024 (1722327613)
Build timestamp: 1722327613
Build timestamp as int: 1722327613

I cloned TensorFlow from GitHub, checked out the v2.17.0 tag, and hit the following error:

$ bazel build //tensorflow/tools/pip_package:wheel --repo_env=WHEEL_NAME=tensorflow
INFO: Reading 'startup' options from /home/tf2170/tensorflow/.bazelrc: --windows_enable_symlinks
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=1 --terminal_columns=121
INFO: Reading rc options for 'build' from /home/tf2170/tensorflow/.bazelrc:
  Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'build' from /home/tf2170/tensorflow/.bazelrc:
  'build' options: --define framework_shared_object=true --define tsl_protobuf_header_only=true --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --features=-force_no_whole_archive --enable_platform_specific_config --define=with_xla_support=true --config=short_logs --config=v2 --define=no_aws_support=true --define=no_hdfs_support=true --experimental_cc_shared_library --experimental_link_static_libraries_once=false --incompatible_enforce_config_setting_visibility
INFO: Reading rc options for 'build' from /home/tf2170/tensorflow/.tf_configure.bazelrc:
  'build' options: --action_env PYTHON_BIN_PATH=/home/tf2170/venv00/bin/python3 --action_env PYTHON_LIB_PATH=/home/tf2170/venv00/lib/python3.11/site-packages --python_path=/home/tf2170/venv00/bin/python3
INFO: Found applicable config definition build:short_logs in file /home/tf2170/tensorflow/.bazelrc: --output_filter=DONT_MATCH_ANYTHING
INFO: Found applicable config definition build:v2 in file /home/tf2170/tensorflow/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
INFO: Found applicable config definition build:linux in file /home/tf2170/tensorflow/.bazelrc: --host_copt=-w --copt=-Wno-all --copt=-Wno-extra --copt=-Wno-deprecated --copt=-Wno-deprecated-declarations --copt=-Wno-ignored-attributes --copt=-Wno-array-bounds --copt=-Wunused-result --copt=-Werror=unused-result --copt=-Wswitch --copt=-Werror=switch --copt=-Wno-error=unused-but-set-variable --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --define=PROTOBUF_INCLUDE_PATH=$(PREFIX)/include --cxxopt=-std=c++17 --host_cxxopt=-std=c++17 --config=dynamic_kernels --experimental_guard_against_concurrent_changes
INFO: Found applicable config definition build:dynamic_kernels in file /home/tf2170/tensorflow/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS
ERROR: /DEFAULT.WORKSPACE.SUFFIX:80:31: syntax error at '}': expected :
ERROR: Error computing the main repository mapping: error loading package 'external': Failed to parse default WORKSPACE file suffix
Loading:

More info:

$ ./configure
WARNING: current bazel installation is not a release version.
Please specify the location of python. [Default is /home/tf2170/venv00/bin/python3]:

Found possible Python library paths:
  /home/tf2170/venv00/lib/python3.11/site-packages
Please input the desired Python library path to use.  Default is [/home/tf2170/venv00/lib/python3.11/site-packages]

Do you wish to build TensorFlow with ROCm support? [y/N]: N
No ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: N
No CUDA support will be enabled for TensorFlow.

Do you want to use Clang to build TensorFlow? [Y/n]: N
GCC will be used to compile TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -Wno-sign-compare]:

Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: N
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
        --config=mkl            # Build with MKL support.
        --config=mkl_aarch64    # Build with oneDNN and Compute Library for the Arm Architecture (ACL).
        --config=monolithic     # Config for mostly static monolithic build.
        --config=numa           # Build with NUMA support.
        --config=dynamic_kernels        # (Experimental) Build kernels into separate shared objects.
        --config=v1             # Build with TensorFlow 1 API instead of TF 2 API.
Preconfigured Bazel build configs to DISABLE default on features:
        --config=nogcp          # Disable GCP support.
        --config=nonccl         # Disable NVIDIA NCCL support.
Configuration finished

6eanut commented 3 months ago

In addition, I would like to know whether there are ways to build the TensorFlow wheel other than Bazel.

6eanut commented 2 months ago

@Venkat6871 I am currently building TensorFlow 2.17.0 (tags/v2.17.0) with Bazel 6.5.0. Bazel 6.5.0 is the RPM package here, which has passed the cpp and java tests in examples. I then hit this problem:

bazel build //tensorflow/tools/pip_package:build_pip_package
Starting local Bazel server and connecting to it...
... still trying to connect to local Bazel server (37701) after 10 seconds ...
... still trying to connect to local Bazel server (37701) after 20 seconds ...
... still trying to connect to local Bazel server (37701) after 30 seconds ...
... still trying to connect to local Bazel server (37701) after 40 seconds ...
INFO: Reading 'startup' options from /root/tensorflow/.bazelrc: --windows_enable_symlinks
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=1 --terminal_columns=121
INFO: Reading rc options for 'build' from /root/tensorflow/.bazelrc:
  Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'build' from /root/tensorflow/.bazelrc:
  'build' options: --define framework_shared_object=true --define tsl_protobuf_header_only=true --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --features=-force_no_whole_archive --enable_platform_specific_config --define=with_xla_support=true --config=short_logs --config=v2 --define=no_aws_support=true --define=no_hdfs_support=true --experimental_cc_shared_library --experimental_link_static_libraries_once=false --incompatible_enforce_config_setting_visibility
INFO: Reading rc options for 'build' from /root/tensorflow/.tf_configure.bazelrc:
  'build' options: --action_env PYTHON_BIN_PATH=/root/venv311/bin/python3 --action_env PYTHON_LIB_PATH=/root/venv311/lib/python3.11/site-packages --python_path=/root/venv311/bin/python3
INFO: Found applicable config definition build:short_logs in file /root/tensorflow/.bazelrc: --output_filter=DONT_MATCH_ANYTHING
INFO: Found applicable config definition build:v2 in file /root/tensorflow/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
INFO: Found applicable config definition build:linux in file /root/tensorflow/.bazelrc: --host_copt=-w --copt=-Wno-all --copt=-Wno-extra --copt=-Wno-deprecated --copt=-Wno-deprecated-declarations --copt=-Wno-ignored-attributes --copt=-Wno-array-bounds --copt=-Wunused-result --copt=-Werror=unused-result --copt=-Wswitch --copt=-Werror=switch --copt=-Wno-error=unused-but-set-variable --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --define=PROTOBUF_INCLUDE_PATH=$(PREFIX)/include --cxxopt=-std=c++17 --host_cxxopt=-std=c++17 --config=dynamic_kernels --experimental_guard_against_concurrent_changes
INFO: Found applicable config definition build:dynamic_kernels in file /root/tensorflow/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS
INFO: Repository python instantiated at:
  /root/tensorflow/WORKSPACE:47:27: in <toplevel>
  /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/external/rules_python/python/repositories.bzl:603:22: in python_register_toolchains
Repository rule toolchain_aliases defined at:
  /root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/external/rules_python/python/private/toolchains_repo.bzl:236:36: in <toplevel>
ERROR: An error occurred during the fetch of repository 'python':
   Traceback (most recent call last):
        File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/external/rules_python/python/private/toolchains_repo.bzl", line 149, column 38, in _toolchain_aliases_impl
                host_platform = get_host_platform(os_name, arch)
        File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/external/rules_python/python/private/toolchains_repo.bzl", line 325, column 13, in get_host_platform
                fail("No platform declared for host OS {} on arch {}".format(os_name, arch))
Error in fail: No platform declared for host OS linux on arch riscv64
ERROR: /root/tensorflow/WORKSPACE:47:27: fetching toolchain_aliases rule //external:python: Traceback (most recent call last):
        File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/external/rules_python/python/private/toolchains_repo.bzl", line 149, column 38, in _toolchain_aliases_impl
                host_platform = get_host_platform(os_name, arch)
        File "/root/.cache/bazel/_bazel_root/efb88f6336d9c4a18216fb94287b8d97/external/rules_python/python/private/toolchains_repo.bzl", line 325, column 13, in get_host_platform
                fail("No platform declared for host OS {} on arch {}".format(os_name, arch))
Error in fail: No platform declared for host OS linux on arch riscv64
ERROR: Error computing the main repository mapping: no such package '@python//': No platform declared for host OS linux on arch riscv64
Loading:
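
The traceback shows the fetch stopping in `get_host_platform` because no entry matches the (OS, architecture) pair `linux` / `riscv64`. The following Python sketch only mirrors that lookup pattern and is not the rules_python code: the `KNOWN_PLATFORMS` table is a hypothetical two-entry subset chosen for illustration, and only the error message format is taken from the log above.

```python
# Hypothetical subset of (os, arch) -> platform entries, for illustration only.
KNOWN_PLATFORMS = {
    ("linux", "x86_64"): "x86_64-unknown-linux-gnu",
    ("linux", "aarch64"): "aarch64-unknown-linux-gnu",
}


def get_host_platform(os_name, arch):
    """Return the platform name for a host, or fail like the log above."""
    try:
        return KNOWN_PLATFORMS[(os_name, arch)]
    except KeyError:
        raise LookupError(
            "No platform declared for host OS {} on arch {}".format(os_name, arch)
        )


try:
    get_host_platform("linux", "riscv64")
except LookupError as error:
    print("Error in fail:", error)
```

Under this reading, the failure happens while resolving the Python toolchain for the host, before any TensorFlow source is compiled.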

zinovya commented 2 weeks ago

I'm getting a similar error with TensorFlow 2.18.0 and Bazel 7.2.1 (with both gcc and clang):

$ bazel build //tensorflow/tools/pip_package:wheel --repo_env=WHEEL_NAME=tensorflow_cpu
WARNING: Output base '/home/alexzinovyev/.cache/bazel/_bazel_alexzinovyev/0da2ad89d9ab383d81720f5a9ee2d3de' is on NFS. This may lead to surprising failures and undetermined behavior.
Starting local Bazel server and connecting to it...
... still trying to connect to local Bazel server (45328) after 10 seconds ...
... still trying to connect to local Bazel server (45328) after 20 seconds ...
... still trying to connect to local Bazel server (45328) after 30 seconds ...
... still trying to connect to local Bazel server (45328) after 40 seconds ...
INFO: Reading 'startup' options from /home/alexzinovyev/dev/tensorflow/.bazelrc: --windows_enable_symlinks
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=1 --terminal_columns=186
INFO: Reading rc options for 'build' from /home/alexzinovyev/dev/tensorflow/.bazelrc:
  Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'build' from /home/alexzinovyev/dev/tensorflow/.bazelrc:
  'build' options: --define framework_shared_object=true --define tsl_protobuf_header_only=true --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --features=-force_no_whole_archive --enable_platform_specific_config --define=with_xla_support=true --config=short_logs --config=v2 --experimental_cc_shared_library --experimental_link_static_libraries_once=false --incompatible_enforce_config_setting_visibility
INFO: Reading rc options for 'build' from /home/alexzinovyev/dev/tensorflow/.tf_configure.bazelrc:
  'build' options: --action_env PYTHON_BIN_PATH=/usr/bin/python3 --action_env PYTHON_LIB_PATH=/usr/lib/python3.11/site-packages --python_path=/usr/bin/python3 --action_env CLANG_COMPILER_PATH=/usr/bin/clang-16 --repo_env=CC=/usr/bin/clang-16 --repo_env=BAZEL_COMPILER=/usr/bin/clang-16 --copt=-Wno-gnu-offsetof-extensions
INFO: Found applicable config definition build:short_logs in file /home/alexzinovyev/dev/tensorflow/.bazelrc: --output_filter=DONT_MATCH_ANYTHING
INFO: Found applicable config definition build:v2 in file /home/alexzinovyev/dev/tensorflow/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
INFO: Found applicable config definition build:linux in file /home/alexzinovyev/dev/tensorflow/.bazelrc: --host_copt=-w --copt=-Wno-all --copt=-Wno-extra --copt=-Wno-deprecated --copt=-Wno-deprecated-declarations --copt=-Wno-ignored-attributes --copt=-Wno-array-bounds --copt=-Wunused-result --copt=-Werror=unused-result --copt=-Wswitch --copt=-Werror=switch --copt=-Wno-error=unused-but-set-variable --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --define=PROTOBUF_INCLUDE_PATH=$(PREFIX)/include --cxxopt=-std=c++17 --host_cxxopt=-std=c++17 --config=dynamic_kernels --experimental_guard_against_concurrent_changes
INFO: Found applicable config definition build:dynamic_kernels in file /home/alexzinovyev/dev/tensorflow/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS
WARNING: --enable_bzlmod is set, but no MODULE.bazel file was found at the workspace root. Bazel will create an empty MODULE.bazel file. Please consider migrating your external dependencies from WORKSPACE to MODULE.bazel. For more details, please refer to https://github.com/bazelbuild/bazel/issues/18958.
DEBUG: /home/alexzinovyev/.cache/bazel/_bazel_alexzinovyev/0da2ad89d9ab383d81720f5a9ee2d3de/external/local_xla/third_party/py/python_repo.bzl:96:14: 
HERMETIC_PYTHON_VERSION variable was not set correctly, using default version.
Python 3.11 will be used.
To select Python version, either set HERMETIC_PYTHON_VERSION env variable in
your shell:
  export HERMETIC_PYTHON_VERSION=3.12
OR pass it as an argument to bazel command directly or inside your .bazelrc
file:
  --repo_env=HERMETIC_PYTHON_VERSION=3.12
DEBUG: /home/alexzinovyev/.cache/bazel/_bazel_alexzinovyev/0da2ad89d9ab383d81720f5a9ee2d3de/external/local_xla/third_party/py/python_repo.bzl:107:10: Using hermetic Python 3.11
ERROR: Failed to load Starlark extension '@@pypi//:requirements.bzl'.
Cycle in the workspace file detected. This indicates that a repository is used prior to being defined.
The following chain of repository dependencies lead to the missing definition.
 - @@pypi
 - @@python_riscv64-unknown-linux-gnu
This could either mean you have to add the '@@python_riscv64-unknown-linux-gnu' repository with a statement like `http_archive` in your WORKSPACE file (note that transitive dependencies are not added automatically), or move an existing definition earlier in your WORKSPACE file.
INFO: Repository pypi instantiated at:
  /home/alexzinovyev/dev/tensorflow/WORKSPACE:55:16: in <toplevel>
  /home/alexzinovyev/.cache/bazel/_bazel_alexzinovyev/0da2ad89d9ab383d81720f5a9ee2d3de/external/local_xla/third_party/py/python_init_pip.bzl:29:14: in python_init_pip
Repository rule pip_repository defined at:
  /home/alexzinovyev/.cache/bazel/_bazel_alexzinovyev/0da2ad89d9ab383d81720f5a9ee2d3de/external/rules_python/python/private/pypi/pip_repository.bzl:210:33: in <toplevel>
ERROR: Error computing the main repository mapping: cycles detected during computation of main repo mapping
Computing main repo mapping: 
    Fetching repository @@pypi; starting

Any suggestions on how to get the build going on RISC-V?

6eanut commented 2 weeks ago

@zinovya I guess the problem is that Bazel has not been adapted for RISC-V. Also, how did you get Bazel 7.2.1 for RISC-V?