rocm-arch / tensorflow-rocm

tensorflow-rocm AUR package

clang_version.split('.') #66

Closed Estirp closed 4 days ago

Estirp commented 5 months ago
Do you want to use Clang to build TensorFlow? [Y/n]:
Clang will be used to compile TensorFlow.

Please specify the path to clang executable. [Default is /usr/lib/ccache/bin/clang]:

WARNING: current clang installation is not a release version.

Traceback (most recent call last):
  File "/home/construction_de_packet/.cache/yay/tensorflow-rocm/src/tensorflow-2.15.0-rocm/./configure.py", line 1465, in <module>
    main()
  File "/home/construction_de_packet/.cache/yay/tensorflow-rocm/src/tensorflow-2.15.0-rocm/./configure.py", line 1416, in main
    disable_clang16_offsetof_extension(clang_version)
  File "/home/construction_de_packet/.cache/yay/tensorflow-rocm/src/tensorflow-2.15.0-rocm/./configure.py", line 885, in disable_clang16_offsetof_extension
    if int(clang_version.split('.')[0]) == 16:
           ^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'split'
/usr/lib/ccache/bin/clang --version
clang version 16.0.6
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

So I think this program wrongly expects to receive only the first line.
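
A note on the traceback: `'NoneType' object has no attribute 'split'` means `clang_version` was already `None` when `.split('.')` ran, so the version lookup returned nothing at all rather than a multi-line string. Below is a minimal sketch of a parser that accepts the full multi-line output and returns `None` instead of raising. The function names are hypothetical and this is not the code in configure.py.

```python
import re


def parse_clang_major(version_output):
    """Return the major version found in `clang --version` output, or None."""
    if not version_output:
        return None
    # Search the whole text, so extra lines (Target, Thread model, ...) are harmless.
    match = re.search(r"clang version (\d+)\.(\d+)\.(\d+)", version_output)
    return int(match.group(1)) if match else None


def needs_offsetof_workaround(version_output):
    # Hypothetical stand-in for the check made in disable_clang16_offsetof_extension.
    return parse_clang_major(version_output) == 16


SAMPLE = """clang version 16.0.6
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin"""

if __name__ == "__main__":
    print(parse_clang_major(SAMPLE))
    print(parse_clang_major(None))
```

With this shape, a missing or unrecognized version string produces `None` from the parser and `False` from the comparison, instead of an AttributeError.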

mpeschel10 commented 5 months ago

On my system, the clang version looks like this:

clang version 16.0.6
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

So the version string parsing is not the problem.

I think ccache might be the problem. I tried setting export CLANG_COMPILER_PATH=/usr/lib/ccache/bin/clang in the PKGBUILD, and added the following code to retrieve_clang_version(clang_executable) in configure.py:

  print('Retrieving clang version from executable', clang_executable)
  curr_version = run_shell([clang_executable, '--version'],
                           allow_non_zero=True,
                           stderr=stderr)
  print('curr_version', curr_version)

And I got this when I compiled:

...
Retrieving clang version from executable /usr/bin/ccache
curr_version ccache version 4.9
Features: file-storage http-storage redis+unix-storage redis-storage

Copyright (C) 2002-2007 Andrew Tridgell
Copyright (C) 2009-2023 Joel Rosdahl and other contributors

See <https://ccache.dev/credits.html> for a complete list of contributors.

This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; either version 3 of the License, or (at your option) any later
version.
...

That would certainly break the parsing. Did you edit the CLANG_COMPILER_PATH to point to ccache?
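
The debug output above shows `/usr/bin/ccache` being queried even though the path given was `/usr/lib/ccache/bin/clang`, which is consistent with the path being resolved through a symlink before `--version` is run. Whether configure.py does exactly that is an assumption here. A small sketch of how such a wrapper could be detected follows; `resolves_to_ccache` and `demo` are hypothetical helpers, and the demo builds a throwaway directory layout rather than touching the real system.

```python
import os
import tempfile


def resolves_to_ccache(compiler_path):
    """True if compiler_path, after following symlinks, is a file named ccache."""
    real = os.path.realpath(compiler_path)
    return os.path.basename(real) == "ccache"


def demo():
    # Mimic a ccache wrapper layout: wrappers/clang is a symlink to ccache.
    with tempfile.TemporaryDirectory() as root:
        fake_ccache = os.path.join(root, "ccache")
        open(fake_ccache, "w").close()
        real_clang = os.path.join(root, "clang")
        open(real_clang, "w").close()
        wrapper_dir = os.path.join(root, "wrappers")
        os.mkdir(wrapper_dir)
        wrapper = os.path.join(wrapper_dir, "clang")
        os.symlink(fake_ccache, wrapper)
        return resolves_to_ccache(wrapper), resolves_to_ccache(real_clang)


if __name__ == "__main__":
    print(demo())
```

On a real system, calling `resolves_to_ccache("/usr/lib/ccache/bin/clang")` would indicate whether the configured compiler path is such a wrapper.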

Might also be something to do with yay. I don't use that. Try compiling again with

sudo pacman -Syu
sudo pacman -R ccache
git clone https://aur.archlinux.org/tensorflow-rocm.git
cd tensorflow-rocm
makepkg
sudo pacman -S ccache

And see if the problem persists.

Estirp commented 5 months ago
tensorflow-rocm]$ makepkg
==> ERROR: Cannot find the ccache binary required for compiler cache usage.

So I reinstalled ccache, then:

makepkg
==> Making package: tensorflow-rocm 2.15.0-8 (Thu Feb  8 06:44:46 2024)
==> Checking runtime dependencies...
==> Checking buildtime dependencies...
==> Retrieving sources...
  -> Found tensorflow-rocm-2.15.0.tar.gz
  -> Found bazel_nojdk-6.1.0-linux-x86_64
  -> Found fix-c++17-compat.patch
==> Validating source files with sha512sums...
    tensorflow-rocm-2.15.0.tar.gz ... Passed
    bazel_nojdk-6.1.0-linux-x86_64 ... Passed
    fix-c++17-compat.patch ... Passed
==> Extracting sources...
  -> Extracting tensorflow-rocm-2.15.0.tar.gz with bsdtar
  -> Extracting bazel_nojdk-6.1.0-linux-x86_64 with bsdtar
==> Starting prepare()...
bazel 6.1.0
==> Removing existing $pkgdir/ directory...
==> Starting build()...
Building with rocm and without non-x86-64 optimizations
You have bazel 6.1.0 installed.
You have Clang 16.0.6 installed.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
    --config=mkl            # Build with MKL support.
    --config=mkl_aarch64    # Build with oneDNN and Compute Library for the Arm Architecture (ACL).
    --config=monolithic     # Config for mostly static monolithic build.
    --config=numa           # Build with NUMA support.
    --config=dynamic_kernels    # (Experimental) Build kernels into separate shared objects.
    --config=v1             # Build with TensorFlow 1 API instead of TF 2 API.
Preconfigured Bazel build configs to DISABLE default on features:
    --config=nogcp          # Disable GCP support.
    --config=nonccl         # Disable NVIDIA NCCL support.
Configuration finished
Killed non-responsive server process (pid=310892)
Starting local Bazel server and connecting to it...
... still trying to connect to local Bazel server (313856) after 10 seconds ...
... still trying to connect to local Bazel server (313856) after 20 seconds ...
... still trying to connect to local Bazel server (313856) after 30 seconds ...
... still trying to connect to local Bazel server (313856) after 40 seconds ...
... still trying to connect to local Bazel server (313856) after 50 seconds ...
... still trying to connect to local Bazel server (313856) after 60 seconds ...
... still trying to connect to local Bazel server (313856) after 71 seconds ...
... still trying to connect to local Bazel server (313856) after 81 seconds ...
... still trying to connect to local Bazel server (313856) after 91 seconds ...
... still trying to connect to local Bazel server (313856) after 101 seconds ...
... still trying to connect to local Bazel server (313856) after 111 seconds ...
FATAL: couldn't connect to server (313856) after 120 seconds.
==> ERROR: A failure occurred in build().
    Aborting...

So I checked how Bazel was launched:

+ bazel build --config=mkl -c opt //tensorflow:libtensorflow.so //tensorflow:libtensorflow_cc.so //tensorflow:install_headers //tensorflow/tools/pip_package:build_pip_package
Killed non-responsive server process (pid=325793)
Starting local Bazel server and connecting to it...

I have never used Bazel.
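
The "still trying to connect to local Bazel server" lines are the Bazel client failing to reach its own server process on the same machine. One possible cause is a firewall dropping loopback traffic, and the next comment reports that accepting traffic to `::1` got past this stage. A rough way to check loopback reachability independently of Bazel is sketched below; `loopback_reachable` is a hypothetical helper, and it only tests plain TCP, not Bazel's own protocol.

```python
import socket


def loopback_reachable(family, timeout=2.0):
    """Open a listener on the loopback address and try to connect to it."""
    address = "::1" if family == socket.AF_INET6 else "127.0.0.1"
    try:
        with socket.socket(family, socket.SOCK_STREAM) as server:
            server.bind((address, 0))  # port 0: let the OS pick a free port
            server.listen(1)
            port = server.getsockname()[1]
            with socket.socket(family, socket.SOCK_STREAM) as client:
                client.settimeout(timeout)
                client.connect((address, port))
        return True
    except OSError:
        # Covers timeouts, refused connections, and an unsupported address family.
        return False


if __name__ == "__main__":
    print("IPv4 loopback:", loopback_reachable(socket.AF_INET))
    print("IPv6 loopback:", loopback_reachable(socket.AF_INET6))
```

If the IPv6 check fails while IPv4 succeeds, a filtering rule on `::1` is a plausible suspect.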

Estirp commented 5 months ago

After running nft insert rule inet general entree 'ip6 daddr ::1 accept ... the Bazel server started, but the build then failed during ROCm configuration:

Starting local Bazel server and connecting to it...
INFO: Reading 'startup' options from /home/construction_de_packet/tensorflow-rocm/src/tensorflow-2.15.0-rocm/.bazelrc: --windows_enable_symlinks
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=1 --terminal_columns=103
INFO: Reading rc options for 'build' from /home/construction_de_packet/tensorflow-rocm/src/tensorflow-2.15.0-rocm/.bazelrc:
  Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'build' from /home/construction_de_packet/tensorflow-rocm/src/tensorflow-2.15.0-rocm/.bazelrc:
  'build' options: --define framework_shared_object=true --define tsl_protobuf_header_only=true --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --features=-force_no_whole_archive --enable_platform_specific_config --define=with_xla_support=true --config=short_logs --config=v2 --define=no_aws_support=true --define=no_hdfs_support=true --experimental_cc_shared_library --experimental_link_static_libraries_once=false --incompatible_enforce_config_setting_visibility
INFO: Reading rc options for 'build' from /home/construction_de_packet/tensorflow-rocm/src/tensorflow-2.15.0-rocm/.tf_configure.bazelrc:
  'build' options: --action_env PYTHON_BIN_PATH=/usr/bin/python3.11 --action_env PYTHON_LIB_PATH=/usr/lib/python3.11/site-packages --python_path=/usr/bin/python3.11 --define=with_xla_support=true --config=rocm --action_env CLANG_COMPILER_PATH=/usr/bin/clang-16 --repo_env=CC=/usr/bin/clang-16 --repo_env=BAZEL_COMPILER=/usr/bin/clang-16 --copt=-Wno-gnu-offsetof-extensions --action_env TF_SYSTEM_LIBS=boringssl,curl,cython,gif,icu,libjpeg_turbo,nasm,png,zlib
INFO: Found applicable config definition build:short_logs in file /home/construction_de_packet/tensorflow-rocm/src/tensorflow-2.15.0-rocm/.bazelrc: --output_filter=DONT_MATCH_ANYTHING
INFO: Found applicable config definition build:v2 in file /home/construction_de_packet/tensorflow-rocm/src/tensorflow-2.15.0-rocm/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
INFO: Found applicable config definition build:rocm in file /home/construction_de_packet/tensorflow-rocm/src/tensorflow-2.15.0-rocm/.bazelrc: --crosstool_top=@local_config_rocm//crosstool:toolchain --define=using_rocm_hipcc=true --define=tensorflow_mkldnn_contraction_kernel=0 --repo_env TF_NEED_ROCM=1 --config=no_tfrt
INFO: Found applicable config definition build:no_tfrt in file /home/construction_de_packet/tensorflow-rocm/src/tensorflow-2.15.0-rocm/.bazelrc: --deleted_packages=tensorflow/compiler/mlir/tfrt,tensorflow/compiler/mlir/tfrt/benchmarks,tensorflow/compiler/mlir/tfrt/ir,tensorflow/compiler/mlir/tfrt/ir/mlrt,tensorflow/compiler/mlir/tfrt/jit/python_binding,tensorflow/compiler/mlir/tfrt/jit/transforms,tensorflow/compiler/mlir/tfrt/python_tests,tensorflow/compiler/mlir/tfrt/tests,tensorflow/compiler/mlir/tfrt/tests/mlrt,tensorflow/compiler/mlir/tfrt/tests/ir,tensorflow/compiler/mlir/tfrt/tests/analysis,tensorflow/compiler/mlir/tfrt/tests/jit,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_tfrt,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_jitrt,tensorflow/compiler/mlir/tfrt/tests/tf_to_corert,tensorflow/compiler/mlir/tfrt/tests/tf_to_tfrt_data,tensorflow/compiler/mlir/tfrt/tests/saved_model,tensorflow/compiler/mlir/tfrt/transforms/lhlo_gpu_to_tfrt_gpu,tensorflow/compiler/mlir/tfrt/transforms/mlrt,tensorflow/core/runtime_fallback,tensorflow/core/runtime_fallback/conversion,tensorflow/core/runtime_fallback/kernel,tensorflow/core/runtime_fallback/opdefs,tensorflow/core/runtime_fallback/runtime,tensorflow/core/runtime_fallback/util,tensorflow/core/runtime_fallback/test,tensorflow/core/runtime_fallback/test/gpu,tensorflow/core/runtime_fallback/test/saved_model,tensorflow/core/runtime_fallback/test/testdata,tensorflow/core/tfrt/stubs,tensorflow/core/tfrt/tfrt_session,tensorflow/core/tfrt/mlrt,tensorflow/core/tfrt/mlrt/attribute,tensorflow/core/tfrt/mlrt/kernel,tensorflow/core/tfrt/mlrt/bytecode,tensorflow/core/tfrt/mlrt/interpreter,tensorflow/compiler/mlir/tfrt/translate/mlrt,tensorflow/compiler/mlir/tfrt/translate/mlrt/testdata,tensorflow/core/tfrt/gpu,tensorflow/core/tfrt/run_handler_thread_pool,tensorflow/core/tfrt/runtime,tensorflow/core/tfrt/saved_model,tensorflow/core/tfrt/graph_executor,tensorflow/core/tfrt/saved_model/tests,tensorflow/core/tfrt/tpu,tensorflow/core/tfrt/utils,
tensorflow/core/tfrt/utils/debug,tensorflow/core/tfrt/saved_model/python,tensorflow/core/tfrt/graph_executor/python,tensorflow/core/tfrt/saved_model/utils
INFO: Found applicable config definition build:mkl in file /home/construction_de_packet/tensorflow-rocm/src/tensorflow-2.15.0-rocm/.bazelrc: --define=build_with_mkl=true --define=enable_mkl=true --define=tensorflow_mkldnn_contraction_kernel=0 --define=build_with_openmp=true -c opt
INFO: Found applicable config definition build:linux in file /home/construction_de_packet/tensorflow-rocm/src/tensorflow-2.15.0-rocm/.bazelrc: --host_copt=-w --copt=-Wno-all --copt=-Wno-extra --copt=-Wno-deprecated --copt=-Wno-deprecated-declarations --copt=-Wno-ignored-attributes --copt=-Wno-array-bounds --copt=-Wunused-result --copt=-Werror=unused-result --copt=-Wswitch --copt=-Werror=switch --copt=-Wno-error=unused-but-set-variable --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --define=PROTOBUF_INCLUDE_PATH=$(PREFIX)/include --cxxopt=-std=c++17 --host_cxxopt=-std=c++17 --config=dynamic_kernels --experimental_guard_against_concurrent_changes
INFO: Found applicable config definition build:dynamic_kernels in file /home/construction_de_packet/tensorflow-rocm/src/tensorflow-2.15.0-rocm/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS
INFO: Repository local_config_rocm instantiated at:
  /home/construction_de_packet/tensorflow-rocm/src/tensorflow-2.15.0-rocm/WORKSPACE:84:14: in <toplevel>
  /home/construction_de_packet/tensorflow-rocm/src/tensorflow-2.15.0-rocm/tensorflow/workspace2.bzl:918:19: in workspace
  /home/construction_de_packet/tensorflow-rocm/src/tensorflow-2.15.0-rocm/tensorflow/workspace2.bzl:112:19: in _tf_toolchains
Repository rule rocm_configure defined at:
  /home/construction_de_packet/tensorflow-rocm/src/tensorflow-2.15.0-rocm/third_party/gpus/rocm_configure.bzl:833:33: in <toplevel>
ERROR: An error occurred during the fetch of repository 'local_config_rocm':
   Traceback (most recent call last):
    File "/home/construction_de_packet/tensorflow-rocm/src/tensorflow-2.15.0-rocm/third_party/gpus/rocm_configure.bzl", line 811, column 38, in _rocm_autoconf_impl
        _create_local_rocm_repository(repository_ctx)
    File "/home/construction_de_packet/tensorflow-rocm/src/tensorflow-2.15.0-rocm/third_party/gpus/rocm_configure.bzl", line 600, column 27, in _create_local_rocm_repository
        rocm_libs = _find_libs(repository_ctx, rocm_config, hipfft_or_rocfft, miopen_path, rccl_path, bash_bin)
    File "/home/construction_de_packet/tensorflow-rocm/src/tensorflow-2.15.0-rocm/third_party/gpus/rocm_configure.bzl", line 366, column 34, in _find_libs
        return _select_rocm_lib_paths(repository_ctx, libs_paths, bash_bin)
    File "/home/construction_de_packet/tensorflow-rocm/src/tensorflow-2.15.0-rocm/third_party/gpus/rocm_configure.bzl", line 328, column 36, in _select_rocm_lib_paths
        auto_configure_fail("Cannot find rocm library %s" % name)
    File "/home/construction_de_packet/tensorflow-rocm/src/tensorflow-2.15.0-rocm/third_party/gpus/rocm_configure.bzl", line 153, column 9, in auto_configure_fail
        fail("\n%sROCm Configuration Error:%s %s\n" % (red, no_color, msg))
Error in fail:
ROCm Configuration Error: Cannot find rocm library amdhip64
ERROR: /home/construction_de_packet/tensorflow-rocm/src/tensorflow-2.15.0-rocm/WORKSPACE:84:14: fetching rocm_configure rule //external:local_config_rocm: Traceback (most recent call last):
    File "/home/construction_de_packet/tensorflow-rocm/src/tensorflow-2.15.0-rocm/third_party/gpus/rocm_configure.bzl", line 811, column 38, in _rocm_autoconf_impl
        _create_local_rocm_repository(repository_ctx)
    File "/home/construction_de_packet/tensorflow-rocm/src/tensorflow-2.15.0-rocm/third_party/gpus/rocm_configure.bzl", line 600, column 27, in _create_local_rocm_repository
        rocm_libs = _find_libs(repository_ctx, rocm_config, hipfft_or_rocfft, miopen_path, rccl_path, bash_bin)
    File "/home/construction_de_packet/tensorflow-rocm/src/tensorflow-2.15.0-rocm/third_party/gpus/rocm_configure.bzl", line 366, column 34, in _find_libs
        return _select_rocm_lib_paths(repository_ctx, libs_paths, bash_bin)
    File "/home/construction_de_packet/tensorflow-rocm/src/tensorflow-2.15.0-rocm/third_party/gpus/rocm_configure.bzl", line 328, column 36, in _select_rocm_lib_paths
        auto_configure_fail("Cannot find rocm library %s" % name)
    File "/home/construction_de_packet/tensorflow-rocm/src/tensorflow-2.15.0-rocm/third_party/gpus/rocm_configure.bzl", line 153, column 9, in auto_configure_fail
        fail("\n%sROCm Configuration Error:%s %s\n" % (red, no_color, msg))
Error in fail:
ROCm Configuration Error: Cannot find rocm library amdhip64
INFO: Repository rules_cc instantiated at:
  /home/construction_de_packet/tensorflow-rocm/src/tensorflow-2.15.0-rocm/WORKSPACE:88:14: in <toplevel>
  /home/construction_de_packet/tensorflow-rocm/src/tensorflow-2.15.0-rocm/tensorflow/workspace1.bzl:19:28: in workspace
  /home/construction_de_packet/.cache/bazel/_bazel_construction_de_packet/c74861d2a9d3623f76f1d2fc9c13f541/external/rules_cuda/cuda/dependencies.bzl:72:18: in rules_cuda_dependencies
  /home/construction_de_packet/.cache/bazel/_bazel_construction_de_packet/c74861d2a9d3623f76f1d2fc9c13f541/external/rules_cuda/cuda/dependencies.bzl:35:17: in _rules_cc
Repository rule http_archive defined at:
  /home/construction_de_packet/.cache/bazel/_bazel_construction_de_packet/c74861d2a9d3623f76f1d2fc9c13f541/external/bazel_tools/tools/build_defs/repo/http.bzl:372:31: in <toplevel>
ERROR: Skipping '//tensorflow:libtensorflow_cc.so': no such package '@local_config_rocm//rocm':
ROCm Configuration Error: Cannot find rocm library amdhip64
ERROR: no such package '@local_config_rocm//rocm':
ROCm Configuration Error: Cannot find rocm library amdhip64
INFO: Elapsed time: 8.698s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
    currently loading: tensorflow/tools/pip_package ... (2 packages)
==> ERROR: A failure occurred in build().
    Aborting...

However, I do have /opt/rocm/lib/libamdhip64.so, /opt/rocm/lib/libamdhip64.so.6 and /opt/rocm/lib/libamdhip64.so.6.0.32830 on my file system.
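
The log does not show which directories `rocm_configure.bzl` searched for `amdhip64`, so the library may exist in one location while the rule looks in another. A minimal diagnostic sketch that lists where `lib<name>.so*` files actually sit under a ROCm root follows. The subdirectory list is an assumption for illustration, not the set the build rule uses, and `find_rocm_lib` is a hypothetical helper.

```python
import glob
import os

# Assumed candidate subdirectories; the real build rule may search a different set.
CANDIDATE_SUBDIRS = ("lib", "lib64", "hip/lib", "hip/lib64")


def find_rocm_lib(rocm_root, name):
    """Map each candidate subdirectory to the lib<name>.so* files found in it."""
    found = {}
    for subdir in CANDIDATE_SUBDIRS:
        pattern = os.path.join(rocm_root, subdir, "lib%s.so*" % name)
        found[subdir] = sorted(os.path.basename(p) for p in glob.glob(pattern))
    return found


if __name__ == "__main__":
    for subdir, files in find_rocm_lib("/opt/rocm", "amdhip64").items():
        print(subdir, files or "(none)")
```

Comparing this listing with the paths referenced in `rocm_configure.bzl` would show whether the failure is a layout mismatch rather than a missing library.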

mpeschel10 commented 5 months ago

My apologies. This problem is beyond me to solve. Good luck.

Estirp commented 5 months ago

You helped me get further. Thank you.

acxz commented 4 days ago

@Estirp the original issue mentioned in your first comment should be resolved with fa64355. I'm closing this issue. If you encounter the same issue, please comment here and I'll reopen it. If you encounter a different issue, check whether it has already been reported; if not, feel free to create a new issue.