openxla / xla

A machine learning compiler for GPUs, CPUs, and ML accelerators
Apache License 2.0

Compilation fails on Mac M1 (Sonoma 14.7) : "error: no matching function for call to 'min'" #17820

Open domkirke opened 2 weeks ago

domkirke commented 2 weeks ago

Hello everyone !

I'm hitting a strange error while compiling OpenXLA for CPU on a Mac M1 (Sonoma 14.7). After running python configure.py --config=CPU, I run the following command:
./bazel-6.5.0-darwin-arm64 build --test_output=all --spawn_strategy=sandboxed //xla/...

The compilation fails with the trace below. My xla_configure.bazelrc is:

build --action_env CLANG_COMPILER_PATH=/opt/homebrew/Cellar/llvm/18.1.8/bin/clang-18
build --repo_env CC=/opt/homebrew/Cellar/llvm/18.1.8/bin/clang-18
build --repo_env BAZEL_COMPILER=/opt/homebrew/Cellar/llvm/18.1.8/bin/clang-18
build --action_env PYTHON_BIN_PATH=/Users/domkirke/miniconda3/envs/jax/bin/python
build --python_path /Users/domkirke/miniconda3/envs/jax/bin/python
test --test_env LD_LIBRARY_PATH
test --test_size_filters small,medium
build --copt -Wno-sign-compare
build --copt -Wno-error=unused-command-line-argument
build --copt -Wno-gnu-offsetof-extensions
build --build_tag_filters -no_oss,-gpu
build --test_tag_filters -no_oss,-gpu
test --build_tag_filters -no_oss,-gpu
test --test_tag_filters -no_oss,-gpu

I removed the linker_env option because the llvm one failed, but otherwise I did not touch anything. I first suspected a wrong C++ standard, but overriding CXX_STANDARD does not change anything. I also see the -gpu flag in the bazelrc file, but the build still fails when I remove it. Is this a genuine issue or a misconfiguration on my side? I'm running out of ideas on this one...

Thank you very much!

Failing log:

INFO: Reading 'startup' options from /Users/domkirke/Dropbox/code/jax-test/cpp/xla/.bazelrc: --windows_enable_symlinks
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=1 --terminal_columns=139
INFO: Reading rc options for 'build' from /Users/domkirke/Dropbox/code/jax-test/cpp/xla/.bazelrc:
  Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'build' from /Users/domkirke/Dropbox/code/jax-test/cpp/xla/.bazelrc:
  'build' options: --define framework_shared_object=true --define tsl_protobuf_header_only=true --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --features=-force_no_whole_archive --enable_platform_specific_config --define=with_xla_support=true --config=short_logs --config=v2 --experimental_cc_shared_library --experimental_link_static_libraries_once=false --incompatible_enforce_config_setting_visibility
INFO: Reading rc options for 'build' from /Users/domkirke/Dropbox/code/jax-test/cpp/xla/xla_configure.bazelrc:
  'build' options: --action_env CLANG_COMPILER_PATH=/opt/homebrew/Cellar/llvm/18.1.8/bin/clang-18 --repo_env CC=/opt/homebrew/Cellar/llvm/18.1.8/bin/clang-18 --repo_env BAZEL_COMPILER=/opt/homebrew/Cellar/llvm/18.1.8/bin/clang-18 --action_env PYTHON_BIN_PATH=/Users/domkirke/miniconda3/envs/jax/bin/python --python_path /Users/domkirke/miniconda3/envs/jax/bin/python --copt -Wno-sign-compare --copt -Wno-error=unused-command-line-argument --copt -Wno-gnu-offsetof-extensions --build_tag_filters -no_oss,-gpu --test_tag_filters -no_oss,-gpu
INFO: Found applicable config definition build:short_logs in file /Users/domkirke/Dropbox/code/jax-test/cpp/xla/.bazelrc: --output_filter=DONT_MATCH_ANYTHING
INFO: Found applicable config definition build:v2 in file /Users/domkirke/Dropbox/code/jax-test/cpp/xla/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
INFO: Found applicable config definition build:macos in file /Users/domkirke/Dropbox/code/jax-test/cpp/xla/.bazelrc: --apple_platform_type=macos --copt=-DGRPC_BAZEL_BUILD --features=archive_param_file --copt=-w --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --define=PROTOBUF_INCLUDE_PATH=$(PREFIX)/include --cxxopt=-std=c++17 --host_cxxopt=-std=c++17 --config=no_tfrt
INFO: Found applicable config definition build:no_tfrt in file /Users/domkirke/Dropbox/code/jax-test/cpp/xla/.bazelrc: --deleted_packages=tensorflow/compiler/mlir/tfrt,tensorflow/compiler/mlir/tfrt/benchmarks,tensorflow/compiler/mlir/tfrt/ir,tensorflow/compiler/mlir/tfrt/ir/mlrt,tensorflow/compiler/mlir/tfrt/jit/python_binding,tensorflow/compiler/mlir/tfrt/jit/transforms,tensorflow/compiler/mlir/tfrt/python_tests,tensorflow/compiler/mlir/tfrt/tests,tensorflow/compiler/mlir/tfrt/tests/ifrt,tensorflow/compiler/mlir/tfrt/tests/mlrt,tensorflow/compiler/mlir/tfrt/tests/ir,tensorflow/compiler/mlir/tfrt/tests/analysis,tensorflow/compiler/mlir/tfrt/tests/jit,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_tfrt,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_jitrt,tensorflow/compiler/mlir/tfrt/tests/tf_to_corert,tensorflow/compiler/mlir/tfrt/tests/tf_to_tfrt_data,tensorflow/compiler/mlir/tfrt/tests/saved_model,tensorflow/compiler/mlir/tfrt/transforms/lhlo_gpu_to_tfrt_gpu,tensorflow/compiler/mlir/tfrt/transforms/mlrt,tensorflow/core/runtime_fallback,tensorflow/core/runtime_fallback/conversion,tensorflow/core/runtime_fallback/kernel,tensorflow/core/runtime_fallback/opdefs,tensorflow/core/runtime_fallback/runtime,tensorflow/core/runtime_fallback/util,tensorflow/core/runtime_fallback/test,tensorflow/core/runtime_fallback/test/gpu,tensorflow/core/runtime_fallback/test/saved_model,tensorflow/core/runtime_fallback/test/testdata,tensorflow/core/tfrt/stubs,tensorflow/core/tfrt/tfrt_session,tensorflow/core/tfrt/mlrt,tensorflow/core/tfrt/mlrt/attribute,tensorflow/core/tfrt/mlrt/kernel,tensorflow/core/tfrt/mlrt/bytecode,tensorflow/core/tfrt/mlrt/interpreter,tensorflow/compiler/mlir/tfrt/translate/mlrt,tensorflow/compiler/mlir/tfrt/translate/mlrt/testdata,tensorflow/core/tfrt/gpu,tensorflow/core/tfrt/run_handler_thread_pool,tensorflow/core/tfrt/runtime,tensorflow/core/tfrt/saved_model,tensorflow/core/tfrt/graph_executor,tensorflow/core/tfrt/saved_model/tests,tensorflow/core/tfrt/tpu,tensorflow/core/tfrt/utils,tensorflow/core/tfrt/utils/debug,tensorflow/core/tfrt/saved_model/python,tensorflow/core/tfrt/graph_executor/python,tensorflow/core/tfrt/saved_model/utils
DEBUG: /Users/domkirke/Dropbox/code/jax-test/cpp/xla/third_party/py/python_repo.bzl:96:14: 
HERMETIC_PYTHON_VERSION variable was not set correctly, using default version.
Python 3.11 will be used.
To select Python version, either set HERMETIC_PYTHON_VERSION env variable in
your shell:
  export HERMETIC_PYTHON_VERSION=3.12
OR pass it as an argument to bazel command directly or inside your .bazelrc
file:
  --repo_env=HERMETIC_PYTHON_VERSION=3.12
DEBUG: /Users/domkirke/Dropbox/code/jax-test/cpp/xla/third_party/py/python_repo.bzl:107:10: Using hermetic Python 3.11
DEBUG: /Users/domkirke/Dropbox/code/jax-test/cpp/xla/third_party/repo.bzl:132:14: 
Warning: skipping import of repository 'llvm-raw' because it already exists.
DEBUG: /private/var/tmp/_bazel_domkirke/c0f7197f7a132c0d3bbe16d91268fe1d/external/tsl/third_party/repo.bzl:132:14: 
Warning: skipping import of repository 'nvtx_archive' because it already exists.
DEBUG: /Users/domkirke/Dropbox/code/jax-test/cpp/xla/third_party/repo.bzl:132:14: 
Warning: skipping import of repository 'jsoncpp_git' because it already exists.
WARNING: /Users/domkirke/Dropbox/code/jax-test/cpp/xla/xla/BUILD:272:11: target '//xla:status' is deprecated: Use @com_google_absl//absl/status instead.
INFO: Analyzed 4296 targets (346 packages loaded, 32847 targets configured).
INFO: Found 4296 targets...
WARNING: CopyFile uses implicit fallback from sandbox to local, which is deprecated because it is not hermetic. Prefer setting an explicit list of strategies, e.g., --strategy=CopyFile=sandbox,standalone
ERROR: /Users/domkirke/Dropbox/code/jax-test/cpp/xla/xla/service/gpu/BUILD:1193:11: Compiling xla/service/gpu/gpu_transfer_manager.cc failed: (Exit 1): wrapped_clang_pp failed: error executing command (from target //xla/service/gpu:gpu_transfer_manager) external/local_config_cc/wrapped_clang_pp '-D_FORTIFY_SOURCE=1' -fstack-protector -fcolor-diagnostics -Wall -Wthread-safety -Wself-assign -fno-omit-frame-pointer -g0 -O2 -DNDEBUG ... (remaining 183 arguments skipped)

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
xla/service/gpu/gpu_transfer_manager.cc:241:24: error: no matching function for call to 'min'
        /*chunk_size=*/std::min(chunk_size, size - chunk_index * chunk_size)));
                       ^~~~~~~~
external/tsl/tsl/platform/errors.h:178:31: note: expanded from macro 'TF_RETURN_IF_ERROR'
    ::absl::Status _status = (__VA_ARGS__); \
                              ^~~~~~~~~~~
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk/usr/include/c++/v1/__algorithm/min.h:40:1: note: candidate template ignored: deduced conflicting types for parameter '_Tp' ('size_t' (aka 'unsigned long') vs. 'unsigned long long')
min(const _Tp& __a, const _Tp& __b)
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk/usr/include/c++/v1/__algorithm/min.h:51:1: note: candidate template ignored: could not match 'initializer_list<_Tp>' against 'size_t' (aka 'unsigned long')
min(initializer_list<_Tp> __t, _Compare __comp)
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk/usr/include/c++/v1/__algorithm/min.h:60:1: note: candidate function template not viable: requires single argument '__t', but 2 arguments were provided
min(initializer_list<_Tp> __t)
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk/usr/include/c++/v1/__algorithm/min.h:31:1: note: candidate function template not viable: requires 3 arguments, but 2 were provided
min(const _Tp& __a, const _Tp& __b, _Compare __comp)
^
1 error generated.
Error in child process '/usr/bin/xcrun'. 1
INFO: Elapsed time: 3422.490s, Critical Path: 142.72s
INFO: 22766 processes: 8700 internal, 14065 darwin-sandbox, 1 local.
FAILED: Build did NOT complete successfully

akuegel commented 1 week ago

https://github.com/openxla/xla/commit/cb6451b19c8618c857fc226c1b19bd7e86740a55 should have fixed this. Can you please try again?
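
For context, the diagnostic in the failing log says that std::min could not deduce a single type for '_Tp': one argument is 'size_t' (aka 'unsigned long') and the other is 'unsigned long long'. On macOS these are distinct types even though both are 64 bits wide, whereas on typical 64-bit Linux toolchains 'size_t' and 'uint64_t' are the same type, which is probably why the code built elsewhere. The snippet below is a minimal sketch of one common way to resolve this kind of mismatch, by naming the template argument explicitly. The helper ChunkBytes and its parameter types are hypothetical, chosen only to mirror the expression in the error message; this is not necessarily what the linked commit does.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the XLA source): number of bytes in the chunk
// at `chunk_index` when `size` bytes are split into pieces of `chunk_size`.
// The parameter types are assumptions that reproduce the mismatch in the log.
constexpr std::uint64_t ChunkBytes(std::size_t chunk_size, std::uint64_t size,
                                   std::uint64_t chunk_index) {
  // Writing std::min(chunk_size, size - chunk_index * chunk_size) asks the
  // compiler to deduce one type from both arguments. Where std::size_t and
  // std::uint64_t are different types, that deduction fails.
  // Naming the type explicitly converts both arguments to std::uint64_t.
  return std::min<std::uint64_t>(chunk_size, size - chunk_index * chunk_size);
}
```

For example, ChunkBytes(4, 10, 2) covers the last 2 bytes of a 10-byte buffer split into 4-byte chunks. An explicit static_cast on one argument would work equally well; the explicit template argument just keeps the conversion in one place.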

akuegel commented 1 week ago

@domkirke for visibility. Can you please check whether this is fixed?

domkirke commented 6 days ago

Hello! Unfortunately this does not seem to be fixed; a new error appears after a git pull:

ERROR: /Users/domkirke/Code/jax-test/cpp/xla/xla/pjrt/c/BUILD:318:14: Linking xla/pjrt/c/pjrt_c_api_gpu_plugin.so failed: (Exit 1): cc_wrapper.sh failed: error executing command (from target //xla/pjrt/c:pjrt_c_api_gpu_plugin.so) external/local_config_cc/cc_wrapper.sh @bazel-out/darwin_arm64-opt/bin/xla/pjrt/c/pjrt_c_api_gpu_plugin.so-2.params

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
ld: unknown options: --version-script --no-undefined 
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Error in child process '/usr/bin/xcrun'. 1
INFO: Elapsed time: 4323.307s, Critical Path: 271.60s
INFO: 21665 processes: 8481 internal, 13183 darwin-sandbox, 1 local.
FAILED: Build did NOT complete successfully

I tried compiling with the same xla_configure.bazelrc, and again after re-running the configure script; the result was the same.

akuegel commented 4 days ago

This could be a problem in the xla configure.py script.

@ddunl I can see that TensorFlow has these build options for macOS: build:macos_arm64 --cpu=darwin_arm64

I don't see anything like that in the bazelrc file generated by the XLA configure.py script.

@domkirke Can you try adding this line to xla_configure.bazelrc manually?

domkirke commented 3 days ago

I tried, but nothing changed :/ I don't know much about Bazel, but isn't the line build --build_tag_filters -no_oss,-gpu suspicious, given that I have no GPU (putting MPS aside) on this computer?

domkirke commented 3 days ago

I tried removing all -gpu flags from the xla_configure.bazelrc file, but got a different error:

ERROR: /Users/domkirke/Code/jax-test/cpp/xla/xla/stream_executor/rocm/BUILD:1106:11: Compiling xla/stream_executor/rocm/rocm_status.cc failed: (Exit 1): wrapped_clang_pp failed: error executing command (from target //xla/stream_executor/rocm:rocm_status) external/local_config_cc/wrapped_clang_pp '-D_FORTIFY_SOURCE=1' -fstack-protector -fcolor-diagnostics -Wall -Wthread-safety -Wself-assign -fno-omit-frame-pointer -g0 -O2 -DNDEBUG ... (remaining 59 arguments skipped)

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
In file included from xla/stream_executor/rocm/rocm_status.cc:16:
./xla/stream_executor/rocm/rocm_status.h:24:10: fatal error: 'rocm/include/hip/hip_runtime.h' file not found
#include "rocm/include/hip/hip_runtime.h"
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.

akuegel commented 3 days ago

The -gpu tag filter excludes all GPU-related targets, so those filters are there for a reason :)

In any case, the reason I think something macOS-specific is missing from the generated bazelrc file is that those linker parameters should not be used on macOS:

https://github.com/openxla/xla/blob/0fc891390264fb85ac822f45c4106c48e1a10ffc/xla/pjrt/c/BUILD#L236

So for some reason, your build setup is not detected as macOS. I am not really familiar with the infrastructure side; I tried to guess what could help, but someone more knowledgeable about it is needed.