pxanthopoulos opened this issue 7 months ago
I have run into the same error as you (#10592), and I'm still awaiting a response.
Same problem... If I had to guess, I'd say there's a dependency declaration missing somewhere... but Bazel is black magic I dare not look at...
Something like the following patch gets pretty far:

```diff
diff --git a/xla/stream_executor/cuda/BUILD b/xla/stream_executor/cuda/BUILD
index 2212fb622..bea1e01b9 100644
--- a/xla/stream_executor/cuda/BUILD
+++ b/xla/stream_executor/cuda/BUILD
@@ -75,6 +75,8 @@ cuda_only_cc_library(
         "//xla/stream_executor",
         "//xla/stream_executor:platform_manager",
         "//xla/stream_executor:stream_executor_interface",
+        "//xla/stream_executor:executor_cache",
+        "//xla/stream_executor:kernel",
         "//xla/stream_executor/gpu:gpu_driver_header",
         "//xla/stream_executor/gpu:gpu_executor_header",
         "//xla/stream_executor/platform",
diff --git a/xla/stream_executor/gpu/BUILD b/xla/stream_executor/gpu/BUILD
index f0843969d..348f89528 100644
--- a/xla/stream_executor/gpu/BUILD
+++ b/xla/stream_executor/gpu/BUILD
@@ -153,6 +153,7 @@ gpu_only_cc_library(
         ":gpu_types_header",
         "//xla/stream_executor",
         "//xla/stream_executor:stream_executor_interface",
+        "//xla/stream_executor:kernel",
         "@com_google_absl//absl/container:flat_hash_map",
         "@com_google_absl//absl/container:inlined_vector",
         "@com_google_absl//absl/functional:any_invocable",
```

But it eventually fails at link time as well, with:
```
ERROR: /xla/xla/tests/BUILD:2472:12: Linking xla/tests/local_client_aot_test failed: (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command (from target //xla/tests:local_client_aot_test) external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc @bazel-out/k8-opt/bin/xla/tests/local_client_aot_test-2.params
Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
/usr/bin/ld: bazel-out/k8-opt/bin/external/tsl/tsl/profiler/backends/cpu/libtraceme_recorder_impl.lo(traceme_recorder.o): in function `void __gnu_cxx::new_allocator<tsl::profiler::TraceMeRecorder::ThreadLocalRecorder>::construct<tsl::profiler::TraceMeRecorder::ThreadLocalRecorder>(tsl::profiler::TraceMeRecorder::ThreadLocalRecorder*)':
traceme_recorder.cc:(.text._ZN9__gnu_cxx13new_allocatorIN3tsl8profiler15TraceMeRecorder19ThreadLocalRecorderEE9constructIS4_JEEEvPT_DpOT0_[_ZN9__gnu_cxx13new_allocatorIN3tsl8profiler15TraceMeRecorder19ThreadLocalRecorderEE9constructIS4_JEEEvPT_DpOT0_]+0x6a): undefined reference to `tsl::Env::Default()'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `stream_executor::gpu::GpuCommandBuffer::Trace(stream_executor::Stream*, absl::lts_20230802::AnyInvocable<absl::lts_20230802::Status ()>)':
gpu_command_buffer.cc:(.text._ZN15stream_executor3gpu16GpuCommandBuffer5TraceEPNS_6StreamEN4absl12lts_2023080212AnyInvocableIFNS5_6StatusEvEEE+0x82): undefined reference to `tsl::Env::Default()'
/usr/bin/ld: gpu_command_buffer.cc:(.text._ZN15stream_executor3gpu16GpuCommandBuffer5TraceEPNS_6StreamEN4absl12lts_2023080212AnyInvocableIFNS5_6StatusEvEEE+0x10d): undefined reference to `tsl::Env::Default()'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/gpu/libgpu_command_buffer_gpu_only.a(gpu_command_buffer.o): in function `stream_executor::gpu::GpuCommandBuffer::Finalize()':
gpu_command_buffer.cc:(.text._ZN15stream_executor3gpu16GpuCommandBuffer8FinalizeEv+0x273): undefined reference to `tsl::Env::Default()'
/usr/bin/ld: gpu_command_buffer.cc:(.text._ZN15stream_executor3gpu16GpuCommandBuffer8FinalizeEv+0x2be): undefined reference to `tsl::Env::Default()'
/usr/bin/ld: bazel-out/k8-opt/bin/xla/stream_executor/cuda/libcuda_driver_cuda_only.a(cuda_driver.o):cuda_driver.cc:(.text._ZN15stream_executor3gpu9GpuDriver18GraphDebugDotPrintB5cxx11EP10CUgraph_stPKcb+0x93): more undefined references to `tsl::Env::Default()' follow
```
So was `tsl/platform/default/*` not compiled?
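One way to narrow this down is to list exactly which symbols are unresolved and which archives reference them, since those archives belong to the targets whose `deps` may be missing the library that defines the symbol. Below is a minimal sketch, assuming the failing link output has been saved to a file (the file name `link.log` and both function names are hypothetical, not part of Bazel or XLA). It only pattern-matches GNU ld's `undefined reference to` lines and its `/usr/bin/ld: ...lib*.a(*.o)` lines as they appear in the log above.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic helpers: they only parse text and run no Bazel commands.

# Reads a saved GNU ld log on stdin and prints each symbol named in an
# "undefined reference to `SYMBOL'" line, prefixed by how often it appears.
# The trailing "more undefined references to ... follow" summary is not counted.
summarize_undefined_refs() {
  grep -o "undefined reference to \`[^']*'" \
    | sed -e "s/^undefined reference to \`//" -e "s/'\$//" \
    | sort | uniq -c | sort -rn
}

# Reads the same log on stdin and prints, once each, every archive member
# (libfoo.a(foo.o) or libfoo.lo(foo.o)) that ld names in its error lines.
list_referencing_objects() {
  grep -oE '/usr/bin/ld: [^ ]+\.(a|lo)\([^)]*\)' \
    | sed 's|^/usr/bin/ld: ||' \
    | sort -u
}
```

For example, `summarize_undefined_refs < link.log` on the output above should report only `tsl::Env::Default()`, and `list_referencing_objects < link.log` should name the `traceme_recorder.o`, `gpu_command_buffer.o` and `cuda_driver.o` members.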
@pxanthopoulos, were you able to find a solution? I am facing the same error when trying to build XLA from source for GPU:

```
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
INFO: Reading 'startup' options from /users/neeld2/xla/.bazelrc: --windows_enable_symlinks
INFO: Options provided by the client:
Inherited 'common' options: --isatty=1 --terminal_columns=198
INFO: Reading rc options for 'build' from /users/neeld2/xla/.bazelrc:
Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'build' from /users/neeld2/xla/.bazelrc:
'build' options: --define framework_shared_object=true --define tsl_protobuf_header_only=true --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --features=-force_no_whole_archive --enable_platform_specific_config --define=with_xla_support=true --config=short_logs --config=v2 --define=no_aws_support=true --define=no_hdfs_support=true --experimental_cc_shared_library --experimental_link_static_libraries_once=false --incompatible_enforce_config_setting_visibility
INFO: Reading rc options for 'build' from /users/neeld2/xla/xla_configure.bazelrc:
'build' options: --action_env CLANG_COMPILER_PATH=/usr/lib/llvm-17/bin/clang --repo_env CC=/usr/lib/llvm-17/bin/clang --repo_env BAZEL_COMPILER=/usr/lib/llvm-17/bin/clang --config nvcc_clang --action_env CLANG_CUDA_COMPILER_PATH=/usr/lib/llvm-17/bin/clang --action_env CUDA_TOOLKIT_PATH=/usr/local/cuda-12.3 --action_env TF_CUBLAS_VERSION=12.3.2 --action_env TF_CUDA_COMPUTE_CAPABILITIES=6.0 --action_env TF_CUDNN_VERSION=8 --repo_env TF_NEED_TENSORRT=0 --config nonccl --action_env LD_LIBRARY_PATH=/usr/local/cuda-12.3/lib64:/usr/local/cuda-12.3/lib64 --action_env PYTHON_BIN_PATH=/usr/bin/python --python_path /usr/bin/python --copt -Wno-sign-compare --copt -Wno-error=unused-command-line-argument --copt -Wno-gnu-offsetof-extensions --build_tag_filters -no_oss --test_tag_filters -no_oss
INFO: Found applicable config definition build:short_logs in file /users/neeld2/xla/.bazelrc: --output_filter=DONT_MATCH_ANYTHING
INFO: Found applicable config definition build:v2 in file /users/neeld2/xla/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
INFO: Found applicable config definition build:nvcc_clang in file /users/neeld2/xla/.bazelrc: --config=cuda --action_env=TF_CUDA_CLANG=1 --action_env=TF_NVCC_CLANG=1 --@local_config_cuda//:cuda_compiler=nvcc
INFO: Found applicable config definition build:cuda in file /users/neeld2/xla/.bazelrc: --repo_env TF_NEED_CUDA=1 --crosstool_top=@local_config_cuda//crosstool:toolchain --@local_config_cuda//:enable_cuda
INFO: Found applicable config definition build:nonccl in file /users/neeld2/xla/.bazelrc: --define=no_nccl_support=true
INFO: Found applicable config definition build:monolithic in file /users/neeld2/xla/.bazelrc: --define framework_shared_object=false --define tsl_protobuf_header_only=false --experimental_link_static_libraries_once=false
INFO: Found applicable config definition build:linux in file /users/neeld2/xla/.bazelrc: --host_copt=-w --copt=-Wno-all --copt=-Wno-extra --copt=-Wno-deprecated --copt=-Wno-deprecated-declarations --copt=-Wno-ignored-attributes --copt=-Wno-array-bounds --copt=-Wunused-result --copt=-Werror=unused-result --copt=-Wswitch --copt=-Werror=switch --copt=-Wno-error=unused-but-set-variable --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --define=PROTOBUF_INCLUDE_PATH=$(PREFIX)/include --cxxopt=-std=c++17 --host_cxxopt=-std=c++17 --config=dynamic_kernels --experimental_guard_against_concurrent_changes
INFO: Found applicable config definition build:dynamic_kernels in file /users/neeld2/xla/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS
DEBUG: /users/neeld2/xla/third_party/py/python_repo.bzl:98:14:
HERMETIC_PYTHON_VERSION variable was not set correctly, using default version.
Python 3.11 will be used.
To select Python version, either set HERMETIC_PYTHON_VERSION env variable in
your shell:
export HERMETIC_PYTHON_VERSION=3.12
OR pass it as an argument to bazel command directly or inside your .bazelrc
file:
--repo_env=HERMETIC_PYTHON_VERSION=3.12
DEBUG: /users/neeld2/xla/third_party/py/python_repo.bzl:109:10: Using hermetic Python 3.11
DEBUG: /users/neeld2/xla/third_party/repo.bzl:132:14:
Warning: skipping import of repository 'llvm-raw' because it already exists.
DEBUG: /users/neeld2/.cache/bazel/_bazel_neeld2/1a2b1acac21e9debfa6c46a0a26cdb69/external/tsl/third_party/repo.bzl:132:14:
Warning: skipping import of repository 'nvtx_archive' because it already exists.
DEBUG: /users/neeld2/xla/third_party/repo.bzl:132:14:
Warning: skipping import of repository 'jsoncpp_git' because it already exists.
DEBUG: /users/neeld2/.cache/bazel/_bazel_neeld2/1a2b1acac21e9debfa6c46a0a26cdb69/external/bazel_tools/tools/cpp/lib_cc_configure.bzl:118:10:
Auto-Configuration Warning: 'TMP' environment variable is not set, using 'C:\Windows\Temp' as default
DEBUG: /users/neeld2/.cache/bazel/_bazel_neeld2/1a2b1acac21e9debfa6c46a0a26cdb69/external/bazel_tools/tools/cpp/lib_cc_configure.bzl:118:10:
Auto-Configuration Warning: 'TMP' environment variable is not set, using 'C:\Windows\Temp' as default
ERROR: /users/neeld2/xla/xla/tsl/cuda/BUILD.bazel:278:11: no such target '@local_config_nccl//:nccl_headers': target 'nccl_headers' not declared in package '' defined by /users/neeld2/.cache/bazel/_bazel_neeld2/1a2b1acac21e9debfa6c46a0a26cdb69/external/local_config_nccl/BUILD (Tip: use `query "@local_config_nccl//:*"` to see all the targets in that package) and referenced by '//xla/tsl/cuda:nccl_stub'
INFO: Repository boringssl instantiated at:
/users/neeld2/xla/WORKSPACE:46:15: in <toplevel>
/users/neeld2/xla/workspace2.bzl:135:21: in workspace
/users/neeld2/xla/workspace2.bzl:64:20: in _tf_repositories
/users/neeld2/xla/third_party/repo.bzl:136:21: in tf_http_archive
Repository rule _tf_http_archive defined at:
/users/neeld2/xla/third_party/repo.bzl:89:35: in <toplevel>
ERROR: Analysis of target '//xla/tsl/cuda:nccl_stub' failed; build aborted: Analysis failed
INFO: Elapsed time: 51.639s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (283 packages loaded, 18469 targets configured)
currently loading: @upb//
Fetching repository @pypi_lit; starting 11s
Fetching repository @double_conversion; starting
Fetching https://storage.googleapis.com/mirror.tensorflow.org/github.com/google/boringssl/archive/c00d7ca810e93780bd0c8ee4eea28f4f2ea4bcdc.tar.gz; 11.5 MiB (27.8%)
Fetching /users/neeld2/.cache/bazel/_bazel_neeld2/1a2b1acac21e9debfa6c46a0a26cdb69/external/double_conversion; Extracting v3.2.0.tar.gz
Fetching repository @curl; starting
Fetching /users/neeld2/.cache/bazel/_bazel_neeld2/1a2b1acac21e9debfa6c46a0a26cdb69/external/curl; Extracting curl-8.4.0.tar.gz
Fetching repository @scip; Restarting.
```
I tried passing the `--config monolithic` option, but it didn't work.
@neeldani, what does your configure step look like? It should look like `./configure.py --backend=CUDA --nccl`.
This worked, thank you!
So how do I resolve the linker error? I get the same one: undefined reference to `tsl::Env::Default()`.
I am trying to build XLA from source with Docker and GPU support, following the instructions at:
https://openxla.org/xla/build_from_source
More specifically, I cloned the XLA repo from a directory and executed the following commands:
1. `docker run --gpus all --name xla_gpu -w /xla -it -d --rm -v ./xla:/xla tensorflow/build:latest-python3.9 bash`
   (I added the `--gpus all` flag because the configure script failed, as it could not find `nvidia-smi`.)
2. `docker exec -it xla_gpu bash`
3. `./configure.py --backend=CUDA` with output:
4. `bazel build --test_output=all --spawn_strategy=sandboxed //xla/...`
This step failed with the following error message:
I overcame this error by editing the file `/root/.cache/bazel/_bazel_root/e4ab50d61a21943a819d1e092972a817/external/local_config_nccl/BUILD` referenced in the error message, adding the following to the end of it:

```starlark
alias(
    name = "nccl_headers",
    actual = "@nccl_archive//:nccl_headers",
    visibility = ["//visibility:public"],
)
```
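Before hand-editing the generated file, it may be worth checking whether the `local_config_nccl` repository declares the target at all; the error message itself suggests `bazel query "@local_config_nccl//:*"` for that. Below is a rougher, text-only sketch of the same check. The function name is hypothetical, and it assumes the file sits at `$(bazel info output_base)/external/local_config_nccl/BUILD`, as the paths in the errors above suggest. Files under the output base are generated by Bazel and may be regenerated, so a hand edit like the alias above might not survive a re-configure; re-running `./configure.py` with `--nccl`, as suggested earlier in the thread, may be the more durable route.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit status 0) if the BUILD file passed as $1
# contains a rule with name = "nccl_headers" (cc_library, alias, ...).
# This is a plain text match, not a real Bazel query.
declares_nccl_headers() {
  grep -Eq 'name[[:space:]]*=[[:space:]]*"nccl_headers"' "$1"
}
```

For example: `declares_nccl_headers "$(bazel info output_base)/external/local_config_nccl/BUILD" && echo declared || echo missing`.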
Then I reran step 4 (the build command). After building ~39,000 of the ~45,000 targets, it failed with the following error message: