openxla / xla

A machine learning compiler for GPUs, CPUs, and ML accelerators
Apache License 2.0

[platform set error][GPU-cuda]NOT_FOUND: could not find registered compiler for platform CUDA -- was support for that platform linked in? #15007

Open FatJhon opened 1 month ago

FatJhon commented 1 month ago

./configure.py --backend=CUDA

bazel build --test_output=all --spawn_strategy=sandboxed //xla/...

When I request the CUDA platform with TF_ASSERT_OK_AND_ASSIGN(se::Platform* platform, PlatformUtil::GetPlatform("cuda")); I get this error: NOT_FOUND: could not find registered compiler for platform CUDA -- was support for that platform linked in? How can I solve it?

beckerhe commented 1 month ago

Can you share the contents of your xla_configure.bazelrc file (and if present also the contents of your .tf_configure.bazelrc file)? Both should live in your root XLA directory.

FatJhon commented 1 month ago

Thanks a lot for replying! Only xla_configure.bazelrc exists, and its contents are as follows:

build --action_env CLANG_COMPILER_PATH=/home/weight/tools/llvm-17.x/bin/clang-17
build --repo_env CC=/home/weight/tools/llvm-17.x/bin/clang-17
build --repo_env BAZEL_COMPILER=/home/weight/tools/llvm-17.x/bin/clang-17
build --linkopt --ld-path=/home/weight/tools/llvm-17.x/bin/ld.lld
build --config nvcc_clang
build --action_env CLANG_CUDA_COMPILER_PATH=/home/weight/tools/llvm-17.x/bin/clang-17
build --action_env CUDA_TOOLKIT_PATH=/usr/local/cuda-12.4
build --action_env TF_CUBLAS_VERSION=12.4.5
build --action_env TF_CUDA_COMPUTE_CAPABILITIES=8.6,8.9
build --action_env TF_CUDNN_VERSION=9
build --repo_env TF_NEED_TENSORRT=0
build --action_env TF_NCCL_VERSION=2
build --action_env PYTHON_BIN_PATH=/usr/bin/python3
build --python_path /usr/bin/python3
test --test_env LD_LIBRARY_PATH
test --test_size_filters small,medium
build --copt -Wno-sign-compare
build --copt -Wno-error=unused-command-line-argument
build --copt -Wno-gnu-offsetof-extensions
build --build_tag_filters -no_oss
build --test_tag_filters -no_oss
test --build_tag_filters -no_oss
test --test_tag_filters -no_oss

More detail and a similar question are in #15054. I tried running the GPU tests in XLA, but that did not work; Bazel returned: No test targets were found, yet testing was requested. Thanks.

beckerhe commented 1 month ago

Hmm. This looks all good. Would you mind sharing the entire Bazel output?

I'm mainly interested in what tests are failing with that error message.

FatJhon commented 1 month ago

I finally solved this problem by modifying the BUILD file for stablehlo_compile_test.cc. This test only supports CPU, so the GPU compiler is not registered.

huhuiqi7 commented 1 month ago

> I finally solved this problem by modifying the BUILD file for stablehlo_compile_test.cc. This test only supports CPU, so the GPU compiler is not registered.

I ran into the same problem. How exactly did you modify it?

beckerhe commented 4 weeks ago

@FatJhon Would you be able to share what exactly you changed? This might help others like @huhuiqi7.

FatJhon commented 4 weeks ago

@huhuiqi7 @beckerhe Sorry, I only just saw this. The change is as follows: [two screenshots of the change; images not preserved]

beckerhe commented 4 weeks ago

Ah, okay, now I understand the problem. Yes, you need to link the gpu_plugin (or something that links the gpu_plugin); otherwise StreamExecutor will tell you the platform is not available.
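The screenshots of the actual change are not preserved in this thread, so the following is only a sketch of what "linking the gpu_plugin" could look like in the BUILD file next to stablehlo_compile_test.cc. It is not FatJhon's actual diff. The rule name xla_cc_test, the existing cpu_plugin dependency, and the label //xla/service:gpu_plugin are assumptions; check them against your checkout before using them.

```starlark
# Hypothetical BUILD fragment; not the change shown in the lost screenshots.
# Rule name and labels are assumptions and may differ in your XLA revision.
xla_cc_test(
    name = "stablehlo_compile_test",
    srcs = ["stablehlo_compile_test.cc"],
    deps = [
        # ... keep the existing dependencies unchanged ...
        "//xla/service:cpu_plugin",  # assumed to be present already (CPU-only test)
        "//xla/service:gpu_plugin",  # assumed label; links in the GPU platform and compiler
    ],
)
```

The point of the extra dependency is that platforms and compilers register themselves when their library is linked into the binary. If nothing in the test's dependency graph pulls in the GPU plugin, PlatformUtil::GetPlatform("cuda") has nothing registered to find, which is consistent with the NOT_FOUND error reported above.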

@huhuiqi7 Does that help you as well? If not, can you share some more details, such as the exact error messages and build logs?