Closed: joelberkeley closed this issue 1 year ago
Hello,

There are two things:

> `//xla/...` and a subset of those targets are rocm targets.

@tpopp `//xla/...` is in the developer guide. I'm just confused as to why it's failing on something to do with rocm, when the default config doesn't, from what I understand, specify a rocm build. Btw, I tried without `--config=monolithic` and it gets a lot further, though I stopped the build after an hour.
Without `--config=monolithic` it might very well succeed, because then you are not trying to statically link everything, including the rocm libraries that you don't have. Alternatively, `test` instead of `build` might also work for you and be closer to what you are expecting, because then it will only build the targets necessary for the tests and tested binaries that you are trying to run, instead of all targets including the rocm ones.
You're expecting that `//xla/...` will somehow filter based on other configurations, but that is not quite right. `//xla/...` is going to build every single target. So a configuration might configure a target to not take a dependency on rocm, but that doesn't matter, because you are also explicitly requesting that the rocm target be built, regardless of whether it is used or not, and even requesting that all dependencies be required by trying to statically link everything.
Why are you using `--config=monolithic`?
It does seem like we should probably add `if_rocm_configured` guards to the dependencies of the static targets here, which would hopefully fix this: https://github.com/openxla/xla/blob/main/xla/stream_executor/rocm/BUILD

They seem to have been added after previous work to make all targets compile regardless of the configuration. I don't think there is a reason to treat the static targets differently in this regard.
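As a rough sketch of what such a guard could look like (the target names here are placeholders, and the exact macro name and load path should be checked against the existing GPU guards in the XLA tree):

```starlark
# Hypothetical BUILD fragment: only pull in the ROCm-only dependency
# when the build is actually configured for ROCm. Names and load path
# are assumptions, not the real XLA BUILD contents.
load("@local_config_rocm//rocm:build_defs.bzl", "if_rocm_configured")

cc_library(
    name = "some_static_target",   # placeholder static target
    srcs = ["some_static_target.cc"],
    deps = if_rocm_configured([
        ":rocm_only_dep",          # placeholder ROCm-specific dependency
    ]),
)
```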
> Without `--config=monolithic` might very well succeed because then you are not trying to statically link everything, including the rocm libraries that you don't have. Alternatively, `test` instead of `build` might also work for you and be closer to what you are expecting because then it will only build targets necessary for tests and tested binaries that you are trying to run instead of all targets including the rocm ones.

I'm using XLA in my own project, rather than working on XLA itself, so I don't need to test XLA.
> You're expecting that `//xla/...` will somehow filter based on other configurations, but that is not quite right. `//xla/...` is going to build every single target. So a configuration might configure a target to not take a dependency on rocm, but that doesn't matter because you are also explicitly requesting that the rocm target be built, regardless of if it is used or not, and even requesting that all dependencies be required by trying to statically link everything.

I'm not expecting anything that specific. I'm trying to figure out the build process, and I've not seen it suggested anywhere that `//xla/...` might not be appropriate. Should that be modified too?
> Why are you using `--config=monolithic`?

I ideally want a single static library that I can link into my own XLA wrapper to build a single dynamic library. `--config=monolithic` looked like a possible candidate for that.

I apologize if I came off rude, by the way.
So, `//xla/...` refers to everything with bazel, and it is listed just to show an example command and to try building everything, to ensure everything works. The use of `--config=monolithic` also seems like a good choice.

For your use case, other specific libraries could be targeted, like `//xla:c_srcs`. I don't think there is a single top-level library for XLA that contains everything, so you would have to figure out which targets/libraries you wanted for your use case. I might be wrong though, and will ask more appropriate people tomorrow.
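For the single-shared-library goal, one untested option is an aggregating target in the wrapper project's own BUILD file. This is only a sketch: `wrapper.cc` is a placeholder, and the dependency labels would need to be adjusted to however XLA is imported into your workspace.

```starlark
# Hypothetical BUILD fragment in the wrapper workspace: link the chosen
# XLA targets and the wrapper sources into one shared library.
cc_binary(
    name = "libxla_wrapper.so",
    srcs = ["wrapper.cc"],   # placeholder wrapper source
    linkshared = True,       # emit a shared library instead of an executable
    deps = [
        # Labels assumed from this thread; fix the prefixes for your setup.
        "//xla/service/cpu:cpu_compiler",
        "//xla/service/gpu:gpu_compiler",
        "//xla/runtime:executable",
    ],
)
```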
Ah, that's useful info, thanks, and it's all good.

I just ran it with `//xla/...` and without `--config=monolithic` in github actions and got

```
ERROR: /github/home/.cache/bazel/_bazel_root/b2b44df59c1647c561a99e141342b63f/external/llvm-project/mlir/BUILD.bazel:6691:11: Compiling mlir/lib/Analysis/CFGLoopInfo.cpp failed: (Exit 1): gcc failed: error executing command /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 119 arguments skipped)
Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
In file included from external/llvm-project/mlir/lib/Analysis/CFGLoopInfo.cpp:9:
external/llvm-project/mlir/include/mlir/Analysis/CFGLoopInfo.h:20:10: fatal error: llvm/Analysis/LoopInfo.h: No such file or directory
   20 | #include "llvm/Analysis/LoopInfo.h"
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~
```
Unfortunately, that other failure was a temporary breakage that should be fixed now. The linked PR will hopefully fix your issue. It worked in my local testing at least.
Can you see if your issues are resolved after?

https://github.com/openxla/xla/commit/d3978599502f1e6e0e8d9a3fd89adfb42c8bd2fc appears to have got past that bug for

```
bazel build --test_output=all --spawn_strategy=sandboxed --nocheck_visibility //xla/...
```

in github actions, though it's still going after 3.5 hours, so I don't know if or when it will finish.
Hopefully it will now work with monolithic as well. I confirmed that there is, unfortunately, no single top-level target containing everything. `//xla/xla/service/gpu:gpu_compiler`, `//xla/xla/service/cpu:cpu_compiler`, and `//xla/xla/runtime:executable` might contain the symbols you want, though.
Also, the long build times are expected, so definitely set up bazel caching on your github action if you haven't and want faster results.
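For the caching suggestion, a minimal sketch of the Bazel side might look like the following (the cache paths are illustrative; a github actions cache step would then save and restore these directories between runs):

```
# Hypothetical .bazelrc fragment: keep build outputs and downloaded
# repositories in on-disk caches that CI can persist across jobs.
build --disk_cache=~/.cache/bazel-disk-cache
build --repository_cache=~/.cache/bazel-repo-cache
```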
Thanks.

The github build timed out after 6 hours, so there was no indication of any errors.

It would be really useful to have some kind of guide on how to navigate the build, specifically what to include in my bazel build if I want a given set of symbols, especially since builds take so long and each symbol is built by a lot of different bazel targets. I'd understand if this isn't something you have the time to do, but it would be extremely useful for me.
Actually, I should be able to make a fair bit of headway by scanning the BUILD files for the headers I'm using. Are `//xla/xla/service/gpu:gpu_compiler`, `//xla/xla/service/cpu:cpu_compiler`, and `//xla/xla/runtime:executable` mutually exclusive (I'm guessing the CPU/GPU ones are; what about `runtime:executable`)?
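The BUILD-file scan mentioned above can be sketched with a short script. This is a rough illustration, not part of any XLA tooling; the repository path and header name passed in are whatever you are searching for.

```python
import os


def targets_mentioning(repo_root, header):
    """Walk a Bazel workspace and report which packages mention a header.

    This is a crude textual scan: it looks for `header` anywhere in a
    BUILD or BUILD.bazel file and reports the enclosing package, which
    is usually enough to locate the cc_library whose hdrs/srcs list it.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(repo_root):
        for name in filenames:
            if name in ("BUILD", "BUILD.bazel"):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8") as f:
                    if header in f.read():
                        # Convert the directory into a Bazel package label.
                        pkg = os.path.relpath(dirpath, repo_root)
                        hits.append("//" + pkg.replace(os.sep, "/"))
    return sorted(hits)
```

From there, `bazel query` on the reported packages can narrow things down to the exact target.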
I don't understand "mutually exclusive" in this context. They should be exposing different functionality, but probably depend on some of the same underlying libraries. One might build Tensorflow with support for the gpu_compiler, cpu_compiler, and executable at the same time, so they can be used together, assuming they are used in a way that avoids ODR violations from any underlying functionality (I'm not great with linking/loading knowledge, so maybe that's not even a concern).
I'm closing this as I don't think there are more action items, but please re-open as needed.
I'm running the docker build on a mac M1, which isn't documented, but I managed to get some of the way there by adding `--platform linux/x86_64/v8` to my `docker run` command. I've used the default configuration, but with the option `--config=monolithic`. I'm seeing the error

which is surprising, since I have used the default config, so am not expecting rocm to be involved.

The full logs are