jax-ml / jax

Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
http://jax.readthedocs.io/
Apache License 2.0

Compiling an HLO module without bazel #5814

Open maxstupp opened 3 years ago

maxstupp commented 3 years ago

Hello,

Is there currently a way to take the HLO modules produced by jax_to_hlo.py and run them outside the TensorFlow repository? (I'm following the steps from the example in https://github.com/google/jax/issues/5337 at the moment.)

I'm using the https://github.com/FloopCZ/tensorflow_cc approach to use the TensorFlow C++ API outside the source tree and build my projects with CMake.

But some functionality, like the PJRT client or the hlo_module_loader, is not available there. Is loading these modules from within the TensorFlow repository the only option?

Thanks!
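For context, the export step under discussion can be sketched roughly as follows. This is an illustrative sketch, not the exact jax_to_hlo.py invocation from the thread: it uses the lowering API on a jitted function to obtain the HLO/StableHLO text that a C++ runtime would ultimately consume, and the function `f` is a made-up example.

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.tanh(x) * 2.0

# Lower the function for a concrete input shape/dtype. The resulting module
# text is what gets serialized and handed to a C++ loader.
lowered = jax.jit(f).lower(jnp.ones((4,), jnp.float32))
print(lowered.as_text())  # module text containing a tanh op
```

The key point is that lowering is shape-specialized: the C++ side only ever sees a function compiled for the concrete input signature you lowered with.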

zhangqiaorjc commented 3 years ago

Is this what you want?

https://github.com/google/jax/tree/master/examples/jax_cpp

scottlegrand commented 3 years ago

I'm guessing he wants to compile without bazel, and this example is compiled with bazel. Also, is XLA tightly coupled to TensorFlow, or is there a way to compile and run without bazel AND without TensorFlow installed? That's what I'm trying to figure out as well. I chose JAX for my HPC project in the hope of avoiding unnecessary dependencies and code bloat when inlining its networks into an existing codebase, so that it can use neural networks as energy models for molecular dynamics. If that's not possible, please let me know.

My guess, after asking around, is no on both counts at the moment: XLA is tightly coupled to TensorFlow, and bazel is, well, bazel.

skye commented 3 years ago

This currently isn't possible because, as noted above, the required XLA and PJRT dependencies can only be built as part of the TF tree using bazel. I agree this is cumbersome.

I think the best solution would be for us to bundle up the required dependencies into a new shared library + headers that we would periodically build (with bazel) and release, and then you would build against the released library instead of building from source. This is non-trivial to setup and support indefinitely, so I can't promise anything right now, but maybe we can put something together. Especially if more people chime in to this issue expressing interest! (hint hint anyone who comes across this issue)

@hawkinsp @zhangqiaorjc do you have any further thoughts on this?

zhangqiaorjc commented 3 years ago

Short of moving XLA out of TF, I don't see how we can avoid the Bazel and TF tree dependency...

maxstupp commented 3 years ago

I am trying something very similar to scottlegrand. My project uses CMake, and I was hoping there would be a way to build a shared library with the necessary (or even all) headers and use it as an external package with CMake.

What I'm after is using my JAX networks/functions in other C++ projects: train them in Python and run them in C++. The dependency on the whole TF tree and bazel makes this quite difficult.

I also tried the bazel example at https://github.com/google/jax/tree/master/examples/jax_cpp, but the build fails for me (this is less important, since it depends on bazel again). I've attached my error message in a text file if someone wants to take a look. Error dump.txt

scottlegrand commented 3 years ago

I'm seeing different issues with bazel 3.7.2 myself; I cannot build this example either. My application (www.ambermd.org) has ~30,000 users across a wide variety of machines, clusters, and Linux variants. I just can't force bazel/TensorFlow into its compilation and execution and expect it to work out well. I will keep an eye on this project to see whether XLA is eventually made independent of TensorFlow/bazel, though, because I already love this framework.

Errors attached bazel_errors.txt

zhangqiaorjc commented 3 years ago

@scottlegrand you could try setting the bazel flag `--check_visibility=false`?

zhangqiaorjc commented 3 years ago

@Maxstu-zz is your setup able to build TensorFlow itself? The errors seem to come from compiling LLVM support, which is required by MLIR in the TF tree (we don't use it directly)...

maxstupp commented 3 years ago

I am able to build the normal TensorFlow pip package from source, as well as the monolithic version with `bazel build --config=opt --config=monolithic tensorflow:libtensorflow_cc.so tensorflow:install_headers`.

scottlegrand commented 3 years ago

Sadly, that just leads to another cascade of errors related (I think) to the Abseil library (attached)...

bazel_errors.txt

hawkinsp commented 3 years ago

@scottlegrand I'm wondering if you need to run, say, TensorFlow's ./configure script first in the source tree. Those errors aren't really related to Abseil; e.g., the first error is from standard C++ code inside LLVM. That suggests to me that something about your compiler toolchain isn't configured correctly, or bazel hasn't detected it correctly. Hence my guess that you need to run the ./configure script.

XLA isn't that tightly coupled to TensorFlow; it lives in the same repository mostly for convenience. We could separate it, but it's not clear to me what that would achieve beyond moving code around.

scottlegrand commented 3 years ago

@hawkinsp I cannot build TensorFlow even after running ./configure, and wiping the bazel cache and then re-running ./configure inside the tensorflow directory doesn't fix it either.

But these (to me) arbitrary errors with bazel and TensorFlow are exactly why I want this reduced to a separate library, for the sake of building and deploying AMBER in the wild. It's too complex at the moment, in my opinion. I know we could work towards fixing this on my machine, but as likely one of only three people supporting this app, we just don't have the resources to handle this much complexity, nor the funding to hire someone for that role.

bazel_errors2.txt

maxstupp commented 3 years ago

It would be awesome if there were a way to build the whole XLA part once with bazel into a library and then use it with any other compiler as an external package, similar to https://github.com/FloopCZ/tensorflow_cc. This would let us write and train our networks in Python with JAX, extract their HLO modules, and call them from C++ in other projects (such as molecular dynamics simulations) by simply linking against a library that includes the hlo_module_loader, PJRT client, etc.
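The Python half of that workflow (train/fit parameters, freeze them into the function, then lower for a fixed input signature and sanity-check the numerics before handing the module to C++) might look roughly like this sketch. The toy `energy` function and its parameters are made up for illustration; only the lowering calls reflect real JAX APIs:

```python
import numpy as np
import jax
import jax.numpy as jnp

# A toy "energy model": the (trained) parameters are baked into the closure
# before export, so the C++ side only sees a function of the coordinates.
params = jnp.array([0.5, 1.5], jnp.float32)

def energy(x):
    return jnp.sum(params[0] * x ** 2 + params[1] * x)

x = jnp.arange(4, dtype=jnp.float32)  # concrete input shape to lower for
fn = jax.jit(energy)
lowered = fn.lower(x)

# Module text for inspection; a serialized form of this module is what a
# loader like hlo_module_loader would consume on the C++ side.
hlo_text = lowered.as_text()

# Sanity-check the numerics in Python before shipping the module.
expected = float(np.sum(0.5 * np.arange(4.0) ** 2 + 1.5 * np.arange(4.0)))
assert abs(float(fn(x)) - expected) < 1e-5
```

Baking parameters in (rather than passing them as runtime arguments) keeps the exported module self-contained, at the cost of having to re-export whenever the model is retrained.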

zhangqiaorjc commented 3 years ago

@scottlegrand you could ask tensorflow for help for bazel issues.

@Maxstu-zz that's a useful enhancement, we welcome community contributions!