Closed: sebastianlutter closed this issue 2 years ago
We might be compiling the CUDA ops with a different set of targets than the python binaries to reduce our build times due to a lack of resources. That would cause it to recompile the ops for your GPU on startup.
What GPU, CUDA version & driver version are you using?
CUDA is not involved, I'm running only on CPU (Intel 12th generation i7). There are no nvidia/cuda driver or libs installed at all.
How are the builds done? Using https://github.com/tensorflow/java/blob/master/deploy.sh (in Docker, as shown in release.sh)?
EDIT: Found https://github.com/tensorflow/java/blob/master/CONTRIBUTING.md#building in the meantime
Ok. Our XLA support is a bit iffy due to some open bugs in the upstream TF XLA support, but I'm not an expert on the consequences thereof.
Can you please provide a settings.xml file with the basic mvn settings used to build the jar, like 0.5.0-SNAPSHOT? That would be a good starting point for building a customized jar.
It should build out of the box in Maven terms, but getting bazel configured to compile TF correctly is always a pain.
I built a minimal code example in Python and Java and found out that I was wrong:
phoenix@dev:~/workspaces/research/tf_tests$ ./run.sh 2> /dev/null | grep "warm up"
Python: warm up finished in 29.75354215799598 seconds
Java: Model without ConfigProto, warm up took 29.337 seconds
Java: Model with ConfigProto/disabled JIT, warm up took 29.55 seconds
TensorFlow 2.7.1 needs about 30 seconds of XLA compilation in both Java and Python. With TensorFlow 2.9.1 (Python) the problem does not exist. Closing this issue, @Craigacp thanks for your help!
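For reference, the Java side of the warm-up measurement above can be sketched roughly as follows. This is a hypothetical reconstruction, not the reporter's actual code: the model path, input name ("inputs"), and input shape (1x224x224x3) are assumptions for a ViT-B/32 feature extractor, and the API shown is the tensorflow-java 0.4.x `SavedModelBundle`/`ConcreteFunction` interface.

```java
import java.util.Map;
import org.tensorflow.SavedModelBundle;
import org.tensorflow.Tensor;
import org.tensorflow.ndarray.Shape;
import org.tensorflow.types.TFloat32;

public class WarmUpTiming {
    public static void main(String[] args) {
        // Load the SavedModel with the default "serve" tag.
        try (SavedModelBundle model = SavedModelBundle.load("/path/to/vit_b32_fe", "serve")) {
            // Dummy zero-filled input; name and shape are assumptions for this model.
            try (TFloat32 input = TFloat32.tensorOf(Shape.of(1, 224, 224, 3))) {
                long start = System.nanoTime();
                // The first call triggers graph initialization (and, on TF 2.7.x, XLA compilation).
                Map<String, Tensor> outputs =
                        model.function("serving_default").call(Map.of("inputs", input));
                double seconds = (System.nanoTime() - start) / 1e9;
                System.out.printf("warm up finished in %.3f seconds%n", seconds);
                outputs.values().forEach(Tensor::close);
            }
        }
    }
}
```

Subsequent calls to `call(...)` on the same bundle should run at normal inference speed, since the one-time compilation cost is paid on the first query.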
After loading a Vision Transformer model (vit_b32_fe) from TensorFlow Hub with version 0.4.1 of tensorflow-java, the first query takes about 30 seconds, because the graph is pre-compiled using XLA.
This is a performance issue I want to solve. Using 0.5.0-SNAPSHOT is not possible because it fails to load the SavedModel (see https://github.com/tensorflow/java/issues/472). Since I can load and run the model without issues using Python TensorFlow 2.7.1 or 2.9.0, I wonder why I get this issue in Java.
Are there any options (disabling the XLA JIT or similar) that could prevent the model from blocking for 30 seconds? Thanks for any help.
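One option tried later in this thread is passing a session ConfigProto that turns the global XLA JIT level off when loading the model. A minimal sketch, assuming the tensorflow-java 0.4.x proto classes under `org.tensorflow.proto.framework` and a hypothetical model path (note that, per the timings above, this did not actually avoid the 30-second warm-up on TF 2.7.1):

```java
import org.tensorflow.SavedModelBundle;
import org.tensorflow.proto.framework.ConfigProto;
import org.tensorflow.proto.framework.GraphOptions;
import org.tensorflow.proto.framework.OptimizerOptions;

public class DisableJit {
    public static void main(String[] args) {
        // Session config with the global XLA JIT level set to OFF.
        ConfigProto config = ConfigProto.newBuilder()
                .setGraphOptions(GraphOptions.newBuilder()
                        .setOptimizerOptions(OptimizerOptions.newBuilder()
                                .setGlobalJitLevel(OptimizerOptions.GlobalJitLevel.OFF)))
                .build();

        // Load the SavedModel with the custom config instead of the defaults.
        try (SavedModelBundle model = SavedModelBundle.loader("/path/to/vit_b32_fe")
                .withTags("serve")
                .withConfigProto(config)
                .load()) {
            // run inference here...
        }
    }
}
```

The same `ConfigProto` mechanism can carry other session options (thread counts, soft placement, etc.), so it is a reasonable place to experiment even if JIT settings alone do not help.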
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04 x86_64):
- TensorFlow installed from (source or binary):
- TensorFlow version (use command below):
- Java version (i.e., the output of java -version):
- Java command line flags (e.g., GC parameters):
- Python version (if transferring a model trained in Python):
- Bazel version (if compiling from source):
- GCC/Compiler version (if compiling from source):
- CUDA/cuDNN version:
- GPU model and memory:
Describe the current behavior
Describe the expected behavior
Code to reproduce the issue: Provide a reproducible test case that is the bare minimum necessary to generate the problem.
Other info / logs