Open akiou opened 3 years ago
I think we probably can't have the TF native library loaded twice into the same process. TF-Java and TF-Python will by necessity use slightly different builds of the native library (as it's compiled and released separately). What's your use case for having both of them loaded in the same process?
Can't we have the TF native library loaded twice into the same process even if we use the same TF version like java 0.2.0 and python 2.3.1 described here?
About the use case:
They are compiled separately and potentially with different options (e.g. MKL, GPUs, x86_64 features), leading to a conflict. It might be possible to persuade TF-Java to only load the JNI binding and not libtensorflow, but if you did hit native library issues we wouldn't support such a configuration. @saudet are there some flags on the JavaCPP loader which will enable this?
With 0.2.0 and the upcoming 0.3.0 release you should be able to train models in TF-Java (though not all operations have gradients available yet as they aren't all present in the native layer).
TF Core keeps a global state. The only way to get this working is by compiling a version of TF Core that supports both Java and Python APIs. I did that for TF 1.x and it works, but we would need to port this to TF 2.x and then we could maintain this here: https://groups.google.com/a/tensorflow.org/g/jvm/c/T964efemgek/m/OUe0uxV6DAAJ
Thank you for your replies and the reference link!
@saudet You mean that you will compile a version of TF Core that supports both Java and Python APIs for TF 2.x in the near future, right? According to the bytedeco example pom file in the google group thread, the TF java artifact for TF 1.x that works in the same process simultaneously is available. But the java artifact for TF 2.x is not available now, right?
I don't plan to do it myself, no, because we have no build machines that have enough power to complete the Python build anyway. If you have questions though, feel free to ask and I will help!
Then, is there any way to build the Java artifact for TF 2.x? I need an artifact that works in the same Python process simultaneously.
@akiou, fwiw, I have a hybrid setup where I train in Python but do all the pre/post-processing and inference in Java, and I did not have the kind of issues you are mentioning here.
The way I do it is that when the Python script starts for training, I use JPype to launch the JVM and load the Java library doing all of the stuff I've mentioned before, and then invoke the Java classes directly from Python.
It worked really well in my case. If you are interested, I can share more details to help you set it up that way.
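For readers who want to try the JPype route described above, here is a minimal sketch of starting an in-process JVM from Python. It is not the setup used in this thread: the directory layout and the helper names `collect_jars` and `start_jvm_with_jars` are assumptions for illustration. JPype is imported lazily, so the classpath helper can be used without it.

```python
import glob
import os


def collect_jars(lib_dir):
    """Return the sorted list of .jar files found directly under lib_dir."""
    return sorted(glob.glob(os.path.join(lib_dir, "*.jar")))


def start_jvm_with_jars(lib_dir):
    """Start an in-process JVM (only once) with every jar in lib_dir on the classpath.

    Hypothetical helper: requires the JPype package to be installed.
    """
    import jpype  # lazy import, so collect_jars() works without JPype

    if not jpype.isJVMStarted():
        jpype.startJVM(classpath=collect_jars(lib_dir))
    return jpype
```

After `jp = start_jvm_with_jars("path/to/jars")`, a Java class could be looked up with `jp.JClass("org.tensorflow.SavedModelBundle")`. Note that this alone does not avoid the conflict reported in this issue once the same process has also imported TF Python 2.x.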
@karllessard Perhaps there has been a misunderstanding.
In my case too, I did not get an exception when just executing a model in TF Java invoked from a Python process. The problem described in this issue is reproduced when a TF Java model is executed from a Python process after the same Python process has imported TF 2.x. I need both TF Python 2.x and TF Java.
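To see whether two TensorFlow native libraries really end up in one process, a Linux-only diagnostic can list the mapped shared objects. The sketch below assumes the standard `/proc/<pid>/maps` layout; the function names are hypothetical and `keyword` is a plain substring match on the file name.

```python
import os


def mapped_libraries(maps_text, keyword="tensorflow"):
    """Return the unique mapped file paths whose file name contains keyword.

    maps_text is the content of /proc/<pid>/maps, where each line is:
    address perms offset dev inode [pathname]
    """
    paths = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) == 6:  # anonymous mappings have no pathname field
            path = fields[5].strip()
            if keyword in os.path.basename(path):
                paths.add(path)
    return sorted(paths)


def mapped_libraries_of_current_process(keyword="tensorflow"):
    """Linux only: inspect the mappings of the running process."""
    with open("/proc/self/maps") as f:
        return mapped_libraries(f.read(), keyword)
```

Calling `mapped_libraries_of_current_process()` after both `import tensorflow` and the Java-side model load would show whether more than one TensorFlow shared object is mapped.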
You will need to perform the build as per this script when EXTENSION=-python https://github.com/bytedeco/javacpp-presets/blob/master/tensorflow/cppbuild.sh
@saudet I'd like to confirm one thing about the build with EXTENSION=-python. That build enables us to run a Python script from a Java process, as in https://github.com/bytedeco/javacpp-presets/blob/master/tensorflow/samples/KerasMNIST.java#L32-L49, right? If so, I think the javacpp build with EXTENSION=-python cannot achieve what I want to do.
What I want to do is not to invoke TF Python 2.x from Java, but to invoke TF Java from a Python process that has already imported TF 2.x. This is because I need the use case described in https://github.com/tensorflow/java/issues/226#issuecomment-786746188.
Sure, that's possible too. You'll need a way to use JNI from Python, but that's easily doable with tools like jpype, pyjnius, etc.
Thank you!
I cloned the javacpp-presets repository, checked out tag 1.5.4, and executed the following command:
$ mvn clean install --projects .,tensorflow -Djavacpp.platform.extension=-python
Then some jars were generated in the tensorflow/target/ directory. So I should use the generated jars instead of tensorflow/java, right? But the jars do not include classes from tensorflow-java such as org.tensorflow.SavedModelBundle, so I suppose that the replacement would cause a compilation error.
Additionally, I tried to replace the tensorflow version with 2.3.1 in pom.xml and cppbuild.sh, but the maven command failed. I think I need to set the tensorflow version to 2.3.1, so how can I specify the tensorflow version?
Yes, the same class is available in that JAR file: http://bytedeco.org/javacpp-presets/tensorflow/apidocs/org/tensorflow/SavedModelBundle.html
That's what I keep telling you: Someone will need to work on updating that for TF 2.x.
I see. So, if my understanding is correct, a jar for TF 2.x that can be invoked from a Python process that has already imported TF 2.x will not be available until someone works on updating that for TF 2.x.
Additionally, I have another question (sorry for a lot of questions)
I tried to insert the statement SavedModelBundle savedModel = SavedModelBundle.load(...); just after this line, but the following error occurred. Can I really use SavedModelBundle in the way the apidocs describe? I used the pom.xml as it is, and I specified the path of a model trained with TF 1.x.
java.lang.UnsatisfiedLinkError: org.tensorflow.SavedModelBundle.load(Ljava/lang/String;[Ljava/lang/String;[B[B)Lorg/tensorflow/SavedModelBundle;
at org.tensorflow.SavedModelBundle.load (Native Method)
at org.tensorflow.SavedModelBundle.access$000 (SavedModelBundle.java:27)
at org.tensorflow.SavedModelBundle$Loader.load (SavedModelBundle.java:32)
at org.tensorflow.SavedModelBundle.load (SavedModelBundle.java:95)
at KerasMNIST.main (KerasMNIST.java:21)
at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke (Method.java:498)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:282)
at java.lang.Thread.run (Thread.java:748)
Make sure that Loader.load(tensorflow.class) has been called:
https://github.com/bytedeco/javacpp-presets/tree/1.5.4/tensorflow#documentation
I was able to load the model by using SavedModelBundle.load()! Thank you!
But of course, a model trained with TF 2.x still cannot be executed in Java. So I'm waiting for someone from the TensorFlow team to look into this issue and build a Java artifact for TF 2.x that can be invoked from a Python process that has already imported tensorflow.
@akiou Since building TF Core is quite a challenge in itself, I've been experimenting with linking and loading the _pywrap_tensorflow_internal.so file that comes with the binary distributions of TensorFlow on PyPI, instead of libtensorflow_cc.so.2 that gets built by default for the target of the C++ API, and it works! The tests pass and everything. If you are comfortable hacking with shared libraries, please feel free to do that.
@karllessard @Craigacp That would also be one way to circumvent our build issues. It would add a dependency on CPython, but it works, and it would make it possible to use both the Java and Python APIs in the same process. (To be clear, we don't need to use Python, we just need to link with the native CPython library to satisfy the undefined symbols.)
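As a starting point for that kind of experiment, the sketch below locates the _pywrap_tensorflow_internal.so file inside an installed TensorFlow package. It only finds the file; linking against it is a separate step not shown here. The function names are hypothetical, and the search walks the package tree rather than assuming a fixed sub-directory.

```python
import importlib.util
import os


def find_pywrap_library(package_root, name="_pywrap_tensorflow_internal.so"):
    """Walk package_root and return the first path whose file name equals name, or None."""
    for dirpath, _dirnames, filenames in os.walk(package_root):
        if name in filenames:
            return os.path.join(dirpath, name)
    return None


def tensorflow_package_root():
    """Return the install directory of the tensorflow package without importing it, or None."""
    spec = importlib.util.find_spec("tensorflow")
    if spec is None or not spec.submodule_search_locations:
        return None
    return list(spec.submodule_search_locations)[0]
```

With TensorFlow installed from PyPI, `find_pywrap_library(tensorflow_package_root())` should return the path to hand to the linker or loader; the file name is platform-specific, so on other platforms the `name` argument would need adjusting.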
I have run into the same issue. Is there any plan to upgrade to TF 2.x so that it can be called from Python and Java in the same process?
Please make sure that this is a bug. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:bug_template
System information
You can collect some of this information using our environment capture script. You can also obtain the TensorFlow version with python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
Describe the current behavior
I executed Python TensorFlow v2.3.1 and TensorFlow Java, invoked through JNI by using pyjnius, in a Python process. Then the following error was shown and the Python process was aborted.
More details:
.pb file.
tensorflow and pyjnius.
pyjnius which loads the stored model by using SavedModelBundle.load().
Describe the expected behavior
The user should be able to execute both (python and java) from a Python process without any conflict exceptions.
Code to reproduce the issue Provide a reproducible test case that is the bare minimum necessary to generate the problem.
I pushed sample code to https://github.com/akiou/tf_conflict. You can reproduce this error by using the sample code in that repo.
Other info / logs Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.
The error message depends on the platform OS. The error message in this issue was observed inside a Linux Docker container. If you run the sample code on macOS, the following error message is shown instead: