tensorflow / java

Java bindings for TensorFlow

Failed to load vit_b32_fe model from tfhub with 0.5.0-SNAPSHOT #472

Closed sebastianlutter closed 2 years ago

sebastianlutter commented 2 years ago

System information

Describe the current behavior

Downloaded the model from https://tfhub.dev/sayakpaul/vit_b32_fe/1, unpacked it, and tried to load it with SavedModelBundle. This worked in version 0.4.1 (TF 2.7.1) but fails with 0.5.0-SNAPSHOT with the following error:

2022-09-14 11:57:19.299478: I tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /tmp/pxl_7644828079017596129
2022-09-14 11:57:19.345905: I tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2022-09-14 11:57:19.406820: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2022-09-14 11:57:19.419386: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2496000000 Hz
2022-09-14 11:57:19.419758: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f97d9e32ac0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-09-14 11:57:19.419773: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2022-09-14 11:57:19.419901: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory [message translated from German]
2022-09-14 11:57:19.419907: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303)
2022-09-14 11:57:19.419918: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (T14-sebastian): /proc/driver/nvidia/version does not exist
2022-09-14 11:57:19.604621: I tensorflow/cc/saved_model/loader.cc:202] Restoring SavedModel bundle.
2022-09-14 11:57:20.059396: I tensorflow/cc/saved_model/loader.cc:151] Running initialization op on SavedModel bundle at path: /tmp/pxl_7644828079017596129
2022-09-14 11:57:20.211332: I tensorflow/cc/saved_model/loader.cc:311] SavedModel load for tags { serve }; Status: success. Took 911856 microseconds.
org.tensorflow.TensorFlowException: Op type not registered 'XlaConvV2' in binary running on T14-sebastian. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
    at org.tensorflow.SavedModelBundle.load(Native Method)
    at org.tensorflow.SavedModelBundle.access$000(SavedModelBundle.java:27)
    at org.tensorflow.SavedModelBundle$Loader.load(SavedModelBundle.java:32)
    at de.pixolution.process.module.tf2.SavedModelEmbeddings.<init>(SavedModelEmbeddings.java:105)
    at de.pixolution.process.module.tf2.SavedModelEmbeddings$ResourceHolder.<clinit>(SavedModelEmbeddings.java:71)
    at de.pixolution.process.module.tf2.SavedModelEmbeddings.getInstance(SavedModelEmbeddings.java:80)
    at de.pixolution.process.module.tf2.VitB32Module.createDescriptorEntities(VitB32Module.java:63)
    at de.pixolution.process.module.tf2.VitB32Module.createDescriptorEntities(VitB32Module.java:1)
    at de.pixolution.process.module.types.DescriptorModule.createOutputs(DescriptorModule.java:298)
    at de.pixolution.api.service.APIService.createOutputs(APIService.java:254)
    at de.pixolution.api.ServiceOutputProducer$1.call(ServiceOutputProducer.java:93)
    at de.pixolution.api.ServiceOutputProducer$1.call(ServiceOutputProducer.java:1)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)

Describe the expected behavior

The model should load as it does with 0.4.1:

2022-09-14 12:14:44.843292: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /tmp/pxl_14542487053289680376
2022-09-14 12:14:44.888522: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:107] Reading meta graph with tags { serve }
2022-09-14 12:14:44.888626: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:148] Reading SavedModel debug info (if present) from: /tmp/pxl_14542487053289680376
2022-09-14 12:14:44.888682: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-14 12:14:45.050174: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:228] Restoring SavedModel bundle.
2022-09-14 12:14:45.593306: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:212] Running initialization op on SavedModel bundle at path: /tmp/pxl_14542487053289680376
2022-09-14 12:14:45.877830: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:301] SavedModel load for tags { serve }; Status: success: OK. Took 1034540 microseconds.

Code to reproduce the issue

// Load the unpacked SavedModel with the "serve" meta-graph tag
SavedModelBundle savedModel = SavedModelBundle.loader("/path/to/savedModel/")
                                   .withTags("serve")
                                   .load();
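For context, a minimal sketch of how such a loaded bundle could then be used for feature extraction with 0.5.0-SNAPSHOT. The signature key "serving_default", the input name "inputs", and the 1x224x224x3 float input shape are assumptions about the vit_b32_fe model, not details taken from this report:

import java.util.Map;
import org.tensorflow.SavedModelBundle;
import org.tensorflow.ndarray.Shape;
import org.tensorflow.types.TFloat32;

public class VitFeatureExtraction {
    public static void main(String[] args) {
        try (SavedModelBundle model = SavedModelBundle.load("/path/to/savedModel/", "serve");
             // Dummy batch of one 224x224 RGB image (all zeros); real code would fill in pixel values.
             TFloat32 image = TFloat32.tensorOf(Shape.of(1, 224, 224, 3))) {
            // Invoke the default serving signature; the key and input name are assumptions.
            var outputs = model.function("serving_default").call(Map.of("inputs", image));
            System.out.println(outputs);
        }
    }
}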


sebastianlutter commented 2 years ago

While building a minimal example I found that 0.5.0-SNAPSHOT actually loads the vit_b32_fe model without problems.

The actual cause was a classpath problem in my Gradle multi-project build. One subproject (let's call it foo) used the 0.5.0-SNAPSHOT jar from the snapshot repository, while another subproject depended on foo but was missing the declaration of that snapshot Maven repository. Since yet another subproject also pulled in TensorFlow 1.15 (the old Java bindings included in the main tensorflow repository), I ended up with the exception above.
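A quick way to catch this kind of mix-up (a suggestion, not part of the original report) is to let each subproject print which native TensorFlow runtime it actually resolves on its classpath:

import org.tensorflow.TensorFlow;

public class RuntimeCheck {
    public static void main(String[] args) {
        // Prints the native TensorFlow version found on the classpath, which makes it
        // obvious whether the legacy 1.15 bindings or the runtime bundled with
        // 0.5.0-SNAPSHOT is actually being loaded.
        System.out.println("TensorFlow runtime: " + TensorFlow.version());
    }
}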

Long story short: 0.5.0-SNAPSHOT works fine; my project setup was messed up. Closing this issue now.