tensorflow / java

Java bindings for TensorFlow
Apache License 2.0

Enable ConcreteFunction XLA JIT #347

Open rnett opened 3 years ago

rnett commented 3 years ago

This is mostly a reminder for myself, but also documentation if someone else ends up doing it.

After reading through the Python code (see here), I found that to force JIT compilation of a function you need to set the `_XlaMustCompile` and `_noinline` attributes to true, as is done here.

However, this results in an error. I didn't run it down at the time, but it was reported in https://github.com/tensorflow/tensorflow/issues/50458. Once that is fixed, we should try this again.

rnett commented 3 years ago

An extension: TensorFlow has mechanisms to automatically JIT-compile sections of functions, see https://www.tensorflow.org/xla#auto-clustering. Look into how to enable this from Java, and document it if the Python methods don't work. I'm not seeing anything special in the Python code, so the documented methods most likely work. There is a session-specific setting as well (which is overridden by the environment variable): https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/protobuf/config.proto#L223
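For reference, the environment-variable route for auto-clustering is `TF_XLA_FLAGS=--tf_xla_auto_jit=2` (documented on the XLA page linked above). Since it is read when the TensorFlow native library initializes, from Java it effectively has to be set before the JVM that loads TF starts. A minimal sketch using only the JDK, assuming a hypothetical `app.jar` that contains the TF Java code:

```java
// Sketch: launching a TF Java application with XLA auto-clustering enabled
// via TF_XLA_FLAGS. The variable must be present before the TensorFlow
// native library loads, so we set it on a child JVM rather than in-process.
// "app.jar" is a placeholder for your actual application.
public class XlaFlagsLauncher {
    public static void main(String[] args) {
        ProcessBuilder pb = new ProcessBuilder("java", "-jar", "app.jar");
        // --tf_xla_auto_jit=2 turns on auto-clustering; add
        // --tf_xla_cpu_global_jit to include CPU devices as well.
        pb.environment().put("TF_XLA_FLAGS", "--tf_xla_auto_jit=2");
        // Printed here only to demonstrate the environment is configured;
        // a real launcher would call pb.inheritIO().start() instead.
        System.out.println(pb.environment().get("TF_XLA_FLAGS"));
    }
}
```

Whether the resulting clusters actually run is a separate question; on GPU they can still hit the platform-registration bug discussed below.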

rnett commented 3 years ago

There are option setters here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/c/c_api_experimental.h#L62, but we aren't currently mapping them.

rnett commented 3 years ago

See also: https://github.com/tensorflow/tensorflow/issues/50797

maziyarpanahi commented 1 year ago

Hi @rnett

I am hitting this issue coming from here: https://github.com/tensorflow/java/blob/9cfea866973cc6c05ba8cb9cb11c023124a5c28d/tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/ConcreteFunction.java#L464

However, I am not seeing it on CPU, only when I use TensorFlow Java (0.4.0) on a GPU.

The warning:

2022-09-10 21:25:11.745187: W external/org_tensorflow/tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at xla_ops.cc:248 : NOT_FOUND: could not find registered platform with id: 0x7f0297ff8f14

The full error:

2022-09-10 21:24:53.906018: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /tmp/export_wav2vec2-base
2022-09-10 21:24:54.042842: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:107] Reading meta graph with tags { serve }
2022-09-10 21:24:54.042908: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:148] Reading SavedModel debug info (if present) from: /tmp/export_wav2vec2-base
2022-09-10 21:24:54.043018: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-09-10 21:24:59.210550: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9758 MB memory:  -> device: 0, name: NVIDIA Tesla P100-PCIE-12GB, pci bus id: 0000:04:00.0, compute capability: 6.0
2022-09-10 21:24:59.213219: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 9758 MB memory:  -> device: 1, name: NVIDIA Tesla P100-PCIE-12GB, pci bus id: 0000:05:00.0, compute capability: 6.0
2022-09-10 21:24:59.215647: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 9758 MB memory:  -> device: 2, name: NVIDIA Tesla P100-PCIE-12GB, pci bus id: 0000:06:00.0, compute capability: 6.0
2022-09-10 21:24:59.218087: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 9758 MB memory:  -> device: 3, name: NVIDIA Tesla P100-PCIE-12GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-09-10 21:24:59.784080: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:228] Restoring SavedModel bundle.
2022-09-10 21:25:01.172674: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:212] Running initialization op on SavedModel bundle at path: /tmp/export_wav2vec2-base
2022-09-10 21:25:01.807065: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:301] SavedModel load for tags { serve }; Status: success: OK. Took 7901067 microseconds.
2022-09-10 21:25:02.839312: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9758 MB memory:  -> device: 0, name: NVIDIA Tesla P100-PCIE-12GB, pci bus id: 0000:04:00.0, compute capability: 6.0
2022-09-10 21:25:02.840745: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 9758 MB memory:  -> device: 1, name: NVIDIA Tesla P100-PCIE-12GB, pci bus id: 0000:05:00.0, compute capability: 6.0
2022-09-10 21:25:02.842146: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 9758 MB memory:  -> device: 2, name: NVIDIA Tesla P100-PCIE-12GB, pci bus id: 0000:06:00.0, compute capability: 6.0
2022-09-10 21:25:02.843535: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 9758 MB memory:  -> device: 3, name: NVIDIA Tesla P100-PCIE-12GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-09-10 21:25:06.693928: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9758 MB memory:  -> device: 0, name: NVIDIA Tesla P100-PCIE-12GB, pci bus id: 0000:04:00.0, compute capability: 6.0
2022-09-10 21:25:06.694915: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 9758 MB memory:  -> device: 1, name: NVIDIA Tesla P100-PCIE-12GB, pci bus id: 0000:05:00.0, compute capability: 6.0
2022-09-10 21:25:06.695728: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 9758 MB memory:  -> device: 2, name: NVIDIA Tesla P100-PCIE-12GB, pci bus id: 0000:06:00.0, compute capability: 6.0
2022-09-10 21:25:06.696504: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 9758 MB memory:  -> device: 3, name: NVIDIA Tesla P100-PCIE-12GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-09-10 21:25:10.978737: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:366] Loaded cuDNN version 8401
2022-09-10 21:25:11.745187: W external/org_tensorflow/tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at xla_ops.cc:248 : NOT_FOUND: could not find registered platform with id: 0x7f0297ff8f14
[error] org.tensorflow.exceptions.TensorFlowException: 2 root error(s) found.
[error]   (0) NOT_FOUND: could not find registered platform with id: 0x7f0297ff8f14
[error]          [[{{function_node __inference_serving1_9887}}{{node wav2vec2/encoder/pos_conv_embed/conv/PartitionedCall}}]]
[error]          [[StatefulPartitionedCall/_847]]
[error]   (1) NOT_FOUND: could not find registered platform with id: 0x7f0297ff8f14
[error]          [[{{function_node __inference_serving1_9887}}{{node wav2vec2/encoder/pos_conv_embed/conv/PartitionedCall}}]]
[error] 0 successful operations.
[error] 0 derived errors ignored.
[error]         at org.tensorflow.internal.c_api.AbstractTF_Status.throwExceptionIfNotOK(AbstractTF_Status.java:101)
[error]         at org.tensorflow.Session.run(Session.java:850)
[error]         at org.tensorflow.Session.access$300(Session.java:82)
[error]         at org.tensorflow.Session$Runner.runHelper(Session.java:552)
[error]         at org.tensorflow.Session$Runner.runNoInit(Session.java:499)
[error]         at org.tensorflow.Session$Runner.run(Session.java:495)
[error]         at Main$.delayedEndpoint$Main$1(Main.scala:85)
[error]         at Main$delayedInit$body.apply(Main.scala:12)
[error]         at scala.Function0.apply$mcV$sp(Function0.scala:39)
[error]         at scala.Function0.apply$mcV$sp$(Function0.scala:39)
[error]         at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)
[error]         at scala.App.$anonfun$main$1$adapted(App.scala:80)
[error]         at scala.collection.immutable.List.foreach(List.scala:431)
[error]         at scala.App.main(App.scala:80)
[error]         at scala.App.main$(App.scala:78)
[error]         at Main$.main(Main.scala:12)
[error]         at Main.main(Main.scala)
[error]         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[error]         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[error]         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[error]         at java.lang.reflect.Method.invoke(Method.java:498)

Is there a way to disable (or enable) XLA via the TF Session in Java, or some other workaround? (This makes the model unusable on the GPU.)

PS: thanks for documenting this in the code and here.

rnett commented 1 year ago

Hi @maziyarpanahi, how are you using XLA? That function is private because it does not work (due to this issue), and as far as I know there's intentionally no other way of using XLA with TF Java because of this.

rnett commented 1 year ago

For any workarounds or fixes to enable you to use XLA on GPU from Java, you'd want to look at https://github.com/tensorflow/tensorflow/issues/50458

maziyarpanahi commented 1 year ago

Hi @rnett

Thanks for your quick response. I am not actually using XLA in either Python or Java; I think the model I am exporting to SavedModel is (it's Wav2Vec2). The error is very similar to the comments in the source code and the description here, so I assumed the model itself might be using XLA.

Simply loading and running inference with this model on CPU is fine, but the moment I use a GPU it fails with that error:

val model = SavedModelBundle.load(folder, "serve")
// loading succeeds; on GPU it fails only later, during prediction (Session.run)

There is a sample code here (in build.sbt you can select the dependency for CPU or GPU) just in case: https://github.com/maziyarpanahi/wav2vec-tensorflow

rnett commented 1 year ago

Ah, that sounds right. Unfortunately there is nothing we can do on our end for this; it's due to the TensorFlow bug I linked above. You could try rewriting the proto, but I'm not sure exactly how you would go about that.

austinzh commented 6 months ago

@rnett Would it be possible to enable it by setting environment variables in global scope?

rnett commented 6 months ago

No idea, I haven't been working on this project in quite some time.
