keras-team / keras-cv

Industry-strength Computer Vision workflows with Keras

Stable Diffusion inference on TPUs #1227

Open sayakpaul opened 1 year ago

sayakpaul commented 1 year ago

Colab Notebook:

I tried running inference with Stable Diffusion (v1) on (Colab) TPUs. It worked; however, a couple of questions:

To mitigate the second issue, @deep-diver and I created SavedModels to eliminate most of the non-TF ops from the pipeline. Details can be found in this notebook. The tokenizer is still not fully TF, though. I also couldn't load the SavedModels inside a TPU strategy scope; the attempt fails with:

ResourceExhaustedError                    Traceback (most recent call last)
<ipython-input-16-b1735bf71429> in <module>
      3     for k in gcs_paths:
      4         predict_fns.append(
----> 5             load_saved_model(gcs_paths[k])
      6         )

11 frames
/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ in shape(self)
   1261         # `_tensor_shape` is declared and defined in the definition of
   1262         # `EagerTensor`, in C.
-> 1263         self._tensor_shape = tensor_shape.TensorShape(self._shape_tuple())
   1264       except core._NotOkStatusException as e:
   1265         raise core._status_to_exception(e) from None

ResourceExhaustedError: Failed to allocate request for 7.03MiB (7372800B) on device ordinal 0
    Encountered when executing an operation using EagerExecutor. This error cancels all future operations and poisons their output tensors.
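
As a quick sanity check on the numbers in that error, the sketch below converts the failed request from bytes to MiB and shows one tensor shape that would occupy exactly that many bytes in float32. The helper names `bytes_to_mib` and `tensor_bytes` and the shape `(3, 3, 320, 640)` are assumptions made for illustration. Many other shapes have the same size, and the traceback does not say which tensor was being allocated.

```python
def bytes_to_mib(n_bytes: int) -> float:
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes / (1024 ** 2)


def tensor_bytes(shape, bytes_per_element: int = 4) -> int:
    """Size in bytes of a dense tensor; 4 bytes per element assumes float32."""
    n_elements = 1
    for dim in shape:
        n_elements *= dim
    return n_elements * bytes_per_element


# Byte count copied from the ResourceExhaustedError message.
failed_request = 7_372_800
print(f"{bytes_to_mib(failed_request):.2f} MiB")  # 7.03 MiB

# Hypothetical shape (an assumption, not taken from the traceback):
# a 3x3 convolution kernel mapping 320 channels to 640 channels.
hypothetical_shape = (3, 3, 320, 640)
print(tensor_bytes(hypothetical_shape) == failed_request)  # True
```

A request of about 7 MiB is small for a single variable, so the failure probably means device memory was already close to full when this allocation was attempted, rather than that this one tensor was too large.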

Cc'ing @LukeWood @ianstenbit.

tirthasheshpatel commented 7 months ago

I tried to run your notebook, @sayakpaul, and it seems to get stuck on the generation step. It looks like Stable Diffusion might be broken on TPU. I'll try to look into what's causing this.