tencent-ailab / IP-Adapter

The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.
Apache License 2.0
5.24k stars 336 forks

Can you run ip-adapter sdxl using Colab's free tier? #14

Open putuoka opened 1 year ago

putuoka commented 1 year ago

I have tried running it, but I always run into memory issues and it terminates at this line: `IPAdapterXL(pipe, image_encoder_path, ip_ckpt, device)`

Loading the model with `StableDiffusionXLPipeline.from_pretrained` works fine, though. I also tried using accelerate but still ran into the same issue. Does this mean it's not possible to run it on Colab's free tier?
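For context, a minimal low-memory loading sketch (the `from_pretrained` arguments and `enable_model_cpu_offload` are standard diffusers API; the download lines are commented out because they pull several GB, and `image_encoder_path` / `ip_ckpt` are the same placeholders as in the demo):

```python
# Sketch: cut SDXL weight memory roughly in half by loading fp16 weights
# and keeping idle submodules off the GPU.

def fp16_load_kwargs():
    # from_pretrained arguments that select the half-precision weight files
    return {"variant": "fp16", "use_safetensors": True}

# import torch
# from diffusers import StableDiffusionXLPipeline
# pipe = StableDiffusionXLPipeline.from_pretrained(
#     "stabilityai/stable-diffusion-xl-base-1.0",
#     torch_dtype=torch.float16, **fp16_load_kwargs())
# pipe.enable_model_cpu_offload()  # only the active submodule stays on GPU
# ip_model = IPAdapterXL(pipe, image_encoder_path, ip_ckpt, "cuda")
```

Whether this alone fits under the free tier's ~15 GB is unclear, since the ViT-bigG image encoder is loaded separately by `IPAdapterXL`.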

xiaohu2015 commented 1 year ago

@putuoka hi, do you also test the Colab demo of SD1.5? As SDXL is larger and we also use a larger CLIP model (ViT-bigG), it needs more resources.

putuoka commented 1 year ago

> @putuoka hi, do you also test the Colab demo of SD1.5? As SDXL is larger and we also use a larger CLIP model (ViT-bigG), it needs more resources.

I have tried them; none of your Colab demos work well. They need the additional code provided by @Lucascoolsouza in this GitHub post: https://github.com/tencent-ailab/IP-Adapter/issues/11

I have created a working Colab notebook for Stable Diffusion 1.5 on the free tier based on those fixes: https://colab.research.google.com/drive/1JmLhgfq7EYDjytBGQCg5o1F7NLSRk9Dn

You just need to click Runtime > Run all.

This Colab notebook runs Stable Diffusion 1.5 smoothly, but I am still facing memory limitations when trying to run the much larger SDXL model on Colab's free resources.

xiaohu2015 commented 1 year ago

@putuoka thanks a lot, can you open a PR to fix the Colab demo?

Lucascoolsouza commented 1 year ago

> > @putuoka hi, do you also test the Colab demo of SD1.5? As SDXL is larger and we also use a larger CLIP model (ViT-bigG), it needs more resources.
>
> I have tried them; none of your Colab demos work well. They need the additional code provided by @Lucascoolsouza in this GitHub post: #11
>
> I have created a working Colab notebook for Stable Diffusion 1.5 on the free tier based on those fixes: https://colab.research.google.com/drive/1JmLhgfq7EYDjytBGQCg5o1F7NLSRk9Dn
>
> You just need to click Runtime > Run all.
>
> This Colab notebook runs Stable Diffusion 1.5 smoothly, but I am still facing memory limitations when trying to run the much larger SDXL model on Colab's free resources.

The limitation on running SDXL largely comes down to this not running with xformers.

putuoka commented 1 year ago

> > > @putuoka hi, do you also test the Colab demo of SD1.5? As SDXL is larger and we also use a larger CLIP model (ViT-bigG), it needs more resources.
> >
> > I have tried them; none of your Colab demos work well. They need the additional code provided by @Lucascoolsouza in this GitHub post: #11
> >
> > I have created a working Colab notebook for Stable Diffusion 1.5 on the free tier based on those fixes: https://colab.research.google.com/drive/1JmLhgfq7EYDjytBGQCg5o1F7NLSRk9Dn
> >
> > You just need to click Runtime > Run all.
> >
> > This Colab notebook runs Stable Diffusion 1.5 smoothly, but I am still facing memory limitations when trying to run the much larger SDXL model on Colab's free resources.
>
> The limitation on running SDXL largely comes down to this not running with xformers.

I attempted to enable memory-efficient attention in IP-Adapter using xformers as follows:

```python
pipe = pipe.to("cuda")
pipe.enable_xformers_memory_efficient_attention()
```

However, this did not reduce memory usage as expected. It appears that modifications to the ip_adapter.py code itself may be needed to apply the xformers optimization properly when running in Google Colab.
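A hedged sketch of the pipeline-level attempt (both method names are standard diffusers API; the helper name `enable_low_memory` is made up here, and since `IPAdapterXL` installs its own attention processors afterwards, the pipeline-level call alone may well be overridden, which matches what was observed above):

```python
# Sketch: try xformers memory-efficient attention, fall back to attention
# slicing when xformers is not installed. Both are standard diffusers
# pipeline methods; neither is guaranteed to survive IPAdapterXL swapping
# in its own attention processors, hence the suggestion to edit
# ip_adapter.py directly.

def enable_low_memory(pipe):
    try:
        pipe.enable_xformers_memory_efficient_attention()
        return "xformers"
    except Exception:
        pipe.enable_attention_slicing()  # chunked attention: slower, lighter
        return "slicing"

# pipe = pipe.to("cuda")
# mode = enable_low_memory(pipe)   # call before constructing IPAdapterXL
# ip_model = IPAdapterXL(pipe, image_encoder_path, ip_ckpt, "cuda")
```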

I also tried modifying the Python code even though I didn't really know what I was doing LOL. I added `enable_xformers_memory_efficient_attention()` in all the places with the word "device" LOL

Update: I found the memory issue is at this line:

`self.image_encoder = CLIPVisionModelWithProjection.from_pretrained(self.image_encoder_path).to(`
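One plausible mitigation, assuming the transformers API, is to request half-precision weights for the image encoder at load time (the constant name below is illustrative, and the actual download is commented out):

```python
# Sketch: load the CLIP image encoder in fp16, roughly halving the
# allocation that fails on the free tier. torch_dtype is the standard
# transformers from_pretrained argument; recent versions accept a dtype
# string, older ones need torch.float16 instead.

ENCODER_KWARGS = {"torch_dtype": "float16"}  # illustrative constant name

# from transformers import CLIPVisionModelWithProjection
# image_encoder = CLIPVisionModelWithProjection.from_pretrained(
#     image_encoder_path, **ENCODER_KWARGS).to("cuda")
```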

xiaohu2015 commented 1 year ago

@putuoka hi, CLIP-ViT-bigG has ~1.8B parameters, so it needs a lot of GPU memory.
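As a rough sanity check on that number (the helper name is illustrative; 4 and 2 bytes per parameter are the standard fp32/fp16 sizes):

```python
def weight_gib(n_params, bytes_per_param):
    # Memory needed just to hold the weights, in GiB.
    return n_params * bytes_per_param / 1024**3

# CLIP-ViT-bigG, ~1.8B parameters
print(round(weight_gib(1.8e9, 4), 1))  # fp32 → 6.7
print(round(weight_gib(1.8e9, 2), 1))  # fp16 → 3.4
```

So at full precision the image encoder alone can eat close to half of a free-tier T4's ~15 GiB, before the SDXL pipeline itself is counted.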

xiaohu2015 commented 1 year ago

@putuoka hi, we have released a new version of the SDXL adapter that uses CLIP-ViT-H, so it needs less GPU memory.
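Roughly, switching to the updated checkpoint would look like this; the file names follow my reading of the repo's model zoo and should be double-checked against the README:

```python
# Sketch: the updated SDXL adapter pairs with the smaller CLIP-ViT-H
# image encoder (~0.6B params vs ViT-bigG's ~1.8B), so the encoder's
# weight footprint drops by roughly two thirds. Paths are assumptions.

image_encoder_path = "models/image_encoder"        # ViT-H encoder folder
ip_ckpt = "sdxl_models/ip-adapter_sdxl_vit-h.bin"  # updated SDXL weights

# pipe = StableDiffusionXLPipeline.from_pretrained(
#     "stabilityai/stable-diffusion-xl-base-1.0",
#     torch_dtype=torch.float16, variant="fp16")
# ip_model = IPAdapterXL(pipe, image_encoder_path, ip_ckpt, "cuda")
```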