nerdyrodent / VQGAN-CLIP

Just playing with getting VQGAN+CLIP running locally, rather than having to use colab.

Reducing Generation Time #135

Open SafaTinaztepe opened 2 years ago

SafaTinaztepe commented 2 years ago

This is a fantastic tool! I've been using it to generate several one-off images and I'm excited to expand upon it and contribute in any way I can.

It takes at least 20 minutes to generate a default 512x512 image, and maybe 15 minutes to generate a 256x256 image, on an AWS p2.xlarge instance. I get around 3.26 s/it. The result is similar on a p2.8xlarge instance with 8 available GPUs. This is quite a long runtime, especially on expensive AWS instances.

I want to be able to scale the image generation workload, either by running multiple prompts simultaneously or by pooling a single generation across multiple GPUs, to reduce the draw time to under 5 minutes.

Parallel prompt generation is more straightforward to accomplish. You can keep a map of active jobs on the visible GPUs and a queue of prompts to distribute among them, spinning up a new generate.py process for each available GPU.
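A minimal sketch of that scheduler, assuming a hypothetical `make_generate_cmd` helper, that `generate.py` accepts the prompt via `-p`, and that setting `CUDA_VISIBLE_DEVICES` confines each job to one GPU:

```python
import os
import subprocess
import sys
import time

def run_prompts(prompts, gpu_ids, make_cmd):
    """Keep every GPU busy: launch one process per free GPU,
    refilling from the prompt queue as jobs finish."""
    pending = list(prompts)
    active = {}  # gpu_id -> running subprocess.Popen
    while pending or active:
        # launch a job on any idle GPU
        for gpu in gpu_ids:
            if gpu not in active and pending:
                argv, env = make_cmd(pending.pop(0), gpu)
                active[gpu] = subprocess.Popen(argv, env=env)
        # reap finished jobs so their GPU becomes free again
        for gpu, proc in list(active.items()):
            if proc.poll() is not None:
                del active[gpu]
        time.sleep(0.1)

def make_generate_cmd(prompt, gpu):
    # Hypothetical command builder: pins the job to one GPU via
    # CUDA_VISIBLE_DEVICES and passes the prompt with -p.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
    return ([sys.executable, "generate.py", "-p", prompt], env)
```

Usage would be something like `run_prompts(prompts, gpu_ids=[0, 1, 2, 3], make_cmd=make_generate_cmd)`. Passing the command builder in as a function keeps the scheduler independent of the exact CLI flags.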

The hard part I have been trying to examine is the draw time itself. Is there a way to map a single generation job across multiple GPUs, local or distributed? This question has been raised in earlier issues, such as #24 and #47, but I want to bring it back. The challenge is that image generation happens over many iterations of the same seeded base image: each iteration procedurally refines the output of the previous one, so you can't neatly split the "dataset" the way you would in other ML workloads. I am not as strong in ML engineering as I thought I was, so if someone can clarify what I am getting at, or tell me where this explanation/approach goes wrong, I would appreciate it and close the issue.
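To make that sequential dependency concrete, here is a toy sketch (plain arithmetic standing in for the real VQGAN decode + CLIP loss step, not the actual code): each step consumes the latent the previous step produced, so the steps form a chain rather than an independent batch you could shard across GPUs.

```python
def refine(z):
    # Stand-in for one VQGAN+CLIP optimization step: decode the
    # latent, score it against the prompt, nudge it down the loss
    # gradient. Here: toy arithmetic moving z toward a "target" 1.0.
    return z + 0.5 * (1.0 - z)

def generate(z0, iterations):
    z = z0
    for _ in range(iterations):
        z = refine(z)  # step N+1 cannot start until step N finishes
    return z
```

Data parallelism splits independent samples across workers; here there is only one sample whose state threads through every iteration, which is why the earlier issues stalled on this too.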

I am familiar with AWS infrastructure such as EC2, SageMaker, Glue, and Batch, so if you think it's an infrastructure or hardware bottleneck, I can research more in that direction.

TL;DR: How do we scale up image generation? How can we cut generation time from ~20 minutes to ~5 minutes?

Let me know what you think. Thank you.

jl-codes commented 2 years ago

🔥

vendablefall commented 2 years ago

Multi-GPU support would help us. I'm currently running this on GCP GPUs; they come with 16 GB of memory each, but it's easy to attach multiples.

joaodafonseca commented 2 years ago

Any updates on multi-GPU support?

aajinkya1203 commented 11 months ago

Anything on this yet?