Yet another PyTorch implementation of Stable Diffusion.
I tried my best to make the codebase minimal, self-contained, consistent, hackable, and easy to read. Features are pruned if not needed in Stable Diffusion (e.g., the attention mask in the CLIP tokenizer/encoder). Configs are hard-coded (based on Stable Diffusion v1.x). Loops are unrolled when that makes the code easier to follow.
Despite my efforts, I feel like I cooked up another plate of spaghetti. Well, help yourself!
This implementation refers heavily to the following repositories. Big kudos to them!
pip install torch numpy Pillow regex
or pip install -r requirements.txt
Download data.v20221029.tar from here and unpack it into the parent folder of stable_diffusion_pytorch. Your folders should look like this:
stable-diffusion-pytorch(-main)/
├─ data/
│  ├─ ckpt/
│  └─ ...
└─ stable_diffusion_pytorch/
   ├─ samplers/
   └─ ...
Note that the checkpoint files included in the data archive are distributed under a different license -- you must agree to that license before using them.
Import stable_diffusion_pytorch as a submodule.
Here are some example scripts. You can also read the docstring of stable_diffusion_pytorch.pipeline.generate.
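For instance, a quick way to view that docstring from a Python shell is the standard help() built-in (nothing specific to this package):
from stable_diffusion_pytorch import pipeline
help(pipeline.generate)  # prints the parameters accepted by generate()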
Text-to-image generation:
from stable_diffusion_pytorch import pipeline
prompts = ["a photograph of an astronaut riding a horse"]
images = pipeline.generate(prompts)
images[0].save('output.jpg')
...with multiple prompts:
prompts = [
    "a photograph of an astronaut riding a horse",
    ""]
images = pipeline.generate(prompts)
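generate returns one image per prompt (PIL images, as in the example above), so you can save each result to its own file; a small usage sketch:
for i, image in enumerate(images):
    image.save(f'output-{i}.jpg')  # one file per prompt, in the same order as `prompts`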
...with unconditional (negative) prompts:
prompts = ["a photograph of an astronaut riding a horse"]
uncond_prompts = ["low quality"]
images = pipeline.generate(prompts, uncond_prompts)
...with seed:
prompts = ["a photograph of an astronaut riding a horse"]
images = pipeline.generate(prompts, uncond_prompts, seed=42)
Preload models (you will need enough VRAM):
from stable_diffusion_pytorch import model_loader
models = model_loader.preload_models('cuda')
prompts = ["a photograph of an astronaut riding a horse"]
images = pipeline.generate(prompts, models=models)
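If you are not sure whether a CUDA GPU is available, a small guard with plain PyTorch (torch.cuda.is_available(), not part of this package) lets the same script fall back to CPU -- a minimal sketch:
import torch
from stable_diffusion_pytorch import model_loader, pipeline

device = 'cuda' if torch.cuda.is_available() else 'cpu'  # fall back to CPU when no GPU is present
models = model_loader.preload_models(device)
prompts = ["a photograph of an astronaut riding a horse"]
images = pipeline.generate(prompts, models=models, device=device)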
If you get OOM with the code above but have enough RAM (not VRAM), you can keep the models on the CPU and move each one to the GPU only while it is needed:
from stable_diffusion_pytorch import model_loader
models = model_loader.preload_models('cpu')
prompts = ["a photograph of an astronaut riding a horse"]
images = pipeline.generate(prompts, models=models, device='cuda', idle_device='cpu')
Image-to-image generation:
from PIL import Image
prompts = ["a photograph of an astronaut riding a horse"]
input_images = [Image.open('space.jpg')]
images = pipeline.generate(prompts, input_images=input_images)
...with custom strength:
prompts = ["a photograph of an astronaut riding a horse"]
input_images = [Image.open('space.jpg')]
images = pipeline.generate(prompts, input_images=input_images, strength=0.6)
Change classifier-free guidance scale:
prompts = ["a photograph of an astronaut riding a horse"]
images = pipeline.generate(prompts, cfg_scale=11)
...or disable classifier-free guidance:
prompts = ["a photograph of an astronaut riding a horse"]
images = pipeline.generate(prompts, do_cfg=False)
Reduce steps (faster generation, lower quality):
prompts = ["a photograph of an astronaut riding a horse"]
images = pipeline.generate(prompts, n_inference_steps=28)
Use different sampler:
prompts = ["a photograph of an astronaut riding a horse"]
images = pipeline.generate(prompts, sampler="k_euler")
# "k_lms" (default), "k_euler", or "k_euler_ancestral" is available
Generate image with custom size:
prompts = ["a photograph of an astronaut riding a horse"]
images = pipeline.generate(prompts, height=512, width=768)
All code in this repository is licensed under the MIT License. Please see the LICENSE file.
Note that the Stable Diffusion checkpoint files are licensed under the CreativeML Open RAIL-M License. It contains a use-based restrictions clause, so please read it before use.