The --device flag for dream.py lets you set which PyTorch device Stable Diffusion runs on; for CPU you'd pass --device cpu. The k_diffusion samplers might not respect it, though.
There are currently some issues running the model on CPU that still need to be debugged, at least on the default PyTorch 1.11 that the environment sets up. If you want to run it on CPU you'll need to work through and resolve those errors.
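To illustrate what "respecting" the device means, here's a rough sketch (the select_device helper is made up, not dream.py's actual code): both the weights and the starting noise have to land on the chosen device, and k_diffusion may still allocate CUDA tensors internally, which is why the flag can get ignored there.

```python
import torch

def select_device(name: str) -> torch.device:
    """Hypothetical helper: map a --device string like 'cpu' or 'cuda'
    onto a torch.device, falling back to CPU when CUDA is unavailable."""
    if name.startswith("cuda") and not torch.cuda.is_available():
        return torch.device("cpu")
    return torch.device(name)

device = select_device("cpu")
# Both the model weights and the initial latents need to follow the flag:
# model = model.to(device)
x_T = torch.randn(1, 4, 64, 64, device=device)
```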
Ah, I missed that, thanks. It sounds like getting CPU rendering to work is a more complicated problem than I anticipated.
"LayerNormKernelImpl" not implemented for 'Half'
Are you sure your system has an adequate NVIDIA GPU?
Getting this, even when using ddim as a sampler
A working CPU version of Stable Diffusion popped up: https://github.com/bes-dev/stable_diffusion.openvino. It's very bare-bones, only works on Intel, and doesn't include klms at this time. However, I'm actually somewhat surprised by how fast it is on my old 32-core Xeon server. Could this approach be integrated into the lstein/stable-diffusion repo and activated with a --cpu flag?
Have you tried using --full_precision? I tried it and it got past the issue (using a GitHub runner, of all things). We could hard-code --full_precision into the CPU implementation; I'm not sure whether all CPUs will require it.
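For reference, here's a minimal repro of that error outside of dream.py. On PyTorch 1.11 the CPU LayerNorm kernel isn't implemented for half precision, which is what --full_precision works around:

```python
import torch

x = torch.randn(2, 8)
ln = torch.nn.LayerNorm(8)

# Half precision on CPU hits the missing kernel (PyTorch 1.11):
try:
    ln.half()(x.half())
except RuntimeError as e:
    print(e)  # "LayerNormKernelImpl" not implemented for 'Half'

# Full precision is fine, which is effectively what --full_precision forces:
print(ln.float()(x.float()).dtype)  # torch.float32
```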
Traceback (most recent call last):
File "/home/cubox/stable-diffusion-dream/ldm/simplet2i.py", line 382, in prompt2image
image = make_image(x_T)
File "/home/cubox/stable-diffusion-dream/ldm/simplet2i.py", line 478, in make_image
samples, _ = sampler.sample(
File "/home/cubox/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/cubox/stable-diffusion-dream/ldm/models/diffusion/ksampler.py", line 83, in sample
K.sampling.__dict__[f'sample_{self.schedule}'](
File "/home/cubox/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/cubox/stable-diffusion-dream/src/k-diffusion/k_diffusion/sampling.py", line 186, in sample_lms
denoised = model(x, sigmas[i] * s_in, **extra_args)
File "/home/cubox/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/cubox/stable-diffusion-dream/ldm/models/diffusion/ksampler.py", line 16, in forward
uncond, cond = self.inner_model(x_in, sigma_in, cond=cond_in).chunk(2)
File "/home/cubox/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/cubox/stable-diffusion-dream/src/k-diffusion/k_diffusion/external.py", line 100, in forward
eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
File "/home/cubox/stable-diffusion-dream/src/k-diffusion/k_diffusion/external.py", line 126, in get_eps
return self.inner_model.apply_model(*args, **kwargs)
File "/home/cubox/stable-diffusion-dream/ldm/models/diffusion/ddpm.py", line 1440, in apply_model
x_recon = self.model(x_noisy, t, **cond)
File "/home/cubox/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/cubox/stable-diffusion-dream/ldm/models/diffusion/ddpm.py", line 2148, in forward
out = self.diffusion_model(x, t, context=cc)
File "/home/cubox/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/cubox/stable-diffusion-dream/ldm/modules/diffusionmodules/openaimodel.py", line 806, in forward
h = module(h, emb, context)
File "/home/cubox/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/cubox/stable-diffusion-dream/ldm/modules/diffusionmodules/openaimodel.py", line 88, in forward
x = layer(x, context)
File "/home/cubox/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/cubox/stable-diffusion-dream/ldm/modules/attention.py", line 298, in forward
x = block(x, context=context)
File "/home/cubox/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/cubox/stable-diffusion-dream/ldm/modules/attention.py", line 232, in forward
return checkpoint(
File "/home/cubox/stable-diffusion-dream/ldm/modules/diffusionmodules/util.py", line 155, in checkpoint
return func(*inputs)
File "/home/cubox/stable-diffusion-dream/ldm/modules/attention.py", line 238, in _forward
x = self.attn1(self.norm1(x)) + x
File "/home/cubox/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/cubox/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/normalization.py", line 189, in forward
return F.layer_norm(
File "/home/cubox/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/functional.py", line 2486, in layer_norm
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: expected scalar type BFloat16 but found Float
This is using --web -F --device cpu
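If it helps with debugging, the last frame looks like a plain dtype mismatch inside LayerNorm rather than a missing kernel. Something along these lines reproduces the same class of error (the exact wording, and whether it errors at all, can vary by PyTorch version; this is only an illustration, not the code path dream.py takes):

```python
import torch

ln = torch.nn.LayerNorm(8)                   # weights stay in float32
x = torch.randn(2, 8, dtype=torch.bfloat16)  # activations in bfloat16

try:
    ln(x)
except RuntimeError as e:
    print(e)  # dtype mismatch along the lines of
              # "expected scalar type BFloat16 but found Float"

# Keeping weights and activations in one dtype avoids it:
print(ln(x.float()).dtype)
```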
Also, unrelated, but keep in mind that using GitHub Actions to run SD might go against the GitHub ToS, so don't get banned :)
"Also, unrelated, but keep in mind that using Github actions to run SD might go against Github ToS, don't get banned :)"
You're the second person to bring this up. I didn't look very hard, but I can't find any ToS that are specific to GitHub Actions. It is a CI/CD service, and the purpose of the workflows I'm working on is to test the builds for errors before they are released. The GitHub runners are very slow compared to my own computer, but I can't dedicate a machine to always be testing. I've actually been trying to figure out how we can get hold of some self-hosted runners to test GPUs, because GitHub isn't offering those for free (and the Mac MPS runners aren't released yet).
Thanks for your work! Although GPU rendering is fast, most of us are severely limited by available VRAM. Would it be possible to add a --cpu flag to toggle CPU rendering?
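If the maintainers want it, a --cpu toggle could probably just override the device selection. A rough sketch is below (the flag wiring here is hypothetical, not dream.py's actual CLI), with full precision forced on CPU since half precision falls over there:

```python
import argparse
import torch

# Hypothetical sketch of a --cpu toggle living next to --device:
parser = argparse.ArgumentParser()
parser.add_argument("--device", default="cuda")
parser.add_argument("--cpu", action="store_true",
                    help="force CPU rendering regardless of --device")
parser.add_argument("--full_precision", "-F", action="store_true")
args = parser.parse_args(["--cpu"])  # stand-in for real CLI arguments

device = torch.device(
    "cpu" if args.cpu or not torch.cuda.is_available() else args.device
)
# CPU generally needs full precision (see the LayerNorm errors above):
full_precision = args.full_precision or device.type == "cpu"
print(device, full_precision)
```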