invoke-ai / InvokeAI

Invoke is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The solution offers an industry-leading WebUI and serves as the foundation for multiple commercial products.
https://invoke-ai.github.io/InvokeAI/
Apache License 2.0

feature request: flag to toggle CPU rendering #90

Closed: RationalTangle closed this issue 2 years ago

RationalTangle commented 2 years ago

Thanks for your work! Although GPU rendering is fast, most of us are severely limited by available VRAM. Would it be possible to add a --cpu flag to toggle CPU rendering?

warner-benjamin commented 2 years ago

The --device flag for dream.py allows you to set which PyTorch device Stable Diffusion runs on; for CPU you'd set --device cpu. The k_diffusion samplers might not respect it, though.

There are currently some issues running the model on CPU that need to be debugged, at least on the default PyTorch 1.11 that the environment sets up. If you want to run it on the CPU, you'll need to work through and resolve these errors.
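
For reference, a device flag like this usually just maps to a torch.device that the weights and every input tensor get moved to. Here's a minimal sketch of that pattern; the argument parsing and the commented-out load_model call are illustrative stand-ins, not the actual dream.py code:

```python
import argparse

import torch

# Sketch of how a --device flag is typically plumbed through a PyTorch
# script; dream.py's real wiring differs, this only shows the pattern.
parser = argparse.ArgumentParser()
parser.add_argument('--device', default='cuda',
                    help="PyTorch device to run on, e.g. 'cuda' or 'cpu'")
args = parser.parse_args()

device = torch.device(args.device)
# model = load_model(...)   # hypothetical loader standing in for the repo's own
# model.to(device)          # the weights must move to the chosen device...
# x = torch.randn(1, 4, 64, 64, device=device)  # ...and so must every tensor
print(f'running on {device}')
```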

RationalTangle commented 2 years ago

Ah, I missed that, thanks. It sounds like getting CPU rendering to work is a more complicated problem than I anticipated.

Cubox commented 2 years ago
"LayerNormKernelImpl" not implemented for 'Half'
Are you sure your system has an adequate NVIDIA GPU?

Getting this, even when using ddim as the sampler.

RationalTangle commented 2 years ago

A working CPU version of Stable Diffusion popped up: https://github.com/bes-dev/stable_diffusion.openvino It's very bare-bones, only works with Intel CPUs, and doesn't include klms at this time. However, I'm actually somewhat surprised by how fast it is on my old 32-core Xeon server. Could this approach be integrated into the lstein/stable-diffusion repo and activated with a --cpu flag?

magnusviri commented 2 years ago

Have you tried using --full_precision? I tried it and it got past the issue (using a GitHub runner, of all things). We could hard-code --full_precision into the CPU implementation; I'm not sure if all CPUs will require it.
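
The reason full precision helps: in PyTorch 1.11, several CPU kernels, LayerNorm among them, simply aren't implemented for float16, which is exactly the "LayerNormKernelImpl" not implemented for 'Half' error quoted above. A minimal standalone sketch of the failure and the float32 workaround (plain PyTorch, not the repo's code):

```python
import torch

ln = torch.nn.LayerNorm(8)
x = torch.randn(2, 8)

# Half precision fails on CPU in PyTorch 1.11: there is no float16
# LayerNorm kernel, hence "LayerNormKernelImpl" not implemented for 'Half'.
try:
    ln.half()(x.half())
except RuntimeError as err:
    print(err)

# Keeping weights and inputs in float32, which is roughly what
# --full_precision does, sidesteps the missing kernel entirely.
print(ln.float()(x.float()).shape)  # torch.Size([2, 8])
```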

Cubox commented 2 years ago
```
Traceback (most recent call last):
  File "/home/cubox/stable-diffusion-dream/ldm/simplet2i.py", line 382, in prompt2image
    image = make_image(x_T)
  File "/home/cubox/stable-diffusion-dream/ldm/simplet2i.py", line 478, in make_image
    samples, _ = sampler.sample(
  File "/home/cubox/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/cubox/stable-diffusion-dream/ldm/models/diffusion/ksampler.py", line 83, in sample
    K.sampling.__dict__[f'sample_{self.schedule}'](
  File "/home/cubox/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/cubox/stable-diffusion-dream/src/k-diffusion/k_diffusion/sampling.py", line 186, in sample_lms
    denoised = model(x, sigmas[i] * s_in, **extra_args)
  File "/home/cubox/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cubox/stable-diffusion-dream/ldm/models/diffusion/ksampler.py", line 16, in forward
    uncond, cond = self.inner_model(x_in, sigma_in, cond=cond_in).chunk(2)
  File "/home/cubox/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cubox/stable-diffusion-dream/src/k-diffusion/k_diffusion/external.py", line 100, in forward
    eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
  File "/home/cubox/stable-diffusion-dream/src/k-diffusion/k_diffusion/external.py", line 126, in get_eps
    return self.inner_model.apply_model(*args, **kwargs)
  File "/home/cubox/stable-diffusion-dream/ldm/models/diffusion/ddpm.py", line 1440, in apply_model
    x_recon = self.model(x_noisy, t, **cond)
  File "/home/cubox/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cubox/stable-diffusion-dream/ldm/models/diffusion/ddpm.py", line 2148, in forward
    out = self.diffusion_model(x, t, context=cc)
  File "/home/cubox/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cubox/stable-diffusion-dream/ldm/modules/diffusionmodules/openaimodel.py", line 806, in forward
    h = module(h, emb, context)
  File "/home/cubox/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cubox/stable-diffusion-dream/ldm/modules/diffusionmodules/openaimodel.py", line 88, in forward
    x = layer(x, context)
  File "/home/cubox/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cubox/stable-diffusion-dream/ldm/modules/attention.py", line 298, in forward
    x = block(x, context=context)
  File "/home/cubox/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cubox/stable-diffusion-dream/ldm/modules/attention.py", line 232, in forward
    return checkpoint(
  File "/home/cubox/stable-diffusion-dream/ldm/modules/diffusionmodules/util.py", line 155, in checkpoint
    return func(*inputs)
  File "/home/cubox/stable-diffusion-dream/ldm/modules/attention.py", line 238, in _forward
    x = self.attn1(self.norm1(x)) + x
  File "/home/cubox/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cubox/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/normalization.py", line 189, in forward
    return F.layer_norm(
  File "/home/cubox/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/functional.py", line 2486, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: expected scalar type BFloat16 but found Float
```

This is using --web -F --device cpu.
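
That error is a dtype mismatch rather than a missing kernel: layer_norm in PyTorch 1.11 requires the input and the weights to share a dtype, so a bfloat16 tensor meeting a float32 one raises exactly this. A hedged, standalone repro (not the repo's code; the exact message and whether it raises at all vary by PyTorch version):

```python
import torch

# layer_norm with bfloat16 weights but a float32 input: in PyTorch 1.11
# the dtypes must agree, producing "expected scalar type ... but found ...".
ln = torch.nn.LayerNorm(8).to(torch.bfloat16)
try:
    ln(torch.randn(2, 8))  # float32 input vs bfloat16 weights
except RuntimeError as err:
    print(err)

# Casting everything to a single dtype resolves it; on CPU that realistically
# means float32 end to end, i.e. running with full precision.
print(ln.float()(torch.randn(2, 8)).dtype)  # torch.float32
```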

Also, unrelated, but keep in mind that using GitHub Actions to run SD might go against the GitHub ToS; don't get banned :)

magnusviri commented 2 years ago

"Also, unrelated, but keep in mind that using Github actions to run SD might go against Github ToS, don't get banned :)"

You're the second person to bring this up. I didn't look very hard, but I can't find any ToS that are specific to GitHub Actions. It is a CI/CD service, and the purpose of the workflows I'm working on is to test the builds for errors before they are released. The GitHub runners are quite slow compared to my own computer, but I can't spare a machine to test on all the time. I've actually tried to figure out how we can get hold of some self-hosted runners to test GPUs, because GitHub isn't offering those for free (and the Mac MPS runners aren't released yet).