jquesnelle / txt2imghd

A port of GOBIG for Stable Diffusion
MIT License

Suggestion: include memory optimizations based on code from another fork #2

Open sarseev opened 2 years ago

sarseev commented 2 years ago

https://github.com/basujindal/stable-diffusion has a version of SD with optimized memory usage; for some 8GB VRAM card owners (such as myself) this can mean being able to generate 512x512 images at all (this script, like the native SD code, crashes with an "out of memory" error). If it is possible to implement these or similar optimizations in this script, it would be highly appreciated.

user55050 commented 2 years ago

Try it with half precision:

Add model.half() right after model = instantiate_from_config(config.model) and init_image = init_image.half() right after init_image = repeat(init_image, '1 ... -> b ...', b=batch_size).
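
A minimal sketch of where those two edits land, assuming the stock txt2imghd.py layout (config, init_image, and batch_size come from the surrounding script; exact placement may differ in your copy):

```python
from einops import repeat
from ldm.util import instantiate_from_config

# 1) Cast the model weights to fp16 immediately after the model is built:
model = instantiate_from_config(config.model)
model.half()  # in-place; roughly halves the VRAM taken by the weights

# 2) Cast the init image to fp16 so its dtype matches the fp16 weights:
init_image = repeat(init_image, '1 ... -> b ...', b=batch_size)
init_image = init_image.half()  # tensor.half() returns a new tensor, so reassign
```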

Works on my 8GB GPU.

TDiffff commented 2 years ago

I can't get the chunk part to work in half mode @nickRJ. Solved: I had --strength set to 1.0 :^)

LuciferSam86 commented 2 years ago

I've rewritten my message since I'm getting this error:

Traceback (most recent call last):
  File ".\scripts\txt2imghd.py", line 551, in <module>
    main()
  File ".\scripts\txt2imghd.py", line 366, in main
    text2img2(opt)
  File ".\scripts\txt2imghd.py", line 490, in text2img2
    init_latent = model.get_first_stage_encoding(model.encode_first_stage(init_image))  # move to latent space
  File "C:\Users\LuciferSam\.conda\envs\ldm\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context    return func(*args, **kwargs)
  File "d:\stable_diffusion\stable-diffusion-main\ldm\models\diffusion\ddpm.py", line 863, in encode_first_stage
    return self.first_stage_model.encode(x)
  File "d:\stable_diffusion\stable-diffusion-main\ldm\models\autoencoder.py", line 325, in encode
    h = self.encoder(x)
  File "C:\Users\LuciferSam\.conda\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "d:\stable_diffusion\stable-diffusion-main\ldm\modules\diffusionmodules\model.py", line 439, in forward
    hs = [self.conv_in(x)]
  File "C:\Users\LuciferSam\.conda\envs\ldm\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\LuciferSam\.conda\envs\ldm\lib\site-packages\torch\nn\modules\conv.py", line 447, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "C:\Users\LuciferSam\.conda\envs\ldm\lib\site-packages\torch\nn\modules\conv.py", line 443, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same

The command is: python .\scripts\txt2imghd.py --prompt "a photograph of an astronaut riding a horse" --strength=1.0 --ddim. What am I missing?

blacklisteddev commented 2 years ago

Got the same error; for me it was the indentation of the init_image = init_image.half() line.
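
That RuntimeError means the model weights are fp16 but the input tensor is still fp32. A minimal sketch of the fix, reusing the line from the traceback above (placement is illustrative, not copied from the script):

```python
# The cast must run on the same code path (same indentation level) as the
# encode call, otherwise init_image reaches the fp16 model as an fp32 tensor
# and F.conv2d raises "Input type (torch.cuda.FloatTensor) and weight type
# (torch.cuda.HalfTensor) should be the same".
init_image = init_image.half()
init_latent = model.get_first_stage_encoding(model.encode_first_stage(init_image))  # move to latent space
```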

mbwgh commented 1 year ago

> https://github.com/basujindal/stable-diffusion has a version of SD with optimized memory usage; for some 8GB VRAM card owners (such as myself) this can mean being able to generate 512x512 images at all (this script, like the native SD code, crashes with an "out of memory" error). If it is possible to implement these or similar optimizations in this script, it would be highly appreciated.

The aforementioned fork allows generating 512x512 images on 4GB VRAM cards, which IMO should be the baseline to compare against.