invoke-ai / InvokeAI

Invoke is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The solution offers an industry-leading WebUI and serves as the foundation for multiple commercial products.
https://invoke-ai.github.io/InvokeAI/
Apache License 2.0

IMG2IMG CUDA out of memory but prompt not? #125

Closed Evilu closed 2 years ago

Evilu commented 2 years ago

OK, so I've been working with this fork fine for the last few days. Prompts with the default s50 -b1 -W512 -H512 -C7.5 -mk_lms settings have been working just fine with my 6 GB dedicated GPU until now, but IMG2IMG isn't.

dream> "Regular Frog" --init_img=./init-images/pepe.jpg --strength=0.5 -s100 -n4 Sampling: 0%| | 0/4 [00:00<?, ?it/s]sampler 'k_lms' is not yet supported. Using DDM sampler loaded input image of size (976, 850) from ./init-images/pepe.jpg Sampling: 0%| | 0/4 [00:00<?, ?it/s] CUDA out of memory. Tried to allocate 390.00 MiB (GPU 0; 6.00 GiB total capacity; 3.17 GiB already allocated; 0 bytes free; 4.72 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF Are you sure your system has an adequate NVIDIA GPU? 0 images generated in 0.10s Outputs:

Any ideas why? Does IMG2IMG just need more GPU juice?

EDIT: By the way, I've also tried a really tiny render: dream> "Regular Frog" --init_img=./init-images/pepe.jpg -W5 -H5

dream returns the same message.
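
As an aside, the max_split_size_mb hint in that traceback refers to PyTorch's PYTORCH_CUDA_ALLOC_CONF environment variable. A minimal sketch of trying it (the value 128 is purely illustrative, not a recommendation from this thread):

```python
import os

# Must be set before the first CUDA allocation, i.e. before torch initializes the GPU.
# 128 MiB is an illustrative split size; smaller values trade some allocator overhead
# for less fragmentation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported after the env var so the caching allocator picks it up

print(torch.cuda.is_available())
```

The same setting can be exported in the shell before launching the dream prompt; it only mitigates fragmentation and won't conjure VRAM that img2img genuinely needs.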

Evilu commented 2 years ago

Just tried basujindal's optimized img2img and it worked with --n_samples.

So: a) we really want to implement that optimization here, and b) we need to check why img2img is so much more expensive than txt2img.
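
One way to check that cost empirically, independent of this repository's internals, is to wrap each generation call with PyTorch's peak-memory counters. The sketch below is generic: run_txt2img and run_img2img are hypothetical stand-ins for whatever entry points you actually call, not functions from this codebase.

```python
import torch

def peak_vram_mb(fn, *args, **kwargs):
    """Run fn once and report the peak CUDA memory it allocated, in MiB."""
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    result = fn(*args, **kwargs)
    peak = torch.cuda.max_memory_allocated() / (1024 ** 2)
    print(f"{fn.__name__}: peak {peak:.0f} MiB")
    return result

# Hypothetical usage -- substitute your own txt2img / img2img calls:
# peak_vram_mb(run_txt2img, prompt="Regular Frog", width=512, height=512)
# peak_vram_mb(run_img2img, prompt="Regular Frog",
#              init_img="./init-images/pepe.jpg", strength=0.5)
```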

TingTingin commented 2 years ago

What's the size of the image itself?

lstein commented 2 years ago

This is a bit concerning. There is active work going on right now on how much memory utilisation drops if you clear the CUDA cache frequently, and I have my fingers crossed that this will reduce memory requirements further. You might want to try checking out pull request #122 to see if there is an improvement:

In the stable-diffusion directory:

```
git checkout -b BaristaLabs-clear-cuda-cache-after-each-image main
git pull https://github.com/BaristaLabs/stable-diffusion-dream.git clear-cuda-cache-after-each-image
conda env update -f environment.yaml   # just in case
```

After testing, you can switch back to the main branch with "git checkout main".
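
For context, the core idea of that branch, releasing cached CUDA memory between images, can be sketched in a few lines. This is a simplified illustration of the general technique, not the actual code in PR #122, and generate_one_image is a hypothetical placeholder:

```python
import gc
import torch

def generate_batch(prompts, generate_one_image):
    """Generate images one at a time, clearing the CUDA cache between them.

    generate_one_image is a hypothetical callable returning a finished image;
    the only point here is where the cache gets cleared.
    """
    images = []
    for prompt in prompts:
        images.append(generate_one_image(prompt))
        # Drop stale Python references, then return cached blocks to the driver
        # so the next image starts from a (mostly) empty allocator.
        gc.collect()
        torch.cuda.empty_cache()
    return images
```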

Oceanswave commented 2 years ago

I believe this is a regression: large images are being read and generation then attempts to use those dimensions. There should be a warning or a constraint on -I if the input image is above a certain size; additionally, it doesn't appear that -W/-H are passed through when specified.

tildebyte commented 2 years ago

~~sampler 'k_lms' is not yet supported. Using DDM sampler from your output indicates that you're using a (relatively :D) old release.~~ ~~You should probably update to at least 1.09~~

ugh derp

bmaltais commented 2 years ago

Possibly add an option to resize a large image used as input for img2img... maybe use -H and -W as the resize values, and resize to those when specified? That would take care of using GFPGAN input images with a small GPU that can only handle 512x512.
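
A sketch of what that resize could look like, assuming Pillow and snapping each side down to a multiple of 64 as the samplers expect (the helper name and defaults here are illustrative, not an existing option):

```python
from PIL import Image

def resize_init_image(path, max_w=512, max_h=512):
    """Shrink an init image to fit within max_w x max_h, keeping aspect ratio
    and rounding each side down to a multiple of 64."""
    img = Image.open(path).convert("RGB")
    scale = min(max_w / img.width, max_h / img.height, 1.0)  # never upscale
    w = max(64, int(img.width * scale) // 64 * 64)
    h = max(64, int(img.height * scale) // 64 * 64)
    return img.resize((w, h), resample=Image.LANCZOS)

# Example: the 976x850 input above, fitted to 512x512, comes out 512x384
# (the multiple-of-64 floor shaves the height from ~446 down to 384).
```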

Oceanswave commented 2 years ago

@tildebyte k_lms doesn't support img2img, so that warning is expected even on the latest release. The consensus is that since the sampler plays less of a role in img2img, it's less of a priority.

@bmaltais yeah, exactly

Evilu commented 2 years ago

For those who asked, the image is 976x850.

bmaltais commented 2 years ago

850 is not divisible by 64 (the nearest multiples are 832 and 896)... that could be part of your problem.

lstein commented 2 years ago

I just completed extensive benchmarking of VRAM usage and think that things will be better now that pull request #162 is in. Note that there is still a bug when applying face touchup and upscaling to images generated with the batch option (e.g. -b2): only the last image in the batch gets touched up. However, most people won't have enough VRAM to run more than one image per batch, so I thought it would be OK to release.