lucidrains / deep-daze

A simple command-line tool for text-to-image generation using OpenAI's CLIP and a Siren (implicit neural representation) network. The technique was originally created by https://twitter.com/advadnoun
MIT License
4.37k stars · 327 forks

Runtime Error #71

Open 1sacred1 opened 3 years ago

1sacred1 commented 3 years ago

I'm running into an issue when using "imagine" in my prompt. This is my first time using something like python so bear with me.

RuntimeError: CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 8.00 GiB total capacity; 5.98 GiB already allocated; 100.56 MiB free; 5.99 GiB reserved in total by PyTorch)

I'm not really understanding what the issue is here, I would think that 8GB is enough.

NuclearSurvivor commented 3 years ago

It means the command you're trying to run needs more VRAM than you have, so it won't run. Basically, if even a plain

imagine "whatever you put"

gives you that error, then you can't run the program at the default settings.

1sacred1 commented 3 years ago

I'm trying to use the flag "--deeper". Do I not have enough VRAM for that? If not, how much is needed?

NuclearSurvivor commented 3 years ago

Probably 16 GB. I have a 2080 Super and I tried to use the --deeper flag, but it gave me the same message.

I was only able to change the number of epochs and the number of iterations successfully.

afiaka87 commented 3 years ago

There's a large list of memory optimization techniques in the README (the front page of the documentation). I'll go grab them so people don't have to go hunting if they arrive here from Google. That said, the README is useful and will stay up to date, so if you're having problems it's the first place to check.

Edit: here you go

https://github.com/lucidrains/deep-daze/issues/80#issuecomment-798844142

From the README: (You can convert each of the parameters in the Python code below to arguments for the command-line program in the usual way, i.e. --num_layers=24 --batch_size=1 instead of num_layers=24, batch_size=1. Type imagine --help for the full list of command-line arguments.)
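The kwarg-to-flag mapping is mechanical enough to sketch. The helper below is purely illustrative (it is not part of deep-daze); it just shows how the Python keyword arguments become the equivalent command-line flags:

```python
# Illustrative helper, NOT part of deep-daze: renders Python keyword
# arguments as the equivalent `imagine` command-line flags.
def kwargs_to_cli_flags(**kwargs):
    return " ".join(f"--{name}={value}" for name, value in kwargs.items())

# num_layers=24, batch_size=1 in Python becomes:
print('imagine "a prompt" ' + kwargs_to_cli_flags(num_layers=24, batch_size=1))
# → imagine "a prompt" --num_layers=24 --batch_size=1
```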

High GPU memory usage

If you have at least 16 GiB of vram available, you should be able to run these settings with some wiggle room.

from deep_daze import Imagine

text = "your prompt here"

imagine = Imagine(
    text=text,
    num_layers=42,
    batch_size=64,
    gradient_accumulate_every=1,
)

Average GPU memory usage

imagine = Imagine(
    text=text,
    num_layers=24,
    batch_size=16,
    gradient_accumulate_every=2
)

Very low GPU memory usage (less than 4 GiB)

If you are desperate to run this on a card with less than 8 GiB vram, you can lower the image_width.

imagine = Imagine(
    text=text,
    image_width=256,
    num_layers=16,
    batch_size=1,
    gradient_accumulate_every=16 # Increase gradient_accumulate_every to correct for loss in low batch sizes
)
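One way to read the three presets above: as batch_size drops, gradient_accumulate_every rises to compensate. A rough rule of thumb (an assumption on my part, not an exact equivalence) is that their product approximates how many samples contribute to each optimizer step:

```python
# Presets from the README above; "effective" is
# batch_size * gradient_accumulate_every, a rough comparison metric only.
presets = {
    "high":     {"batch_size": 64, "gradient_accumulate_every": 1},
    "average":  {"batch_size": 16, "gradient_accumulate_every": 2},
    "very_low": {"batch_size": 1,  "gradient_accumulate_every": 16},
}
for name, p in presets.items():
    effective = p["batch_size"] * p["gradient_accumulate_every"]
    print(f"{name}: effective batch ~{effective}")
```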

VRAM and speed benchmarks:

These experiments were conducted with an RTX 2060 Super and a Ryzen 7 3700X. We first list the parameters (bs = batch size), then the memory usage and, in some cases, the training iterations per second:

For an image resolution of 512:

  • bs 1, num_layers 22: 7.96 GB
  • bs 2, num_layers 20: 7.5 GB
  • bs 16, num_layers 16: 6.5 GB

For an image resolution of 256:

  • bs 8, num_layers 48: 5.3 GB
  • bs 16, num_layers 48: 5.46 GB - 2.0 it/s
  • bs 32, num_layers 48: 5.92 GB - 1.67 it/s
  • bs 8, num_layers 44: 5 GB - 2.39 it/s
  • bs 32, num_layers 44, grad_acc 1: 5.62 GB - 4.83 it/s
  • bs 96, num_layers 44, grad_acc 1: 7.51 GB - 2.77 it/s
  • bs 32, num_layers 66, grad_acc 1: 7.09 GB - 3.7 it/s

@NotNANtoN recommends a batch size of 32 with 44 layers and training 1-8 epochs.
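When picking a config from the table, one hedged way to compare rows is raw sample throughput: batch size times iterations per second. This ignores grad_acc and num_layers differences, so treat it only as a rough ranking:

```python
# 256px benchmark rows from the list above: (batch_size, num_layers, it/s, VRAM GB)
rows = [
    (16, 48, 2.00, 5.46),
    (32, 48, 1.67, 5.92),
    (8,  44, 2.39, 5.00),
    (32, 44, 4.83, 5.62),
    (96, 44, 2.77, 7.51),
    (32, 66, 3.70, 7.09),
]
# Rough throughput ranking: batch_size * iterations per second
best = max(rows, key=lambda r: r[0] * r[2])
print(best)  # bs 96, num_layers 44 pushes the most samples/sec here
```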

afiaka87 commented 3 years ago

Something I left out of the README: you actually can pull off the --deeper flag on an 8 GiB card @NuclearSurvivor (that's how much the 2080 has, right? Great card, by the way; shame they didn't ship with more VRAM, because they're plenty fast). You will need to set a decently low --image_width, probably less than ~300 px. It's not all that bad, though: if you want, you can run the image through a super-resolution net afterwards to get it back to a higher resolution, and it also runs significantly faster at 256 px than at 512 px.

NuclearSurvivor commented 3 years ago

OK, so you would use imagine "whatever" --num_layers=24 --batch_size=1. And OK, I will try to render at 256 px.

Just to be clear, this is how I would write it: imagine "whatever" --image_width=256 --num_layers=24 --batch_size=1

I'm running this right now: imagine "steve jobs tripping on LSD, sitting watching rick and morty in a cozy living room." --image_width=256 --num_layers=24 --batch_size=1

afiaka87 commented 3 years ago

> ok so you would use imagine "whatever" --num_layers=24 --batch_size=1. and ok i will try to render in 256px
>
> just to be clear this is how i would write it imagine "whatever" --image_width=256 --num_layers=24 --batch_size=1
>
> Im running this right now imagine "steve jobs tripping on LSD, sitting watching rick and morty in a cozy living room." --image_width=256 --num_layers=24 --batch_size=1

@NuclearSurvivor

Yep that's exactly the syntax! You can use a cool pip program called gpustat to monitor your VRAM levels as it runs.

pip install gpustat
gpustat -i

If you find you have room to spare, you can raise --num_layers until you run out of memory. In general, you want to get as close to 32 (the value --deeper uses) as possible, but that's tough on 8 GiB. Let me know how it goes!

afiaka87 commented 3 years ago

Also! While I said to use a batch_size of 1, that's very low; I would increase it to 4 or more. One trick to get higher quality out of a lower batch size is to use, for instance, --batch_size=4 --gradient_accumulate_every=8. The higher your gradient_accumulate_every, the longer each iteration takes, but each update becomes more accurate. You can think of it as splitting your batch into chunks in order to fit it into VRAM (that's not exactly how it works, but it's good enough for thinking at a high level).

If you really need a batch size of 1 to fit things into memory, you should at least set --gradient_accumulate_every to something fairly high, like --gradient_accumulate_every=16, to claw back some of the accuracy lost to such a low batch size.
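To make the "splitting the batch into chunks" intuition concrete, here's a generic, self-contained sketch of gradient accumulation (illustrative only, not deep-daze's actual training loop). For a simple squared-error model, averaging the gradients of several equal-sized micro-batches before one update reproduces the full-batch gradient:

```python
# Generic gradient-accumulation sketch (NOT deep-daze's actual loop).
# Model: predict y = w * x; loss = mean squared error over a batch.
def grad(w, batch):
    # d/dw of mean((w*x - y)^2) over the batch
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w, lr, accumulate_every = 0.0, 0.01, 2

# Split the data into `accumulate_every` micro-batches, average their
# gradients, then take a single optimizer step.
micro_batches = [data[i::accumulate_every] for i in range(accumulate_every)]
accumulated = sum(grad(w, mb) for mb in micro_batches) / accumulate_every

full = grad(w, data)   # for this loss, the two gradients coincide
w -= lr * accumulated  # one update, informed by all micro-batches
```

In real frameworks the micro-batch gradients are summed in place across several backward passes before a single optimizer step, which is why memory usage stays flat while wall-clock time per step grows.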

@NotNANtoN's numbers here, covering various runtimes and batch sizes at an image size of 256x256, are probably what you need. The bs is the batch size used; you can see how much VRAM each run used on the right, along with iteration speed where measured. I believe they used the default --gradient_accumulate_every=4 for everything except --num_layers >= 32, where it doesn't really need to be higher than 1 anymore, which saves you time.

  • bs 8, num_layers 48: 5.3 GB
  • bs 16, num_layers 48: 5.46 GB - 2.0 it/s
  • bs 32, num_layers 48: 5.92 GB - 1.67 it/s
  • bs 8, num_layers 44: 5 GB - 2.39 it/s
  • bs 32, num_layers 44, grad_acc 1: 5.62 GB - 4.83 it/s
  • bs 96, num_layers 44, grad_acc 1: 7.51 GB - 2.77 it/s
  • bs 32, num_layers 66, grad_acc 1: 7.09 GB - 3.7 it/s

@NotNANtoN recommends a batch size of 32 with 44 layers and training 1-8 epochs.

NuclearSurvivor commented 3 years ago

I wouldn't do that. imagine "extraterrestrial beings" --batch_size=32 --num_layers=44 --epochs=8 didn't work for me; I got the runtime CUDA out-of-memory error. imagine "extraterrestrial beings" --gradient_accumulate_every=16 --batch_size=8 --num_layers 15 worked for me, though. Do you think this will give me a nice clean image?

afiaka87 commented 3 years ago

He didn't explicitly mention this, but he's also using --image_width=256 and --gradient_accumulate_every=1.

The full command would technically be:

imagine "extraterrestrial beings" --batch_size=32 --num_layers=44 --epochs=8 --image_width=256 --gradient_accumulate_every=1

Does that work?

afiaka87 commented 3 years ago

So, if you run the pip install gpustat; gpustat -i commands from earlier, you'll find that on Windows or Linux your GPU is already using about 2 GiB of VRAM. This is normal: your desktop manager and general operating system environment need to store things in VRAM to keep the GUI snappy.

Often, we're working in Linux environments on a server that is "headless": no desktop environment, just the command line. Those have zero baseline GPU usage, which lets you use more of your total VRAM. If you are on Linux and would like a desktop environment that uses less VRAM, you can check out XFCE, i3, etc.; they use something like 300 to 500 MiB. You can also hit Ctrl-Alt-F2 at login, kill gnome-desktop (or whatever you're using), and just run everything in that terminal session.

afiaka87 commented 3 years ago

> wouldnt do that imagine "extraterrestrial beings" --batch_size=32 --num_layers=44 --epochs=8 didnt work for me i got the runtime cuda out of memory error imagine "extraterrestrial beings" --gradient_accumulate_every=16 --batch_size=8 --num_layers 15 worked for me though do you think this will give me a nice clean image?

You shouldn't go beneath 16 layers. It's the default and should probably be the minimum, honestly; it just barely starts to give you good results. Going to, say, 24 layers with a slightly lower image_width than you're currently using would be worth the decrease in resolution.

This is sort of part of the process with machine learning. VRAM is a precious resource in ML and this stuff uses a lot of it, so it requires tinkering with a few parameters. These parameters are always a tradeoff between memory usage, total runtime, and the accuracy of the neural net. In exchange for the patience to find decent parameters, you get state-of-the-art open-source image generation for free. Pretty cool stuff, but I'll admit it takes some patience.

NuclearSurvivor commented 3 years ago

> You shouldn't go beneath 16 layers. It's the default and should probably be the minimum, honestly. It just barely starts to give you good results. Going to, say 24 with a slightly lower image_width than you're currently using would be worth the decrease in resolution.
>
> This is sort of part of the process with machine learning. VRAM is a precious resource in ML and this stuff uses a lot of it. As such it requires tinkering with a few parameters. These parameters are always a tradeoff between memory usage, total runtime, and the accuracy of the neural net. In exchange for being patient enough to find decent parameters, you get a state-of-the-art open source image generation for free. Pretty cool stuff but I'll admit it takes some patience.

I was willing to have the patience to install this. It took all of last night, and when I posted an issue in the channel, the dev immediately fixed both of the problems I (and other people) were having in less than an hour, so that was pretty cool. Unfortunately, I am not on Linux and am stuck with the 2 GB of VRAM used by default.

The command you gave me, imagine "extraterrestrial beings" --batch_size=32 --num_layers=44 --epochs=8 --image_width=256 --gradient_accumulate_every=1, worked for me, and it's generating the first image right now. Could I make the gradient accumulation something like 4 or 8, or does that take a heavy toll on VRAM usage? Also, using that command, how could I give the machine a reference image to use?

Also, I think it might be using my CPU, because it's at 100% and my GPU is at like 4-10%.

afiaka87 commented 3 years ago

Gradient accumulation is kind of magical. It has zero impact on memory, but each step will take a bit longer if you increase it.

soberirving commented 3 years ago

@NuclearSurvivor @afiaka87 My CUDA GPU utilization is 0%, although torch.cuda.is_available() returns True. Could you please share your conda list? I want to know which part I'm missing. Thank you very much!