lucidrains / deep-daze

Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network). Technique was originally created by https://twitter.com/advadnoun

Question about using image samples and VRAM #80

Open NuclearSurvivor opened 3 years ago

NuclearSurvivor commented 3 years ago

I've been messing around with the program all day now and it's very interesting. I have thousands of wallpapers that I keep in a folder, and I thought it would be cool to use one of them as an image sample. I was getting the CUDA out of memory runtime error when I tried to use the command

imagine "a psychedelic experience." --start-image-path ./fasfa.jpg

I'm sure I'm getting this error because the reference image is 1080p and my 8 GB of VRAM can't handle rendering/training the image, or whatever it's called. Is there a way to make the AI interpret the image at a lower resolution, or maybe an alternative way to slow the training/rendering down so it can cope with the resolution of the image? I understand this is the kind of stuff a 9-year-old could figure out, but I just want to be 100% sure of the capabilities of this program. If there is no such option, is that a feature that could be implemented: either interpret any image at a lower resolution, or slow the render speed / change the rate of the training (or change whatever the VRAM is used for) so it can cope with a high-res image? Or do I just have to crop it or manually lower the resolution? I tried to tag this issue as a question but I couldn't find the option to do so.
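For example, something like this is what I'm hoping is possible (I'm only guessing that an --image-width flag exists to match the image_width setting mentioned in the readme):

imagine "a psychedelic experience." --start-image-path ./fasfa.jpg --image-width 256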

afiaka87 commented 3 years ago

From the readme:

High GPU memory usage

If you have at least 16 GiB of VRAM available, you should be able to run these settings with some wiggle room.

imagine = Imagine(
    text=text,
    num_layers=42,
    batch_size=64,
    gradient_accumulate_every=1,
)

Average GPU memory usage

imagine = Imagine(
    text=text,
    num_layers=24,
    batch_size=16,
    gradient_accumulate_every=2
)

Very low GPU memory usage (less than 4 GiB)

If you are desperate to run this on a card with less than 8 GiB of VRAM, you can lower the image_width.

imagine = Imagine(
    text=text,
    image_width=256,
    num_layers=16,
    batch_size=1,
    gradient_accumulate_every=16 # Increase gradient_accumulate_every to correct for loss in low batch sizes
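    # (Effective batch size is roughly batch_size * gradient_accumulate_every, so 1 * 16 here trains similarly to batch_size=16 above while keeping far less in memory at once.)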
)

VRAM and speed benchmarks:

These experiments were conducted with an RTX 2060 Super and a Ryzen 7 3700X. We first list the parameters (bs = batch size), then the memory usage and, in some cases, the training iterations per second:

For an image resolution of 512:

  • bs 1, num_layers 22: 7.96 GB
  • bs 2, num_layers 20: 7.5 GB
  • bs 16, num_layers 16: 6.5 GB

For an image resolution of 256:

  • bs 8, num_layers 48: 5.3 GB
  • bs 16, num_layers 48: 5.46 GB - 2.0 it/s
  • bs 32, num_layers 48: 5.92 GB - 1.67 it/s
  • bs 8, num_layers 44: 5 GB - 2.39 it/s
  • bs 32, num_layers 44, grad_acc 1: 5.62 GB - 4.83 it/s
  • bs 96, num_layers 44, grad_acc 1: 7.51 GB - 2.77 it/s
  • bs 32, num_layers 66, grad_acc 1: 7.09 GB - 3.7 it/s

@NotNANtoN recommends a batch size of 32 with 44 layers and training 1-8 epochs.

You can convert each of those parameters from the Python code into arguments of the command line program in the usual way, i.e. --num_layers instead of num_layers. Type imagine --help for a full list of command line arguments.
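For instance, the "average" settings above plus a smaller image width would look roughly like this on the command line (flag spellings guessed from the parameter names, so double-check against imagine --help):

imagine "a psychedelic experience." --image_width=256 --num_layers=24 --batch_size=16 --gradient_accumulate_every=2 --start_image_path=./fasfa.jpg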

NuclearSurvivor commented 3 years ago

I'm very new to Python and all this stuff. How would I set the code to best suit my RTX 2080 Super? Would I go into a new command line and type python followed by imagine = Imagine( text=text, num_layers=24, batch_size=16, gradient_accumulate_every=2 ), or is it more something like imagine "whatever I want here" --num_layers=22?

afiaka87 commented 3 years ago

Yeah it's no worries! Welcome to Python and the machine learning community!

Assuming you are a) on Linux, b) using an Nvidia GPU, and c) have CUDA installed properly:

Here's how I do it.

This part's kinda annoying, but you should try to use virtual environments with Python. Using your global Python can cause bugs that are really tough to figure out.

python3 -m pip install virtualenv
mkdir -p ~/Projects/run_deep_daze
cd ~/Projects/run_deep_daze
python3 -m virtualenv .venv
source .venv/bin/activate
echo "You should be in a clean python virtual environment now. Packages installed here won't pollute your global python. Everytime you want to work on this project again, you will need to run 'source .venv/bin/activate' again. "

Double check that you're in a virtualenv:

which python
echo "Your python path should have '.venv' in it by now. If its from /usr/local/bin, /usr/bin, or /bin, you need to run 'source .venv/bin/activate' again inside your project directory."

Important: Check your CUDA version with nvidia-smi. Change the numbers in e.g. +cu111 to match your CUDA version. At the time of this post, +cu112 versions aren't available, so just use +cu111 if you have CUDA 11.2.

pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
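Once that finishes, a quick way to sanity-check that PyTorch can actually see your GPU:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"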

Finally, install deep-daze.

pip install deep-daze

Okay, now you're ready to write and run some Python! Here's a starting file.

To run it, just save it to "run.py" and then run: python run.py

from tqdm import trange

from deep_daze import Imagine

TEXT = 'a female mannequin dressed in a black button - down shirt and white palazzo pants' #@param {type:"string"}
NUM_LAYERS = 44 #@param {type:"number"}
SAVE_EVERY =  20 #@param {type:"number"}
IMAGE_WIDTH = 512 #@param {type:"number"}
SAVE_PROGRESS = True #@param {type:"boolean"}
LEARNING_RATE = 9e-6 #@param {type:"number"}
ITERATIONS = 1050 #@param {type:"number"}
EPOCHS = 8
BATCH_SIZE = 32
GRADIENT_ACCUMULATE_EVERY = 4
model = Imagine(
    text = TEXT,
    num_layers = NUM_LAYERS,
    save_every = SAVE_EVERY,
    image_width = IMAGE_WIDTH,
    lr = LEARNING_RATE,
    iterations = ITERATIONS,
    epochs = EPOCHS,
    save_progress = SAVE_PROGRESS,
    batch_size = BATCH_SIZE,
    gradient_accumulate_every = GRADIENT_ACCUMULATE_EVERY,
    open_folder = False # Set this to True if you want to open the folder you're in to view the files
)

for epoch in trange(EPOCHS, desc = 'epochs'):
    for i in trange(ITERATIONS, desc = 'iteration'):
        model.train_step(epoch, i)

afiaka87 commented 3 years ago

In general though: there's currently a GPU shortage, meaning that cloud providers kind of control the machine learning market. This stuff needs lots of VRAM and, fortunately, Colab does make it very easy to get a card with 16 GiB of VRAM. So as always, if you're not too against Google in general and don't have a hard requirement to run this locally, it's advised that you go to the front page of the README and mash that "Newer, simpler colab notebook" button. This will at the very least show you how the Python code actually looks.

NuclearSurvivor commented 3 years ago

(quotes afiaka87's setup instructions and run.py example from the previous comment)

Nope, on Windows 10 lol. I have been using virtual environments; when I was trying to install Deep Daze I went through many installation pages and found out how to set up a virtual environment on Windows! I would use Linux, but I'm a gamer; all games support Windows 10 but very few support Linux, or I would use it. This just piqued my interest, and I've been wanting to get into machine learning and coding, so this was the perfect opportunity! The primary reasons I installed Deep Daze on my PC were that 1) I have a 2080 Super, and playing RTX Minecraft and No Man's Sky wasn't a great use for the GPU, and 2) I wanted to feel a sense of accomplishment instead of just having the code set up for me, where all I needed to do was type imagine "nut sack" and have it do its thing. Thank you for all the information though, you have answered many of my questions!

afiaka87 commented 3 years ago

I would use Linux, but I'm a gamer; all games support Windows 10 but very few support Linux, or I would use it.

Oh wow, I can relate to this. I still dual boot in order to play my Steam library. WSL 2.0 is nice though, and they're rolling out GPU passthrough soon as well, which would allow you to run a full Linux environment with CUDA support on your GPU. Pretty cool stuff. 'Til then I kinda need both though. Linux is fantastic for development.

NuclearSurvivor commented 3 years ago

(quotes the exchange above about dual booting)

Do you have Discord, or any other contact info? I would love to learn how to dual boot.

afiaka87 commented 3 years ago

I don't give that out on here unfortunately. Glad I could help. If you were to join the EleutherAI channel, you might find me there. That's about all I'm willing to say, ha.

zchris07 commented 3 years ago

(quotes afiaka87's setup instructions and run.py example from above)

How could I edit this script so that it doesn't create a new image each time it updates, and only outputs the start and final image?
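My guess is it comes down to the save_every and save_progress settings, e.g. something like this in the script above (I'm only assuming these parameters behave the way their names suggest):

SAVE_PROGRESS = False   # guessing: don't keep a separate numbered copy at every save point
SAVE_EVERY = ITERATIONS # guessing: write the (single, overwritten) output image far less often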