lucidrains / deep-daze

Simple command-line tool for text-to-image generation using OpenAI's CLIP and Siren (an implicit neural representation network). The technique was originally created by https://twitter.com/advadnoun
MIT License
4.37k stars 327 forks

Memory error when generating image #81

Closed: raclettes closed this issue 3 years ago

raclettes commented 3 years ago

I encounter the following error when running imagine:

Traceback (most recent call last):
  File "c:\users\miner\appdata\local\programs\python\python38\lib\runpy.py", line 192, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\miner\appdata\local\programs\python\python38\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\Miner\AppData\Local\Programs\Python\Python38\Scripts\imagine.exe\__main__.py", line 7, in <module>
  File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\deep_daze\cli.py", line 111, in main
    fire.Fire(train)
  File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\fire\core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\fire\core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\fire\core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\deep_daze\cli.py", line 107, in train
    imagine()
  File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\deep_daze\deep_daze.py", line 447, in forward
    _, loss = self.train_step(epoch, i)
  File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\deep_daze\deep_daze.py", line 380, in train_step
    out, loss = self.model(self.clip_encoding)
  File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\deep_daze\deep_daze.py", line 168, in forward
    out = self.model()
  File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\siren_pytorch\siren_pytorch.py", line 97, in forward
    out = self.net(coords)
  File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\siren_pytorch\siren_pytorch.py", line 76, in forward
    x = self.net(x)
  File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\torch\nn\modules\container.py", line 119, in forward
    input = module(input)
  File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\siren_pytorch\siren_pytorch.py", line 48, in forward
    out = self.activation(out)
  File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "c:\users\miner\appdata\local\programs\python\python38\lib\site-packages\siren_pytorch\siren_pytorch.py", line 19, in forward
    return torch.sin(self.w0 * x)
RuntimeError: CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 6.00 GiB total capacity; 3.85 GiB already allocated; 79.44 MiB free; 3.87 GiB reserved in total by PyTorch)

I attempted clearing the CUDA cache, but the same error occurred.

>>> import torch
>>> torch.cuda.empty_cache()
raclettes commented 3 years ago

With nothing else running, torch reports no allocated memory:

>>> print(torch.cuda.memory_summary(device=None, abbreviated=False))
|===========================================================================|
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|===========================================================================|
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |       0 B  |       0 B  |       0 B  |       0 B  |
|       from large pool |       0 B  |       0 B  |       0 B  |       0 B  |
|       from small pool |       0 B  |       0 B  |       0 B  |       0 B  |
|---------------------------------------------------------------------------|
| Active memory         |       0 B  |       0 B  |       0 B  |       0 B  |
|       from large pool |       0 B  |       0 B  |       0 B  |       0 B  |
|       from small pool |       0 B  |       0 B  |       0 B  |       0 B  |
|---------------------------------------------------------------------------|
| GPU reserved memory   |       0 B  |       0 B  |       0 B  |       0 B  |
|       from large pool |       0 B  |       0 B  |       0 B  |       0 B  |
|       from small pool |       0 B  |       0 B  |       0 B  |       0 B  |
|---------------------------------------------------------------------------|
| Non-releasable memory |       0 B  |       0 B  |       0 B  |       0 B  |
|       from large pool |       0 B  |       0 B  |       0 B  |       0 B  |
|       from small pool |       0 B  |       0 B  |       0 B  |       0 B  |
|---------------------------------------------------------------------------|
| Allocations           |       0    |       0    |       0    |       0    |
|       from large pool |       0    |       0    |       0    |       0    |
|       from small pool |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| Active allocs         |       0    |       0    |       0    |       0    |
|       from large pool |       0    |       0    |       0    |       0    |
|       from small pool |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| GPU reserved segments |       0    |       0    |       0    |       0    |
|       from large pool |       0    |       0    |       0    |       0    |
|       from small pool |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| Non-releasable allocs |       0    |       0    |       0    |       0    |
|       from large pool |       0    |       0    |       0    |       0    |
|       from small pool |       0    |       0    |       0    |       0    |
|===========================================================================|
raclettes commented 3 years ago

Running the same memory summary just after https://github.com/lucidrains/deep-daze/blob/main/deep_daze/deep_daze.py#L168 produces the following output:

|===========================================================================|
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|===========================================================================|
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |  360168 KB |    1374 MB |   12678 MB |   12327 MB |
|       from large pool |  347904 KB |    1362 MB |   12629 MB |   12290 MB |
|       from small pool |   12264 KB |      13 MB |      49 MB |      37 MB |
|---------------------------------------------------------------------------|
| Active memory         |  360168 KB |    1374 MB |   12678 MB |   12327 MB |
|       from large pool |  347904 KB |    1362 MB |   12629 MB |   12290 MB |
|       from small pool |   12264 KB |      13 MB |      49 MB |      37 MB |
|---------------------------------------------------------------------------|
| GPU reserved memory   |    1396 MB |    1396 MB |    1396 MB |       0 B  |
|       from large pool |    1382 MB |    1382 MB |    1382 MB |       0 B  |
|       from small pool |      14 MB |      14 MB |      14 MB |       0 B  |
|---------------------------------------------------------------------------|
| Non-releasable memory |   20760 KB |   25791 KB |  275962 KB |  255202 KB |
|       from large pool |   18688 KB |   23808 KB |  224128 KB |  205440 KB |
|       from small pool |    2072 KB |    2139 KB |   51834 KB |   49762 KB |
|---------------------------------------------------------------------------|
| Allocations           |     351    |     359    |     725    |     374    |
|       from large pool |      88    |      92    |     137    |      49    |
|       from small pool |     263    |     272    |     588    |     325    |
|---------------------------------------------------------------------------|
| Active allocs         |     351    |     359    |     725    |     374    |
|       from large pool |      88    |      92    |     137    |      49    |
|       from small pool |     263    |     272    |     588    |     325    |
|---------------------------------------------------------------------------|
| GPU reserved segments |      25    |      25    |      25    |       0    |
|       from large pool |      18    |      18    |      18    |       0    |
|       from small pool |       7    |       7    |       7    |       0    |
|---------------------------------------------------------------------------|
| Non-releasable allocs |      11    |      12    |     171    |     160    |
|       from large pool |       6    |       6    |      15    |       9    |
|       from small pool |       5    |       7    |     156    |     151    |
|===========================================================================|
raclettes commented 3 years ago

For reference, I have a GeForce RTX 2060 (6 GiB of VRAM).

afiaka87 commented 3 years ago

There's a similar issue happening in: https://github.com/lucidrains/deep-daze/issues/80#issuecomment-798844142

But yeah, you don't have enough VRAM. Most consumer GPUs don't, so don't feel bad. With less than 8 GiB of VRAM it's pretty tough to do, but you might manage it if you set image_width to 256 or lower (see the example below). A lot of people are hitting this issue today, so please check the link above for information on how to work around it. I've typed too much for now, ha.
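
As a minimal sketch (the prompt and the layer/batch values are just placeholder guesses; the part that matters here is --image-width 256), an invocation along these lines should fit a 6 GiB card better than the defaults:

imagine "a prompt of your choice" --image-width=256 --num-layers=16 --batch-size=4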

Edit: as usual (unfortunately), the best free way to run this program is with the Google Colab notebooks. If you're not opposed to that, you can use it for free (seriously), and you're basically guaranteed a GPU with 16 GB of VRAM. You can find the notebooks on the front page of this project (README.md).

afiaka87 commented 3 years ago

@discordstars

raclettes commented 3 years ago

@afiaka87 Oh alright, thanks for the quick response. I'll give it a go with a smaller image width; I already tried a smaller batch size.

afiaka87 commented 3 years ago

For sure, no problem. The most important bit on that page is @NotNANtoN's benchmarks for image_width 256 while varying the batch size; GPU usage is on the right. bs is the batch_size, and grad_acc stands for --gradient_accumulate_every. It defaults to 4, but you don't need it as much with higher batch sizes (there's an example command after the list).

bs 8, num_layers 48: 5.3 GB
bs 16, num_layers 48: 5.46 GB - 2.0 it/s
bs 32, num_layers 48: 5.92 GB - 1.67 it/s
bs 8, num_layers 44: 5 GB - 2.39 it/s
bs 32, num_layers 44, grad_acc 1: 5.62 GB - 4.83 it/s
bs 96, num_layers 44, grad_acc 1: 7.51 GB - 2.77 it/s
bs 32, num_layers 66, grad_acc 1: 7.09 GB - 3.7 it/s

Keep in mind, your OS (Windows, Linux?) is going to be using some GPU VRAM as well, anywhere from 500 MB to 2 GB in my experience.
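
For instance, the "bs 32, num_layers 44, grad_acc 1" row above would correspond to roughly this invocation (the prompt is only a placeholder, and the flag spellings assume Fire's usual hyphen mapping of the train() parameters):

imagine "a prompt of your choice" --image-width=256 --num-layers=44 --batch-size=32 --gradient-accumulate-every=1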

afiaka87 commented 3 years ago

@discordstars Thanks for filing an issue btw! We always appreciate it even if we're too busy to get around to helping everyone.

If you're new to GitHub, make sure you mash that "Close Issue" button if you feel your question's been answered. Do let me know if you manage to get it working on there; it's useful for future users to know whether it's even possible.

raclettes commented 3 years ago

Not new, but thanks for the reminder.

I'll give it a go with smaller image sizes and batch sizes and update the issue before I close it :)

Edit: and oops, I must have entirely skimmed over the links in the README. I'll go through those afterwards too (for the sake of actually getting decent output).

afiaka87 commented 3 years ago

Not new, but thanks for the reminder.

My bad. I try to make as few assumptions as possible about people on here. Hope it didn't come across as patronizing.

raclettes commented 3 years ago

@afiaka87 Absolutely not, no worries 😆 just making a remark.

I was able to run with --image-width 256 on the 6 GiB of VRAM. I haven't tried other resolutions, but this works at ~2.84 it/s.