kuprel / min-dalle

min(DALL·E) is a fast, minimal port of DALL·E Mini to PyTorch
MIT License

CUDA out of memory: seems to always allocate all of it. #69

Closed by ImagicTheCat 2 years ago

ImagicTheCat commented 2 years ago

Hi,

I can't get the model working with the replicate image (r8.im/kuprel/min-dalle@sha256:71b9ef81385fae73b632d7e2fe0f5988a739781e833a610a9c83bc45205d8215) on any GPU because of OOM errors. I tried GPUs with progressively more VRAM, from 16 GB to 48 GB. Does it really require more, or is something wrong?

On an RTX A6000 with 48 GB of video memory (cloud docker containers), this is the error when requesting a prediction:

2022-07-13T11:57:56.509707773Z RuntimeError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 47.54 GiB total capacity; 45.68 GiB already allocated; 3.56 MiB free; 46.44 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
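For what it's worth, the figures in that message can be compared directly to see whether the fragmentation hint applies. Below is a minimal Python sketch using only the standard library; the helper name `parse_oom` is hypothetical, and the regular expressions assume the exact message wording shown in the log above.

```python
import re

# The error text from the log above, trimmed to the part with the numbers.
MSG = (
    "CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; "
    "47.54 GiB total capacity; 45.68 GiB already allocated; "
    "3.56 MiB free; 46.44 GiB reserved in total by PyTorch)"
)

# Conversion factors to GiB for the two units that appear in the message.
UNITS = {"MiB": 1 / 1024, "GiB": 1.0}


def parse_oom(msg):
    """Hypothetical helper: extract the memory figures, converted to GiB."""
    fields = {
        "total": r"([\d.]+) (MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (MiB|GiB) already allocated",
        "free": r"([\d.]+) (MiB|GiB) free",
        "reserved": r"([\d.]+) (MiB|GiB) reserved",
    }
    out = {}
    for name, pattern in fields.items():
        m = re.search(pattern, msg)
        if m:
            out[name] = float(m.group(1)) * UNITS[m.group(2)]
    return out


sizes = parse_oom(MSG)
# A large gap here would point at fragmentation; a small one would not.
slack = sizes["reserved"] - sizes["allocated"]
print(f"reserved - allocated = {slack:.2f} GiB")
```

In this log the gap is under 1 GiB, so reserved memory is not much larger than allocated memory, and the `max_split_size_mb` suggestion probably does not apply here. It looks more like nearly all of the 47.54 GiB was genuinely allocated.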

kuprel commented 2 years ago

It can generate 9x9 grids on an A100 with 40 GB of GPU RAM, so it should work on 48 GB. I haven't tried using the docker image from replicate, though.

ImagicTheCat commented 2 years ago

I tested a provided PyTorch container with Jupyter Notebook and it worked without issues. It is also more space efficient (less bloated, it seems). I will probably try to create my own custom container based on that.

I don't know what the replicate image is doing wrong.

Note: The GPU was a Tesla V100 FHHL with 16 GB of VRAM. Memory usage was around 14-16 GB for a 3x3 grid, at ~23 s per generation (a 4x4 grid failed).
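To compare containers, one option is to print the GPU memory figures before and after loading the model in each environment. The following is a rough sketch, not something taken from min-dalle or the replicate image: `gpu_memory_report` is a hypothetical helper name, and it assumes a reasonably recent PyTorch where `torch.cuda.mem_get_info` is available. It returns `None` when PyTorch or CUDA is missing.

```python
def gpu_memory_report(device=0):
    """Hypothetical helper: memory figures in GiB, or None without torch/CUDA."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None

    gib = 1024 ** 3
    # Device-wide view: includes memory held by other processes on the GPU.
    free, total = torch.cuda.mem_get_info(device)
    return {
        "total": total / gib,
        "free": free / gib,
        # Per-process view: what this Python process's allocator holds.
        "allocated": torch.cuda.memory_allocated(device) / gib,
        "reserved": torch.cuda.memory_reserved(device) / gib,
    }


print(gpu_memory_report())
```

If `free` is already low before the model is loaded, something else is holding the memory. If `allocated` grows to nearly `total` during a prediction, the process itself is using it, for example with a larger grid size than expected.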