Hi, you can try applying the optimizations described here: https://github.com/seruva19/kubin/wiki/Docs#system-requirements For instance, turning on the 'Enable prior generation on CPU' option should be enough to make the weights fit into 8 GB (tested on a GTX 1070), and image generation should still be fast enough (if you have a good CPU).
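For illustration, the same idea can be sketched outside kubin with plain diffusers code. This is a minimal, untested sketch, not kubin's own implementation; the standard Hugging Face Kandinsky 2.2 model IDs and parameters are assumptions:

```python
# Untested sketch: keep the prior on the CPU so that only the decoder
# weights occupy the 8 GB GPU (not kubin's actual code).
import torch
from diffusers import KandinskyV22PriorPipeline, KandinskyV22Pipeline

prior = KandinskyV22PriorPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-prior", torch_dtype=torch.float32
).to("cpu")
decoder = KandinskyV22Pipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
).to("cuda")

# The prior runs on the CPU; its embeddings are then moved to the GPU decoder.
image_embeds, negative_embeds = prior("a cat in a spacesuit").to_tuple()
image = decoder(
    image_embeds=image_embeds.to("cuda", torch.float16),
    negative_image_embeds=negative_embeds.to("cuda", torch.float16),
    height=512, width=512, num_inference_steps=25,
).images[0]
image.save("result.png")
```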
Hi, thanks for the reply; with these settings the error is gone. But we have two GPUs with 8 GB each: is it not possible to use the memory of both GPUs? Also, after the first image generation it is impossible to generate a second image; if we don't restart the application, we get this error:
File "/root/kubin/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1143, in convert return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking) NotImplementedError: Cannot copy out of meta tensor; no data!
Where are we going wrong? Thanks for the previous reply, and thanks in advance if you can help us with this second error.
I wish we had such an opportunity (similar to ExLlama for LLMs), but currently I am not aware of any method that enables parallel processing of a single batch job on multiple GPUs. In theory, though, it's possible to run several separate inference pipelines, each utilizing one GPU; however, right now this app does not support such advanced techniques :)
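Something along these lines would be the idea: an untested sketch using diffusers' combined Kandinsky 2.2 pipeline, with each pipeline pinned to its own GPU so they can serve separate requests (the model ID and parameters are assumptions, and a single image is still rendered entirely on one GPU):

```python
# Untested sketch: two independent pipelines, one per GPU. They can handle
# separate generation requests (e.g. from two processes or threads), but a
# single image is still produced on a single device.
import torch
from diffusers import AutoPipelineForText2Image

pipe_0 = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
).to("cuda:0")
pipe_1 = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
).to("cuda:1")

image_a = pipe_0("a red sports car", num_inference_steps=25).images[0]
image_b = pipe_1("a blue sports car", num_inference_steps=25).images[0]
```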
Regarding the error you mentioned: are you sure that you are using the most recent version of the app? There was a recent update that fixed the exact error you described: see https://github.com/seruva19/kubin/issues/124
Hi, we updated the application with the "update.sh" script, so we think this is the latest version, but after the update we still get the same error when we try to generate a second image in text2img.
And sometimes after the update we get this error:
File "/root/kubin/venv/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward return F.linear(input, self.weight, self.bias) RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)
Can you check the output of git log --pretty=format:'%h' -n 1?
Hi! Yes, of course, the result is: e3b0318
That's what I thought: your version of the app is slightly outdated (July 21) and does not include the latest fixes.
I don't know why the update script did not work in your case, but try to update manually with git pull and check if the error persists.
Hi, thank you very much for the help; now everything seems to work fine. Given your experience in this area, we would like to ask whether, as far as you know, it is possible to fine-tune Kandinsky so that it recognizes a subject by name. Example: we give it some images of the same subject (for example "Marc") together with the subject's name, so that when we ask Kandinsky for an image of "Marc on a car" it generates an image with our Marc as the subject.
Unfortunately, I haven't attempted any fine-tuning myself yet due to a lack of time and limited local GPU resources, so I cannot share any insights so far. But starting from version 2.2, Kandinsky supports LoRA training for both subjects and styles, and I've also recently integrated LoRA training and inference tools into the GUI, so I'm planning to give LoRA training a try in the near future, likely in September. If I achieve any success, I'll write about the results.
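For what it's worth, the rough idea behind LoRA fine-tuning is to attach small low-rank adapters to the decoder UNet's attention projections and train only those on the subject images, tied to a chosen token (e.g. the name "Marc"). Below is a minimal, untested sketch using diffusers and peft; the module names and hyperparameters are assumptions, not kubin's actual training code:

```python
# Untested sketch: wrap the Kandinsky 2.2 decoder UNet's attention
# projections with LoRA adapters via peft; only the adapters get trained.
import torch
from diffusers import UNet2DConditionModel
from peft import LoraConfig, get_peft_model

unet = UNet2DConditionModel.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", subfolder="unet",
    torch_dtype=torch.float32,
)
lora_config = LoraConfig(
    r=8,             # adapter rank (assumed value)
    lora_alpha=8,
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],  # attention projections
)
unet = get_peft_model(unet, lora_config)
unet.print_trainable_parameters()  # only a tiny fraction of the full UNet

# The usual diffusion training loop (noise-prediction loss on the subject
# images, with prompts containing the chosen token) would then update only
# the LoRA weights, which can be saved and applied at inference time.
```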
Hello, thank you very much for the information. We look forward to the results of your tests, and thank you for being willing to share them with us.
Hi! We have a problem... When we run a text-to-image generation, we see this error in the Linux shell:
File "/root/kubin/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1143, in convert return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking) torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 14.00 MiB (GPU 0; 7.79 GiB total capacity; 6.09 GiB already allocated; 3.12 MiB free; 6.22 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
We have two NVIDIA GPUs with 8 GB of VRAM each.
How can we fix this problem? Thanks in advance.
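One mitigation hinted at by the error message itself is the PYTORCH_CUDA_ALLOC_CONF allocator setting; a minimal, untested sketch follows (the 128 MB split size is only an illustrative value, not a verified recommendation):

```python
# Untested sketch: the allocator option must be set before any CUDA memory
# is allocated, so set it before importing/initializing torch.
import os
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

import torch  # imported after the variable is set so the allocator picks it up
```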