songweige / sd-webui-rich-text


CUDA out of memory #13

Open matrix4767 opened 9 months ago

matrix4767 commented 9 months ago

Looks like this extension can't be run on anything but high-memory video cards...

```
G:\stable-webui\extensions\sd-webui-rich-text\scripts\models\region_diffusion.py:203: FutureWarning: Accessing config attribute `in_channels` directly via 'UNet2DConditionModel' object attribute is deprecated. Please access 'in_channels' over 'UNet2DConditionModel's config object instead, e.g. 'unet.config.in_channels'.
  (text_embeddings.shape[0] // 2, self.unet.in_channels, height // 8, width // 8), device=self.device)
Traceback (most recent call last):
  File "G:\stable-webui\venv\lib\site-packages\gradio\routes.py", line 488, in run_predict
    output = await app.get_blocks().process_api(
  File "G:\stable-webui\venv\lib\site-packages\gradio\blocks.py", line 1431, in process_api
    result = await self.call_function(
  File "G:\stable-webui\venv\lib\site-packages\gradio\blocks.py", line 1103, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "G:\stable-webui\venv\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "G:\stable-webui\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "G:\stable-webui\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "G:\stable-webui\venv\lib\site-packages\gradio\utils.py", line 707, in wrapper
    response = f(*args, **kwargs)
  File "G:\stable-webui\extensions\sd-webui-rich-text\scripts\rich_text_on_tab.py", line 146, in generate
    plain_img = model.produce_attn_maps([base_text_prompt], [negative_text],
  File "G:\stable-webui\extensions\sd-webui-rich-text\scripts\models\region_diffusion.py", line 215, in produce_attn_maps
    noise_pred = self.unet(
  File "G:\stable-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "G:\stable-webui\extensions\sd-webui-rich-text\scripts\models\unet_2d_condition.py", line 959, in forward
    sample = upsample_block(
  File "G:\stable-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "G:\stable-webui\extensions\sd-webui-rich-text\scripts\models\unet_2d_blocks.py", line 2143, in forward
    hidden_states = attn(
  File "G:\stable-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "G:\stable-webui\extensions\sd-webui-rich-text\scripts\models\transformer_2d.py", line 291, in forward
    hidden_states = block(
  File "G:\stable-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "G:\stable-webui\extensions\sd-webui-rich-text\scripts\models\attention.py", line 155, in forward
    attn_output, _ = self.attn1(
  File "G:\stable-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "G:\stable-webui\extensions\sd-webui-rich-text\scripts\models\attention_processor.py", line 330, in forward
    return self.processor(
  File "G:\stable-webui\extensions\sd-webui-rich-text\scripts\models\attention_processor.py", line 1159, in __call__
    attention_probs = attn.get_attention_scores(query, key, attention_mask, attn_weights=attn_weights)
  File "G:\stable-webui\extensions\sd-webui-rich-text\scripts\models\attention_processor.py", line 401, in get_attention_scores
    attention_probs = attention_scores.softmax(dim=-1)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 0; 12.00 GiB total capacity; 8.67 GiB already allocated; 0 bytes free; 9.76 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
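
(For anyone who wants to act on the allocator hint in the last line: `PYTORCH_CUDA_ALLOC_CONF` has to be set before torch initializes CUDA, so set it in `webui-user.bat` or before the first torch import. A minimal sketch; the 512 MiB split size is an arbitrary example value, not a recommendation:)

```python
import os

# Assumption: 512 MiB is only an example split size; tune it per GPU.
# Equivalent in webui-user.bat: set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

import torch  # imported after setting the env var so the allocator picks it up

print(torch.cuda.is_available())
```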

Acephalia commented 9 months ago

Got this on 24 GB of VRAM as well. My guess is that it is loading the model on top of what A1111 has already loaded.

@songweige anything that we might be missing?

matrix4767 commented 9 months ago

Another thing: this takes more VRAM just by having the extension enabled, even when it's not in use.

Acephalia commented 9 months ago

> Another thing: this takes more VRAM just by having the extension enabled, even when it's not in use.

I'm guessing it just loads the model into VRAM on startup and doesn't quite adhere to the loaded-models setting. Needs some tinkering there, no doubt; hopefully the kinks will be ironed out. I gave the demo a test on their Space and it's absolutely fantastic. Can't wait to get it rolling.

songweige commented 9 months ago

Thanks for the great comments! I have modified the code to make sure that the model is not loaded until the first generation starts. Also, it is amazing to me how well A1111 optimizes the memory and time cost of sampling; this extension still needs quite a bit of improvement there. Sadly, I cannot guarantee that I will have the time to make that happen, but I will do my best.
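
Roughly, the change follows the standard deferred-loading pattern; the sketch below is illustrative (the `RegionDiffusion` constructor arguments and the `generate` wrapper are assumptions, not the committed code):

```python
_model = None  # nothing touches the GPU at extension import time


def _get_model():
    """Load the diffusion model on first use instead of at webui startup."""
    global _model
    if _model is None:
        from scripts.models.region_diffusion import RegionDiffusion
        _model = RegionDiffusion(device="cuda")  # hypothetical constructor args
    return _model


def generate(prompt, negative_prompt=""):
    # The heavy model load happens here, on the first generation request.
    return _get_model().produce_attn_maps([prompt], [negative_prompt])
```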

Acephalia commented 9 months ago

> Thanks for the great comments! I have modified the code to make sure that the model is not loaded until the first generation starts. Also, it is amazing to me how well A1111 optimizes the memory and time cost of sampling; this extension still needs quite a bit of improvement there. Sadly, I cannot guarantee that I will have the time to make that happen, but I will do my best.

@songweige Thank you for the quick fix. Is it possible to add an Unload button to unload the model as well? That would certainly help as opposed to restarting the webui each time.

I really do hope someone will pick up/assist you with the development on this, as the potential is just incredible.

I still haven't been able to get it running in A1111; going to give it another go shortly.

Update:

@songweige No luck. Even with all A1111 models unloaded, running rich text still maxes out all 24 GB:

```
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 24.00 GiB total capacity; 22.46 GiB already allocated; 0 bytes free; 23.09 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.
```

songweige commented 9 months ago

@Acephalia Sorry for the late response, as I was traveling last week. May I ask whether you encountered the memory issue with SD1.5 or SDXL?

If it is SDXL, then I think I might need to incorporate some changes, but it should be fixable, as the Hugging Face demo runs fine on an A10G instance with 24 GB of VRAM.

If it is SD1.5, then I suspect there is a bug in the code. Could you try a simple prompt without any rich-text attributes to see if the OOM still occurs?
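
If it does turn out to be SDXL across the board, the stock diffusers memory savers would be my first thing to try; whether they are compatible with the customized UNet in this extension is an assumption, so treat this as a sketch:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,   # fp16 halves weight memory versus fp32
)
pipe.enable_attention_slicing()   # compute attention in chunks to cap the peak
pipe.enable_model_cpu_offload()   # keep submodules on the CPU until they run

image = pipe("a red apple on a table").images[0]
```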

songweige commented 9 months ago

I also hope to get some hands-on help from someone, as the next paper deadline is approaching soon. :D I did get very useful references from the latest issue, #15!

Acephalia commented 9 months ago

> @Acephalia Sorry for the late response, as I was traveling last week. May I ask whether you encountered the memory issue with SD1.5 or SDXL?
>
> If it is SDXL, then I think I might need to incorporate some changes, but it should be fixable, as the Hugging Face demo runs fine on an A10G instance with 24 GB of VRAM.
>
> If it is SD1.5, then I suspect there is a bug in the code. Could you try a simple prompt without any rich-text attributes to see if the OOM still occurs?

Hardly a delay at all!

It's with SDXL. I even tried the standalone repo with the original code. It generated a few images from the startup samples, but it errored with OOM before I could get to the GUI.

Will report back soon with the results of trying without the rich text.

Glad you are getting some help. I'm nowhere close to what you guys do coding-wise, but if I can lend a hand with anything, don't hesitate to reach out!

songweige commented 9 months ago

Thanks for the offer! I hope I can get it truly working with A1111 soon!

Ah, it sounds like the OOM occurred when a font color was specified. Let me know what you get with no rich text, and I think I should be able to fix the issue!

Acephalia commented 9 months ago

Just gave it a go and YES, it's the colour that's causing the OOM. It is, however, still using almost all 24 GB of VRAM, and there is no way to unload it.

I'm sure you will. I think ultimately, if we can get it integrated with the standard prompt box and have it work with the checkpoint chooser, that would really change the whole ball game.

songweige commented 9 months ago

I have made some improvements to the memory usage to bring it in line with the Hugging Face demo; it should now fit into 24 GB of VRAM. Also, I plan to add the unload button. May I ask what you did to unload the A1111 models? I may want to learn to do the same!
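
For the unload button, what I have in mind is the usual recipe of dropping every reference to the pipeline and flushing the allocator; a minimal sketch, assuming the cached model lives in a module-level `_model` (the Gradio button wiring is omitted):

```python
import gc

import torch

_model = None  # the extension's cached pipeline (set elsewhere at load time)


def unload_model():
    global _model
    _model = None             # drop the last reference to the pipeline
    gc.collect()              # break reference cycles still holding GPU tensors
    torch.cuda.empty_cache()  # hand cached blocks back to the CUDA driver
```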