lllyasviel / stable-diffusion-webui-forge

GNU Affero General Public License v3.0

Is it possible to run NF4 checkpoints with LoRA on 12gb VRAM? #1374

Open Atoli opened 2 months ago

Atoli commented 2 months ago

OS: W10 LTSC
RAM: 16GB
GPU: Nvidia 3060, 12GB VRAM

I keep running into OOM errors when I try to use a single LoRA:

[screenshot: OOM error]

Is there any setting I'm applying wrongly? I have seen people run NF4 checkpoints on 6-8GB VRAM, so I find it weird that a single LoRA would take 4-6GB of VRAM to run.

I would post a log but i cannot find the log file.
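For rough intuition on why a LoRA can spike memory on an NF4 checkpoint: applying a LoRA means the quantized base weights have to be dequantized while they are patched, so the spike scales with the size of the patched layers rather than with the LoRA file itself. A back-of-envelope sketch in Python (the ~12B parameter count for the Flux transformer, and the idea that patching dequantizes weights, are assumptions for illustration, not measurements):

```python
def model_bytes_gib(params_billion, bits_per_param):
    """Approximate weight footprint in GiB for a given parameter count."""
    return params_billion * 1e9 * bits_per_param / 8 / 2**30

FLUX_PARAMS_B = 12  # assumption: Flux.1-dev transformer is roughly 12B params

nf4_resident = model_bytes_gib(FLUX_PARAMS_B, 4)    # weights stored at ~4 bits
fp16_full = model_bytes_gib(FLUX_PARAMS_B, 16)      # same weights at 16 bits

print(f"NF4 resident weights:        ~{nf4_resident:.1f} GiB")
print(f"fp16 if fully dequantized:   ~{fp16_full:.1f} GiB")
# Even temporarily dequantizing a fraction of the layers during LoRA
# patching can add several GiB on top of the ~6 GiB NF4 base, which is
# enough to OOM a 12 GB card unless weights are offloaded.
```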

dramatticdev commented 2 months ago

I have a 3060 12GB as well, and I had the same issues, especially when loading Flux LoRAs. Adding 30GB from the free space on my SSD as virtual RAM fixed it; now everything runs rather smoothly at around 2-4 s/it. I had to have Never OOM Integrated enabled for UNet (always maximize offload) selected, or I would get crashes while loading LoRAs for Flux. Once they were patched I could turn Never OOM off, but I leave it on for some occasions. The swap location setting at the top of the UI is set to Shared. My PC: i7 7th gen, RTX 3060 12GB, 16GB RAM, 30GB virtual RAM.

So yes, it is possible, and I had those same issues you have. Try this out. Sorry if the explanation is unclear; English is my second language.

dramatticdev commented 2 months ago

I just checked now: we have the same physical RAM too. So I definitely recommend adding as much virtual RAM as you can provide, and you should be generating with Flux about as fast as SDXL.
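To put a number on "as much virtual RAM as you can provide": the page file only needs to cover the part of the peak memory commit that physical RAM cannot. A hypothetical sizing helper (the 40 GB peak-commit figure is an assumption for illustration, not a measurement):

```python
def suggested_pagefile_gb(ram_gb, peak_commit_gb, headroom_gb=4):
    """Smallest page file such that RAM + swap covers the peak commit,
    plus a little headroom for the OS and other processes."""
    return max(peak_commit_gb + headroom_gb - ram_gb, 0)

# assumption: Flux weights plus activations peak around 40 GB of commit
# when everything is offloaded to system memory
print(suggested_pagefile_gb(ram_gb=16, peak_commit_gb=40))  # close to the
# ~30 GB that worked for the commenter above
```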

xJexix commented 2 months ago

I have a 4060 with 8GB of VRAM. Using virtual RAM can help: I added an internal SSD to act as a cache and used it for paging as well. I set the shared memory to 6156MB and maxed out the paging file to around 591,893MB, since my SSD has about 2TB of space. Generating takes about 1-2 minutes; I tested various models and settings, with times ranging from 1:55 to 2:55. So far it works well compared to ComfyUI, which can take 5-10 minutes.

lllyasviel commented 2 months ago

[screenshot]

Atoli commented 2 months ago

[screenshot]

I see.

The option was already turned on in my system, but for some reason I currently have more memory allocated.

[screenshot]

Either way, checking this option didn't help.

xJexix commented 2 months ago

To set a custom paging file size, start by turning off automatic management: uncheck "Automatically manage paging file size for all drives". For example, my drive is (S:) and has about "847212 MB" of free space (roughly 847 GB), so I can adjust the settings. From there, I can either set the drive to "System managed" or go with "Custom size."

The recommended size shows you the maximum you can set. I've set mine to 19456 MB (about 19 GB) for that drive. You could increase it to around 40-50 GB (about "56879 MB") if you have enough space; just make sure there's enough room on the drive. If you have a separate cache drive, you can use that too. I have 5 drives in total: 2 are for caching and the rest for storage. I even added one more for paging, which has really helped with generating images.

Once you’ve made these changes, don’t forget to restart your PC.

[Screenshot 2024-08-21 120943]
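For reference, the manual steps above can also be scripted on Windows. This is an untested sketch that drives the legacy `wmic` tool from Python; the drive letter and sizes mirror the example values above, and it should be run from an elevated prompt with a reboot afterwards:

```python
import subprocess
import sys

def pagefile_commands(drive="S:", initial_mb=19456, maximum_mb=49152):
    """Build wmic commands mirroring the manual steps: disable automatic
    page file management, then set a custom size on one drive."""
    pagefile = f"{drive}\\\\pagefile.sys"  # WQL needs doubled backslashes
    return [
        # Step 1: uncheck "Automatically manage paging file size for all drives"
        ["wmic", "computersystem", "set", "AutomaticManagedPagefile=False"],
        # Step 2: custom initial/maximum size (in MB) for the chosen drive
        ["wmic", "pagefileset", "where", f"name='{pagefile}'",
         "set", f"InitialSize={initial_mb},MaximumSize={maximum_mb}"],
    ]

if __name__ == "__main__" and sys.platform == "win32":
    for cmd in pagefile_commands():
        subprocess.run(cmd, check=True)  # requires admin; reboot when done
```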

So having an external or internal SSD, NVMe drive, or even an HDD can be useful as a cache drive or for allocating memory. Often, issues with memory usage can also be related to RAM. I have 48 GB of RAM, and I’ve found that even with this amount, UI tools like Forge Webui and models such as Flux or SDXL can still use a lot of memory.

Upgrading from 16 GB to 32 GB of RAM made a noticeable difference for me in the past, back when I was using my old laptop with an RTX 2070 and 16 GB of RAM.

Now that I use my desktop PC, everything is much faster, so you might want to consider upgrading if you're currently at 16 GB.

tazztone commented 2 months ago

Set this to skip LoRA patching: [screenshot]

ZeroCool22 commented 2 months ago

Set this to skip LoRA patching: [screenshot]

But what disadvantages or side effects does that have?

tazztone commented 2 months ago

idk, maybe a few more MB of VRAM usage? It works, and pics come out as they should with the LoRA.