Jonseed opened 2 weeks ago
I have already tried the "prefer sysmem fallback" option in the Nvidia Control Panel, and yet I still get OOM when Invoke tries to load Q8 Flux.
Same here, the same thing happens to me, and I'm using the same GPU. On Forge, I can load the Q8 GGUF + t5xxl_fp16 (it gets a bit slow and freezes the first time I load it), but after that it works fine and generates images without any issues.
@thiagojramos yeah, I use Q8 GGUF and t5xxl_fp16 in ComfyUI all the time without any memory issues. Sometimes it gets slow, like when I have multiple LoRAs loaded, but it never OOMs.
I have the same problem. I managed to generate a few images, but now I only get OOM. I would rather have the model parts load/unload at each generation, even if it takes more time, than not be able to generate at all.
Same here, working on a 4060 Ti with 16GB. Flux FP8 works fine in Forge and ComfyUI. In Invoke AI the PC freezes, takes a very long time, or just crashes. Memory utilization also seems to be out of control here; it fills my 32GB of system RAM (I'm currently considering an upgrade to 64GB).
Is there an existing issue for this problem?
Operating system
Windows
GPU vendor
Nvidia (CUDA)
GPU model
RTX 3060
GPU VRAM
12GB
Version number
5.3
Browser
Edge 130.0.2849.46
Python dependencies
{
  "accelerate": "0.30.1",
  "compel": "2.0.2",
  "cuda": "12.4",
  "diffusers": "0.27.2",
  "numpy": "1.26.4",
  "opencv": "4.9.0.80",
  "onnx": "1.16.1",
  "pillow": "11.0.0",
  "python": "3.10.6",
  "torch": "2.4.1+cu124",
  "torchvision": "0.19.1+cu124",
  "transformers": "4.41.1",
  "xformers": null
}
What happened
I'd like to use Invoke, but with the Q8 GGUF quantized Flux and the bnb int8 T5 encoder I get out-of-memory errors on my 3060 12GB. I don't get OOM with Q8 Flux in ComfyUI or Auto1111/Forge (even though I know some of the model is being offloaded to RAM, since the Q8 is 12.4GB). In Invoke I have to step down to the Q6 quant of Flux (or bnb-nf4), and I'd rather not. Does Invoke need more work on memory optimizations, offloading parts of models to CPU RAM or shared GPU memory when they exceed VRAM? Or is this an option that I need to enable somewhere?
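For context on why Q8 can't sit entirely in VRAM, here's a rough back-of-the-envelope check. The 12.4GB model size and 12GB card are from this report; the overhead figure (CUDA context, activations, other model components) and the Q6 size are assumptions, not measured values:

```python
def fits_in_vram(model_gb: float, vram_gb: float, overhead_gb: float = 1.5) -> bool:
    """Rough check: does the model fit in VRAM alongside CUDA context,
    activations, and other loaded components? overhead_gb is a guess,
    not a measured figure."""
    return model_gb + overhead_gb <= vram_gb

# Q8 GGUF Flux transformer is ~12.4 GB; the card has 12 GB,
# so partial offload to system RAM is unavoidable.
print(fits_in_vram(12.4, 12.0))  # False

# An assumed ~9.8 GB Q6-class quant fits with the same overhead budget,
# which would explain why stepping down to Q6 avoids the OOM.
print(fits_in_vram(9.8, 12.0))   # True
```

Since the weights exceed VRAM outright, the question is really whether Invoke supports partial offload (as ComfyUI and Forge evidently do), not whether any setting can make the full model fit.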
What you expected to happen
No OOM.
How to reproduce the problem
No response
Additional context
No response
Discord username
No response