Open xiaogz opened 2 months ago
I used a q8 version of the CLIP model that @Green-Sky uploaded to Hugging Face. I am not sure how this affects quality, but it lowered RAM usage when loading the model.
Since the text encoder is running on cpu, the actual VRAM used is less than 4G, in the log you posted.
Sorry, I meant to clarify: is it possible to reduce CPU RAM usage? VRAM usage is definitely under 4 GB, yes, but RAM usage is quite high. Even with video memory sharing some of the load, RAM usage is >8 GB.
I think there are a couple of things here.
Since the text encoder is running on cpu, the actual VRAM used is less than 4G, in the log you posted.
Is there a way I can configure it myself so that CLIP and T5 run in VRAM?
Is it possible to keep memory usage from spiking to around 15 GB when doing text2img? I'm currently following this guide and using the default cat prompt with leejet's q4_k and q2_k flux schnell models; both show the same behaviour. The guide's link to the VAE safetensors file is inaccessible for me as I'm not part of flux-dev, so I used the official black-forest-labs VAE instead.
Memory can spike up to 15ish GB before settling at 6 or 4 GB.
Using the --vae-tiling flag lowers the spike to 12.95 GB. I'm not aware of any other options to further reduce memory consumption though.
For metal q2_k, I still see the 12.95 GB mem spike in activity monitor.
Similarly for cpu q2_k:
It would be great if memory usage topped out at under 8 GB, thanks!
EDIT: More information:
I'm on commit 8847114abfd900898e78d0257f5f9086f2473601
cmake -G Ninja -DSD_METAL=ON .. && cmake --build .
(looks like release is the default)
./bin/sd --vae ~/work/models/stable-diffusion/diffusion_pytorch_model.safetensors --clip_l ~/work/models/stable-diffusion/clip_l.safetensors --t5xxl ~/work/models/stable-diffusion/t5xxl_fp16.safetensors -p "a lovely cat holding a sign says 'flux.cpp'" --cfg-scale 1.0 --sampling-method euler -v --diffusion-model ~/work/models/stable-diffusion/flux1-schnell-q4_k.gguf
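In case it helps with comparing runs, the peak could also be measured from a script rather than read off Activity Monitor. The following is only a rough sketch and not part of stable-diffusion.cpp: `peak_rss_bytes` is a hypothetical helper name, it assumes a Unix-like system (Python's `resource` module is not available on Windows), and it reports peak resident memory of the child process, which may not match the number Activity Monitor shows.

```python
import resource
import subprocess
import sys


def peak_rss_bytes(cmd):
    """Run cmd (a list of arguments) and return the peak resident set size
    of waited-for child processes, in bytes. Hypothetical helper."""
    subprocess.run(cmd)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    scale = 1 if sys.platform == "darwin" else 1024
    return usage.ru_maxrss * scale


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is treated as the command to run.
    peak = peak_rss_bytes(sys.argv[1:])
    print(f"peak RSS: {peak / 1e9:.2f} GB")
```

Saved as, say, peak_rss.py (the file name is arbitrary), it would be run by prefixing the command above with `python3 peak_rss.py`, with or without `--vae-tiling` appended, so the two peaks can be compared directly.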