mcmonkeyprojects / SwarmUI

SwarmUI (formerly StableSwarmUI): a modular Stable Diffusion web user interface, with an emphasis on making power tools easily accessible, high performance, and extensibility.
MIT License

Add Stable Diffusion 3.5 Large bf16 support #396

Closed · Ilya87 closed this 18 hours ago

Ilya87 commented 1 day ago

Feature Idea

I downloaded Stable Diffusion 3.5 Large in bf16 (https://civitai.com/models/882666/sd35-large-google-flan) with built-in CLIPs. I edited the model's metadata to Stable Diffusion 3.5. I tried putting it in unet and in Stable-Diffusion; diff backend #0 failed to load the model with the error: "Model loader for sd35LargeGoogleFLAN_large3CLIPFLANBF16.safetensors didn't work - are you sure it has an architecture ID set properly?" In the command line I also see this nonsensical message:

[Warning] Your system memory usage exceeded 98.2% just before the backend process failed. This might indicate a memory overload.
[Warning] You appear to have a sufficient pagefile (96.9 GiB), so you might have too many background processes, or you might just be trying to run too much.
[Warning] Consider closing background processes, or greatly expanding your pagefile size.
[Warning] Or, reduce the size of what you're trying to run. If you're running an FP16 or FP8 model, consider a quantized variant like GGUF Q4

I have never had a 96.9 GiB pagefile. Moreover, the creator of the checkpoint writes: "The Full BF16 model runs at an amazing speed even on my 8GB card". My card is an RTX 4070, with 32 GB RAM and a 10,000 MB pagefile.

Other

Windows 11 24H2

mcmonkey4eva commented 18 hours ago

I think the pagefile size detection is wrong (something weird in how the Windows API reports virtual memory). Regardless, the memory usage message only appears if the backend process crashed, and a crash with no other error is usually a genuine memory error (memory overload errors fail silently for some reason, which is why I have the memory check and warning there). If there's a different error above it, that earlier error is the real problem.
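The check described above can be sketched roughly. SwarmUI itself is C#, so this Python version is purely illustrative, and the function name and 98% threshold are assumptions for the example, not SwarmUI's actual code:

```python
from typing import Optional

# Illustrative threshold; the real value SwarmUI uses may differ.
MEMORY_WARN_THRESHOLD_PCT = 98.0

def memory_pressure_hint(used_bytes: int, total_bytes: int) -> Optional[str]:
    """Return a warning string if system RAM was nearly exhausted, else None.

    Intended to be called when a backend process dies: out-of-memory kills
    often leave no error message, so very high memory usage right before
    the crash is the only available clue.
    """
    pct = used_bytes / total_bytes * 100
    if pct >= MEMORY_WARN_THRESHOLD_PCT:
        return (f"System memory usage was {pct:.1f}% just before the "
                "backend failed - this might indicate a memory overload.")
    return None
```

The key design point is that the hint is only emitted alongside a crash; high memory usage on its own is not treated as an error.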

> Moreover, the creator of the checkpoint writes: "The Full BF16 model runs at an amazing speed even on my 8GB card"

This person is silly. SD 3.5 Large is an 8B model, and BF16 is 16 bits, i.e. 2 bytes per parameter: 8 billion times 2 bytes is 16 billion bytes, i.e. 16 GB of VRAM just to fit the weights (plus more space needed for activation data and so on). The model does not fit in VRAM and must be offloaded to system memory, which means it can't run nearly as fast as a model that fits in VRAM. (Also, the whole Google FLAN insertion is a bit silly, but that's a separate topic.)

An FP8 version of the SD35Large model will offload too, but much less. A GGUF could fit entirely within your card, and is the best bet.
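The arithmetic above is just parameter count times bytes per parameter; a quick sketch (the 8B figure is from the comment above, and the ~4.5 bits for GGUF Q4 is a rough effective average, not an exact number):

```python
def weight_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate size of the model weights alone, in GB.

    Activations, VAE, and text encoders need extra space on top of this.
    """
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

SD35_LARGE_PARAMS_B = 8  # billion parameters, per the comment above

print(weight_gb(SD35_LARGE_PARAMS_B, 16))   # BF16/FP16 -> 16.0 GB
print(weight_gb(SD35_LARGE_PARAMS_B, 8))    # FP8       ->  8.0 GB
print(weight_gb(SD35_LARGE_PARAMS_B, 4.5))  # ~Q4 GGUF  -> ~4.5 GB
```

This is why only the ~4-bit quantized variant fits comfortably on a 12 GB card like the 4070 with headroom for activations.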

For models with multiple components built in (text encoders, VAE, ...), you want Models/Stable-Diffusion.

Also, the architecture ID error means the model didn't have an architecture ID embedded and SwarmUI wasn't able to autodetect one (possibly because they did weird things to the model format). You can click the ☰ menu next to the model in the Models browser, click "Edit Metadata", and set the architecture ID to the correct value.
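If you'd rather inspect this from a script than the UI, the metadata lives in the safetensors file header, whose format is simple: 8 bytes of little-endian length, then that many bytes of JSON, with free-form keys under `__metadata__`. A minimal reader sketch, assuming the architecture ID is stored under a `modelspec.architecture`-style key (treat the key name and example value as assumptions, not gospel):

```python
import json
import struct

def read_safetensors_metadata(path: str) -> dict:
    """Read the __metadata__ dict from a .safetensors file header.

    Format: first 8 bytes = little-endian uint64 header length,
    followed by that many bytes of UTF-8 JSON.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return header.get("__metadata__", {})

# Hypothetical usage (path and key value are illustrative):
# meta = read_safetensors_metadata("sd35LargeGoogleFLAN_large3CLIPFLANBF16.safetensors")
# print(meta.get("modelspec.architecture"))
```

An empty dict back from this function is exactly the "didn't have an architecture ID in the model" case described above.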