invoke-ai / InvokeAI

InvokeAI is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The solution offers an industry leading WebUI, supports terminal use through a CLI, and serves as the foundation for multiple commercial products.
https://invoke-ai.github.io/InvokeAI/
Apache License 2.0

[bug]: 5.0 release ignores quantization #6939

Open zethfoxster opened 5 hours ago

zethfoxster commented 5 hours ago

Is there an existing issue for this problem?

Operating system

Windows

GPU vendor

Nvidia (CUDA)

GPU model

RTX 4090

GPU VRAM

24 GB

Version number

5

Browser

Chrome

Python dependencies

No response

What happened

Loading FP8 models uses the same amount of VRAM as loading the full unquantized versions of FLUX, capping out my 24 GB.
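
As a quick way to check whether the quantized weights are actually staying at 8-bit, one could measure peak VRAM while loading the checkpoint directly with torch, outside of Invoke. A minimal sketch, assuming a local FP8 FLUX file (the filename is illustrative, not taken from this report):

```python
# Minimal sketch: measure peak VRAM while loading an FP8 checkpoint directly.
# The filename below is illustrative, not taken from this report.
import torch
from safetensors.torch import load_file

torch.cuda.reset_peak_memory_stats()

# Load the state dict straight onto the GPU in its stored dtypes.
state_dict = load_file("flux1-dev-fp8.safetensors", device="cuda")

peak_gib = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak VRAM after load: {peak_gib:.1f} GiB")

# An FP8-dominated checkpoint should come in at roughly half the size of the
# fp16 one; compare this figure against what Invoke reports for the same file.
```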

What you expected to happen

It should run at about 20 GB or less, depending on which of the quantized (Q) models I choose.
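
For context, the rough back-of-the-envelope arithmetic behind that expectation, assuming a ~12B-parameter FLUX transformer and a ~4.7B-parameter T5-XXL text encoder (both parameter counts are assumptions, not figures from this report):

```python
# Rough VRAM arithmetic for weight storage only (assumed parameter counts).
GIB = 1024**3

def weights_gib(params: float, bytes_per_param: int) -> float:
    """Approximate size of a weight tensor set in GiB."""
    return params * bytes_per_param / GIB

flux_params, t5_params = 12e9, 4.7e9

print(f"FLUX fp16: {weights_gib(flux_params, 2):.1f} GiB")  # ~22.4 GiB
print(f"FLUX fp8:  {weights_gib(flux_params, 1):.1f} GiB")  # ~11.2 GiB
print(f"T5 fp16:   {weights_gib(t5_params, 2):.1f} GiB")    # ~8.8 GiB
print(f"T5 fp8:    {weights_gib(t5_params, 1):.1f} GiB")    # ~4.4 GiB
```

Under those assumptions, an FP8 transformer plus FP8 text encoders should fit well under 20 GB, whereas the unquantized pair alone already exceeds 24 GB.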

How to reproduce the problem

No response

Additional context

No response

Discord username

No response

LiJT commented 2 hours ago

Yeah, because Invoke 5.0 cannot read the internal CLIP and T5 models inside the FP8 checkpoint. The speed is painfully slow now. See https://github.com/invoke-ai/InvokeAI/issues/6940
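
One way to check this point is to list the top-level key prefixes in the single-file FP8 checkpoint and see whether CLIP and T5 weights are bundled alongside the transformer. A hedged sketch, with an illustrative filename and example prefixes that will vary between checkpoint packagings:

```python
# Sketch: inspect which components are bundled in an all-in-one checkpoint.
# The filename is illustrative; prefix names depend on how the file was packaged.
from collections import Counter
from safetensors import safe_open

prefixes = Counter()
with safe_open("flux1-dev-fp8.safetensors", framework="pt") as f:
    for key in f.keys():
        prefixes[key.split(".")[0]] += 1

# An all-in-one file may expose prefixes for the transformer, the text
# encoders (CLIP / T5), and the VAE. If the importer only reads the
# transformer portion, the other components get loaded separately, at full size.
for prefix, count in prefixes.most_common():
    print(f"{prefix}: {count} tensors")
```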