Full model always needed?

mit-han-lab / nunchaku

SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

https://hanlab.mit.edu/projects/svdquant

Apache License 2.0


Open samedii opened 4 days ago

samedii commented 4 days ago

Is the full model needed before adding the quantization? It would be nice if it wasn't but maybe it's hard to avoid.

At the moment the full model is downloaded when the pipeline is loading, even though I have already prepared the quantized model locally.
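One way to confirm this is to look at what ended up in the local model cache after loading the pipeline. The sketch below is only an illustration under assumptions that are not stated in this thread: it assumes the weights are fetched through the Hugging Face Hub into its usual cache layout (one `models--<org>--<name>` folder per repo, with the files under `blobs/`), and it resolves the cache location from `HF_HUB_CACHE` or `HF_HOME` in a simplified way. The helper names `hub_cache_dir` and `cached_model_sizes` are made up for this example.

```python
import os
from pathlib import Path


def hub_cache_dir() -> Path:
    """Best-effort guess at the Hugging Face hub cache directory (simplified)."""
    if "HF_HUB_CACHE" in os.environ:
        return Path(os.environ["HF_HUB_CACHE"])
    hf_home = os.environ.get("HF_HOME", str(Path.home() / ".cache" / "huggingface"))
    return Path(hf_home) / "hub"


def cached_model_sizes(cache_dir: Path) -> dict[str, int]:
    """Map each cached model repo to the total size in bytes of its blob files."""
    sizes: dict[str, int] = {}
    if not cache_dir.is_dir():
        return sizes
    for repo_dir in cache_dir.glob("models--*"):
        # Folder names look like "models--<org>--<name>"; a repo name that
        # itself contains "--" would be decoded incorrectly by this sketch.
        repo_id = repo_dir.name[len("models--"):].replace("--", "/")
        blobs = repo_dir / "blobs"
        total = 0
        if blobs.is_dir():
            total = sum(f.stat().st_size for f in blobs.rglob("*") if f.is_file())
        sizes[repo_id] = total
    return sizes


if __name__ == "__main__":
    # Print cached repos, largest first, with sizes in GiB.
    for repo_id, size in sorted(cached_model_sizes(hub_cache_dir()).items(),
                                key=lambda item: -item[1]):
        print(f"{size / 2**30:8.2f} GiB  {repo_id}")
```

If a multi-gigabyte entry for the original model shows up after loading only the quantized pipeline, the full model was downloaded as well.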

lmxyy commented 4 days ago

It is not necessary. We will avoid loading it in the next release.

Cannerd-Staff-Admin commented 4 days ago

Can we delete the original BFL flux models after the pipeline has loaded? Or are they required for some part of the quantization setup/execution?

samedii commented 2 days ago

I could close this but might be good to leave up for others who are wondering until the next release? :)

lmxyy commented 2 days ago

> Can we delete the original BFL flux models after the pipeline has loaded? Or are they required for some part of the quantization setup/execution?

Probably not for now. Every time you load the model, it will download the original model again.

Cannerd-Staff-Admin commented 18 hours ago

Confirmed, yes it does :S Great thanks for your work! It's quite impressive!