huggingface / transformers-bloom-inference

Fast Inference Solutions for BLOOM
Apache License 2.0

Unable to reload a quantized model #85

Closed · moonlightian closed this issue 1 year ago

moonlightian commented 1 year ago

After setting load_in_8bit=True in the .from_pretrained() call, the model gets quantized. How should I save this model and reload it with .from_pretrained() again so that all weights are loaded normally?
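For context, a minimal sketch of the setup being described, assuming the `bigscience/bloom-560m` checkpoint (any BLOOM variant applies) and that `bitsandbytes` is installed:

```python
from transformers import AutoModelForCausalLM

# Load BLOOM with on-the-fly int8 quantization.
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-560m",
    device_map="auto",   # needed so weights are placed on GPU for int8
    load_in_8bit=True,   # quantize linear layers to int8 at load time
)
```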

moonlightian commented 1 year ago

And I found that after quantization, some weights ending with '.SCB' are stored in the state_dict, which AutoModelForCausalLM is not able to load. I wonder how I could load these weights into the model; it seems these weights are related to quantization.
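A short sketch of how the extra keys can be observed, assuming `model` is the 8-bit model loaded as above:

```python
# '.SCB' holds the int8 scaling factors that bitsandbytes attaches
# to each quantized Linear layer; they appear in the state_dict.
scb_keys = [k for k in model.state_dict() if k.endswith(".SCB")]
print(len(scb_keys), scb_keys[:3])
```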

moonlightian commented 1 year ago

After save_pretrained(), I found weights ending with '.SCB' in pytorch_model.bin.index.json, which cannot be reloaded into AutoModelForCausalLM. How can I get around this problem?
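Continuing from the sketch above, this illustrates the reported failure mode; the output directory is hypothetical, and the exact warning or error depends on the transformers version:

```python
# Hypothetical output directory for illustration.
model.save_pretrained("bloom-8bit")  # the saved index now lists '.SCB' keys

# Reloading the saved 8-bit state as a regular checkpoint does not
# restore the quantization scales; the '.SCB' entries are not mapped
# onto any module parameter.
reloaded = AutoModelForCausalLM.from_pretrained("bloom-8bit")
```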

mayank31398 commented 1 year ago

You are not supposed to save this model and reload it again; load_in_8bit is meant to be used on the fly.
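In other words, the suggested pattern is to re-quantize from the original fp16 checkpoint at every load rather than persisting the int8 weights. A self-contained sketch, again assuming `bigscience/bloom-560m`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Quantize from the original checkpoint at load time, every time,
# instead of saving and reloading the already-quantized weights.
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-560m",
    device_map="auto",
    load_in_8bit=True,
)

inputs = tokenizer("BLOOM is", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```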

moonlightian commented 1 year ago

> You are not supposed to save this model and reload it again; load_in_8bit is meant to be used on the fly.

Thanks a lot for your reply; still looking forward to a method for reloading.