qwopqwop200 / GPTQ-for-LLaMa

4-bit quantization of LLaMA using GPTQ

How to quantize BLOOM after LoRA/p-tuning? #255

Open moonlightian opened 1 year ago

moonlightian commented 1 year ago

I finetuned BLOOM with LoRA and would like to quantize the model with GPTQ:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModelForCausalLM

self.model = AutoModelForCausalLM.from_pretrained(
    self.config['checkpoint_path'],
    device_map='auto',
)

# load adapter
self.model = PeftModelForCausalLM.from_pretrained(
    self.model, '/tmp/bloom_ori/lora_bloom'
)
```

Some errors happened like this:

[error screenshot: dimension mismatch traceback]

It seems that after loading the adapter there is a dimension mismatch between alibi and attention_mask. How can I get rid of these errors and quantize the model together with the adapter?
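Would merging the adapter into the base model before quantization be a valid route? Below is a minimal sketch of what I mean, assuming PEFT's `merge_and_unload()` is available in the installed version; the checkpoint name and output paths are placeholders:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model, attach the LoRA adapter, then fold the adapter
# weights back into the base model so GPTQ only ever sees a plain
# AutoModelForCausalLM.
base = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-7b1",           # placeholder for the real base checkpoint
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "/tmp/bloom_ori/lora_bloom")
model = model.merge_and_unload()      # merge LoRA deltas into the base weights
model.save_pretrained("/tmp/bloom_lora_merged")  # quantize this checkpoint instead
```

If merging first is the recommended path, the merged checkpoint could then be fed to the usual GPTQ quantization script like any plain BLOOM model.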