yxli2123 / LoftQ

MIT License

Quantized models issue #6

Closed MarceloCorreiaData closed 6 months ago

MarceloCorreiaData commented 7 months ago

Hello and thank you for the work.

I have been attempting to quantize the Mistral, T5, and Falcon models. I can finish the process and save them, but they do not seem to perform inference properly once the saved safetensors model is loaded again (in either 4-bit or 16-bit). I suspect I might have made some errors. The sequence I code is (see the sketch below):

1. Load the pretrained model and tokenizer
2. Configure `LoraConfig`
3. Run `utils.replace_module()` (screenshot attached)
4. Save the pretrained model

Is that right? Also, I haven't fine-tuned with PEFT after quantization yet.
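For concreteness, here is a condensed sketch of what my notebook does. The Mistral checkpoint, the LoRA hyperparameters, and the allow_name/block_name values are just the ones I tried, and I am not certain the `replace_module` call below matches the exact signature in your repo:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model
import utils  # utils.py from the LoftQ repo

# 1. Load the pretrained model and tokenizer
model_name = "mistralai/Mistral-7B-v0.1"  # same flow attempted for T5 and Falcon
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 2. Configure LoRA (target modules = what you call "allow_name")
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)  # wrap with LoRA adapters, mirroring quantize.py as I understand it

# 3. Replace linear layers with quantized, LoftQ-initialized ones
#    (keyword names taken from your code; the block_name value below is my guess)
utils.replace_module(
    model,
    allow_name=["q_proj", "k_proj", "v_proj", "o_proj"],
    block_name=["layers"],
)

# 4. Save the quantized model
model.save_pretrained("./mistral_loftq_4bit_64rank")
tokenizer.save_pretrained("./mistral_loftq_4bit_64rank")
```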

While I am familiar with PEFT/LoRA and believe I have correctly identified the target modules (referred to as "allow_name" in your documentation), I am struggling to find the correct "block_name" for each model. I am aware of the default allow_name and block_name lists and tried to investigate with the snippet:

```python
for name, param in pretrained_model.named_parameters():
    print(name, param.shape, param.max(), param.mean(), param.requires_grad)
```

I suspect a mistake in the block names might be the root of the issue, but you can tell me best.

I would greatly appreciate your help in resolving this issue.

In addition, I have implemented the code in a notebook, following the quantize.py example. I create the `LoraConfig` before calling `utils.replace_module`, in line with the sequence in your file. Please let me know if there is any part of this process that I might be misunderstanding.

Thanks a lot once again.

(screenshot attached)
yxli2123 commented 7 months ago

Hi, thanks for your interest in our work. We are currently integrating LoftQ initialization into the PEFT package. I would suggest using the examples in https://github.com/yxli2123/peft/tree/loftq/examples/loftq_finetuning and installing PEFT with `pip install git+https://github.com/yxli2123/peft.git@loftq`.
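Roughly, LoftQ initialization through PEFT looks like the sketch below. Argument names may differ slightly on the loftq branch, so please follow the linked examples for the exact usage:

```python
from transformers import AutoModelForCausalLM
from peft import LoftQConfig, LoraConfig, get_peft_model

# Load the full-precision backbone; LoftQ initialization quantizes it internally.
base_model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

loftq_config = LoftQConfig(loftq_bits=4)  # 4-bit quantized backbone
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    init_lora_weights="loftq",            # use LoftQ instead of the default LoRA init
    loftq_config=loftq_config,
)
peft_model = get_peft_model(base_model, lora_config)
# Fine-tune peft_model as usual, then peft_model.save_pretrained(...)
```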

Feel free to reply back if you have further issues.

MarceloCorreiaData commented 7 months ago

I have a question: is it compatible with other model architectures such as the Mistral and Falcon causal LMs (for instance, by adapting the Llama 2 example), or should I only experiment with the three specific models/architectures that have been published?

yxli2123 commented 7 months ago

LoftQ is compatible with all transformer models: decoder-only, encoder-only, and encoder-decoder. For Mistral, we provide a 4-bit, 64-rank model at https://huggingface.co/LoftQ/Mistral-7B-v0.1-4bit-64rank. For Falcon, we are working on it and will release it soon.
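Loading the released checkpoint for fine-tuning looks roughly like the sketch below; the bitsandbytes settings and the `loftq_init` subfolder follow the model card, so please double-check there:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

MODEL_ID = "LoftQ/Mistral-7B-v0.1-4bit-64rank"

# Load the 4-bit quantized backbone released on the Hub.
base_model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
)

# Attach the LoftQ-initialized LoRA adapters stored alongside the backbone.
peft_model = PeftModel.from_pretrained(
    base_model,
    MODEL_ID,
    subfolder="loftq_init",  # adapter location per the model card
    is_trainable=True,
)
```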

yxli2123 commented 6 months ago

Hi @MarceloCorreiaData, I will close this issue. If you have further questions, please feel free to re-open it.