Closed TapendraBaduwal closed 9 months ago
Did you fine-tune a quantized model or the original model?
You probably fine-tuned the original model, so now you need to quantize it.
Go to the llama.cpp repo and find the quantize tool. Find some YouTube videos if you need help.
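The usual llama.cpp workflow is roughly the following (a sketch only: the exact script names and paths have moved around between llama.cpp versions, and the model path here is a placeholder, so check the repo's README for your checkout):

```shell
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make

# 1. Convert the fine-tuned HF checkpoint to GGUF at fp16
python convert.py /path/to/your-finetuned-model --outtype f16

# 2. Quantize the fp16 GGUF down to 4-bit (q4_0)
./quantize /path/to/your-finetuned-model/ggml-model-f16.gguf \
           /path/to/your-finetuned-model/ggml-model-q4_0.gguf q4_0
```

If you trained with LoRA, merge the adapter into the base model first; the convert script expects a full checkpoint.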
It actually takes up around 600MB on disk and around 700MB during inference, with activations taken into account (https://huggingface.co/TinyLlama/TinyLlama-1.1B-python-v0.1/blob/main/ggml-model-q4_0.gguf). I will update the README.
@jzhang38 After fine-tuning model_name = "TinyLlama/TinyLlama-1.1B-python-v0.1" on my own dataset with LoRA and SFTTrainer, I got a model of size 2.05 GB. Will this model take 600-700 MB, or how can we reduce the model size to 600-700 MB?
@TapendraBaduwal You can check out llama.cpp
@jzhang38 Thank you. Also, what is the best practice for continuing training after loading from a LoRA checkpoint? I want to continue training a LoRA checkpoint adapter on a new dataset.
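One way to resume from a saved adapter is to reload it with peft in trainable mode and hand it to a fresh SFTTrainer. A minimal sketch (paths, dataset file, and hyperparameters are placeholders, not from this thread; the `dataset_text_field` argument assumes an SFTTrainer version that accepts it):

```python
# Hypothetical sketch: continue LoRA training from a saved adapter with peft + trl.
from datasets import load_dataset
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

base_id = "TinyLlama/TinyLlama-1.1B-python-v0.1"
base_model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Load the saved LoRA adapter on top of the base model.
# is_trainable=True keeps the adapter weights unfrozen so training can continue.
model = PeftModel.from_pretrained(base_model, "path/to/lora-checkpoint",
                                  is_trainable=True)

# New dataset for the second round of training (placeholder file name).
new_dataset = load_dataset("json", data_files="new_data.json", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=new_dataset,
    dataset_text_field="text",
    args=TrainingArguments(output_dir="lora-continued", num_train_epochs=1),
)
trainer.train()

# Save only the updated adapter weights, not the full base model.
model.save_pretrained("lora-continued/adapter")
```

Note this restarts the optimizer state; if you also want to resume the optimizer and scheduler, pass `resume_from_checkpoint` to `trainer.train()` with a full Trainer checkpoint instead.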
@TapendraBaduwal I recommend checking out https://github.com/OpenAccess-AI-Collective/axolotl
After fine-tuning the model, I obtained a 2.2 GB PyTorch model.bin file. Is it possible to reduce this model size to about 550 MB, and if so, how?
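Back-of-the-envelope arithmetic shows why 4-bit quantization lands in that range (assuming ~1.1 B parameters, and ~4.5 bits per weight for q4_0, which stores a per-block scale on top of the 4-bit values):

```python
# Rough size estimate for a 1.1B-parameter model at different precisions.
params = 1.1e9

fp16_bytes = params * 2          # 2 bytes per weight -> the ~2.2 GB model.bin
q4_0_bytes = params * 4.5 / 8    # ~4.5 bits per weight incl. per-block scales

print(f"fp16: {fp16_bytes / 1e9:.1f} GB")   # ~2.2 GB
print(f"q4_0: {q4_0_bytes / 1e6:.0f} MB")   # ~619 MB, close to the 600 MB on disk
```

So q4_0 gets you to roughly 600 MB; hitting 550 MB would need a more aggressive scheme such as a lower-bit quantization.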