hiyouga / LLaMA-Factory

A WebUI for Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

Merging the PT LoRA adapter into the base model almost doubles the size of the original base model #3137

Closed. amitagh closed this issue 1 month ago.

amitagh commented 3 months ago

Merging the PT LoRA adapter into the base model almost doubled its size compared to the original base model.

I am pre-training Llama-2 for a non-English language. For this I expanded the tokenizer vocab from 32k to 50k (it is a slow tokenizer), and the original Meta Llama-2 model is augmented with the modified tokenizer; its safetensors are approximately 14 GB. After this I do pretraining with LoRA. After merging, the merged model's safetensor files total almost 28 GB (double the base model).
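For reference, the vocab-augmentation step is done roughly along these lines; this is a minimal sketch, the paths and output directory names are illustrative, not my exact script:

```python
# Sketch: augment Llama-2 with an expanded tokenizer (illustrative paths).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./expanded_tokenizer")  # 50k-vocab slow tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16
)

# Grow embed_tokens and lm_head from 32k to 50k rows; the new rows are randomly initialized.
model.resize_token_embeddings(len(tokenizer))

model.save_pretrained("./llama2-7b-50k")  # roughly 14 GB in fp16 for a 7B model
tokenizer.save_pretrained("./llama2-7b-50k")
```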

What should the LoRA adapter size ideally be? It looks like the LoRA adapter itself is 16 GB+ while the base model is 14 GB. After completing pretraining with LoRA, the output_dir will contain only the LoRA adapter, right, not the merged model? I am merging the adapter with the base model after pretraining.

Any idea why? Is the modified tokenizer causing this? Others have done the same and their model size has not increased.

Also, is there an option to convert the slow tokenizer to a fast one?
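On the slow-to-fast question: transformers can usually convert a sentencepiece-based slow tokenizer on load when a fast implementation exists for that tokenizer class. A minimal sketch, with illustrative paths:

```python
# Sketch: convert a slow (sentencepiece) tokenizer to a fast one, where supported.
from transformers import AutoTokenizer

slow_dir = "./llama2-7b-50k"                                       # illustrative path
fast_tok = AutoTokenizer.from_pretrained(slow_dir, use_fast=True)  # triggers slow -> fast conversion

# Saving writes tokenizer.json, so future loads use the fast tokenizer directly.
fast_tok.save_pretrained("./llama2-7b-50k-fast")
print(fast_tok.is_fast)  # True if the conversion succeeded
```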

amitagh commented 3 months ago

Hi @hiyouga, any input on this?

I saw the same with Gemma too. After PT the output dir contained the adapter files, but after I executed the merge, the merged model was 10 GB while the base gemma-2b model is 5 GB.

amitagh commented 3 months ago

I don't see merge_lora_to_base_model() called anywhere to merge the model and the adapter. For pretraining, it looks like trainer.train() generates the final model rather than a separate merge step. Can you please confirm? Below are the logs I see at the end of pretraining, which say one adapter was merged:

```
04/04/2024 07:18:28 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
04/04/2024 07:18:30 - INFO - llmtuner.model.adapter - Merged 1 adapter(s).
04/04/2024 07:18:30 - INFO - llmtuner.model.adapter - Loaded adapter(s): output_dir/lora/pretrain
04/04/2024 07:18:30 - INFO - llmtuner.model.loader - all params: 6893080576
[INFO|configuration_utils.py:471] 2024-04-04 07:18:30,943 >> Configuration saved in output_dir/merged_pt_model/config.json
[INFO|configuration_utils.py:697] 2024-04-04 07:18:30,943 >> Configuration saved in output_dir/merged_pt_model/generation_config.json
[INFO|modeling_utils.py:2482] 2024-04-04 07:18:37,186 >> The model is bigger than the maximum size per checkpoint (2GB) and is going to be split in 8 checkpoint shards. You can find where each parameters has been saved in the index located at output_dir/merged_pt_model/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2502] 2024-04-04 07:18:37,187 >> tokenizer config file saved in output_dir/merged_pt_model/tokenizer_config.json
[INFO|tokenization_utils_base.py:2511] 2024-04-04 07:18:37,187 >> Special tokens file saved in output_dir/merged_pt_model/special_tokens_map.json
```

```
azureuser@mllm:/mnt/resource_nvme/LLaMA-Factory$ ls output_dir/merged_pt_model/
config.json                       model-00002-of-00008.safetensors  model-00005-of-00008.safetensors  model-00008-of-00008.safetensors  tokenizer.model
generation_config.json            model-00003-of-00008.safetensors  model-00006-of-00008.safetensors  model.safetensors.index.json      tokenizer_config.json
model-00001-of-00008.safetensors  model-00004-of-00008.safetensors  model-00007-of-00008.safetensors  special_tokens_map.json
```

Please confirm. I am using LoRA rank 64 and LoRA alpha 128, since I am pretraining on a large non-English corpus.

Even for finetuning with LoRA, I did not find merge_lora_to_base_model() called anywhere.
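From the logs above, the merge seems to happen in the export step (export_model.py) rather than in trainer.train(). Conceptually that path does something like the following; this is a hedged sketch with illustrative paths, not LLaMA-Factory's exact code:

```python
# Sketch of what the export/merge step does conceptually (not LLaMA-Factory's exact code).
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("./llama2-7b-50k")            # base already resized to 50k vocab
peft_model = PeftModel.from_pretrained(base, "output_dir/lora/pretrain")  # logs: "Loaded adapter(s)"
merged = peft_model.merge_and_unload()                                    # logs: "Merged 1 adapter(s)"
merged.save_pretrained("output_dir/merged_pt_model", max_shard_size="2GB")
```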

amitagh commented 3 months ago

This happens due to the following config: lora_modules_to_save: embed_tokens lm_head

Without it, the merge works fine. There was a PEFT library issue for this, but it still seems to be present: https://github.com/huggingface/trl/issues/1287
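For context on why this setting grows the adapter: modules_to_save stores full, trainable copies of the listed modules in the checkpoint, not low-rank factors. A rough illustration with PEFT; the target_modules list and sizes below are assumptions, not my exact config:

```python
# Sketch: a LoRA config with modules_to_save keeps full copies of embed_tokens/lm_head.
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # illustrative target set
    modules_to_save=["embed_tokens", "lm_head"],              # full copies end up in adapter_model.safetensors
)

# Rough size of just those two full copies for a 50k vocab, 4096 hidden model:
vocab, hidden = 50_000, 4096
full_copy_params = 2 * vocab * hidden            # embed_tokens + lm_head
print(full_copy_params * 4 / 1e9, "GB in fp32")  # ~1.6 GB, before any LoRA A/B matrices
print(full_copy_params * 2 / 1e9, "GB in fp16")  # ~0.8 GB
```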

hiyouga commented 3 months ago

It may be because the embed tokens and lm head are saved in 32-bit precision, leading to an increase in the file size. You can merge the LoRA adapter using the latest code and see whether the file size is smaller or not.
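One way to check this is to inspect the tensor dtypes stored in the adapter file, for example with safetensors; a minimal sketch, using the adapter path from your logs:

```python
# Sketch: list dtypes/shapes of tensors in the saved adapter to spot fp32 entries.
from safetensors import safe_open

with safe_open("output_dir/lora/pretrain/adapter_model.safetensors", framework="pt") as f:
    for name in f.keys():
        t = f.get_tensor(name)
        print(name, tuple(t.shape), t.dtype)
```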

amitagh commented 3 months ago

An adapter of 7 GB is taking too long to merge: it has been over 15 minutes and it is still stuck merging, so I had to kill it. Most likely the adapter was generated in FP32; after pushing it to the repo it was 3.5 GB. Is there any way to select FP16 during the merge, and also while generating the adapter? I am not sure why it was generated in FP32 when the base model is in FP16.

```
ls -l output_dir/lora/pretrain
total 6929588
-rw-rw-r-- 1 azureuser azureuser       5089 Apr 18 10:09 README.md
-rw-rw-r-- 1 azureuser azureuser        754 Apr 18 10:09 adapter_config.json
-rw-rw-r-- 1 azureuser azureuser 7091573568 Apr 18 10:09 adapter_model.safetensors
-rw-rw-r-- 1 azureuser azureuser        165 Apr 18 10:09 all_results.json
-rw-rw-r-- 1 azureuser azureuser        636 Apr 18 10:09 special_tokens_map.json
-rw-rw-r-- 1 azureuser azureuser    4241003 Apr 18 10:09 tokenizer.model
-rw-rw-r-- 1 azureuser azureuser      34052 Apr 18 10:09 tokenizer_config.json
-rw-rw-r-- 1 azureuser azureuser        165 Apr 18 10:09 train_results.json
-rw-rw-r-- 1 azureuser azureuser        246 Apr 18 10:09 trainer_log.jsonl
-rw-rw-r-- 1 azureuser azureuser        735 Apr 18 10:09 trainer_state.json
-rw-rw-r-- 1 azureuser azureuser       5048 Apr 18 10:09 training_args.bin
```

Below is the stack trace at the time of the kill:

```
  File "./src/export_model.py", line 9, in <module>
    main()
  File "./src/export_model.py", line 5, in main
    export_model()
  File "/mnt/resource_nvme/LLaMA-Factory/src/llmtuner/train/tuner.py", line 57, in export_model
    model = load_model(tokenizer, model_args, finetuning_args)  # must after fixing tokenizer to resize vocab
  File "/mnt/resource_nvme/LLaMA-Factory/src/llmtuner/model/loader.py", line 106, in load_model
    model = init_adapter(model, model_args, finetuning_args, is_trainable)
  File "/mnt/resource_nvme/LLaMA-Factory/src/llmtuner/model/adapter.py", line 116, in init_adapter
    model = model.merge_and_unload()
  File "/home/azureuser/.local/lib/python3.8/site-packages/peft/tuners/lora/model.py", line 784, in merge_and_unload
    return self._unload_and_optionally_merge(
  File "/home/azureuser/.local/lib/python3.8/site-packages/peft/tuners/lora/model.py", line 438, in _unload_and_optionally_merge
    target.merge(safe_merge=safe_merge, adapter_names=adapter_names)
  File "/home/azureuser/.local/lib/python3.8/site-packages/peft/tuners/lora/layer.py", line 413, in merge
    delta_weight = self.get_delta_weight(active_adapter)
  File "/home/azureuser/.local/lib/python3.8/site-packages/peft/tuners/lora/layer.py", line 473, in get_delta_weight
    output_tensor = transpose(weight_B @ weight_A, self.fan_in_fan_out) * self.scaling[adapter]
KeyboardInterrupt
```
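As a possible workaround for the FP16 question, the merge can also be done directly with PEFT, loading the base model in fp16 and casting the merged weights before saving. A minimal sketch with illustrative paths, not the LLaMA-Factory export code:

```python
# Sketch: merge the LoRA adapter manually and save the result in fp16.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

adapter_dir = "output_dir/lora/pretrain"
tokenizer = AutoTokenizer.from_pretrained(adapter_dir)

base = AutoModelForCausalLM.from_pretrained(
    "./llama2-7b-50k",            # base model already resized to the 50k vocab (illustrative path)
    torch_dtype=torch.float16,
)

model = PeftModel.from_pretrained(base, adapter_dir)
merged = model.merge_and_unload()  # folds LoRA deltas and the modules_to_save copies into the base
merged = merged.to(torch.float16)  # cast any fp32 tensors (e.g. embed_tokens/lm_head) back to fp16

merged.save_pretrained("output_dir/merged_pt_model_fp16", max_shard_size="2GB", safe_serialization=True)
tokenizer.save_pretrained("output_dir/merged_pt_model_fp16")
```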