Closed amitagh closed 1 month ago
Hi @hiyouga, any input on this?
I saw the same with Gemma. After PT, the PT output directory did show adapter files, but after I ran the merge, the merged model was 10 GB while the base gemma-2b model is 5 GB.
I don't see merge_lora_to_base_model() called anywhere to merge the model and the adapter. For pretraining, it looks like trainer.train() produces the final model rather than a separate merge step. Can you please confirm? Below are the logs I see at the end of pretraining, which say 1 adapter was merged:

```
04/04/2024 07:18:28 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
04/04/2024 07:18:30 - INFO - llmtuner.model.adapter - Merged 1 adapter(s).
04/04/2024 07:18:30 - INFO - llmtuner.model.adapter - Loaded adapter(s): output_dir/lora/pretrain
04/04/2024 07:18:30 - INFO - llmtuner.model.loader - all params: 6893080576
[INFO|configuration_utils.py:471] 2024-04-04 07:18:30,943 >> Configuration saved in output_dir/merged_pt_model/config.json
[INFO|configuration_utils.py:697] 2024-04-04 07:18:30,943 >> Configuration saved in output_dir/merged_pt_model/generation_config.json
[INFO|modeling_utils.py:2482] 2024-04-04 07:18:37,186 >> The model is bigger than the maximum size per checkpoint (2GB) and is going to be split in 8 checkpoint shards. You can find where each parameters has been saved in the index located at output_dir/merged_pt_model/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2502] 2024-04-04 07:18:37,187 >> tokenizer config file saved in output_dir/merged_pt_model/tokenizer_config.json
[INFO|tokenization_utils_base.py:2511] 2024-04-04 07:18:37,187 >> Special tokens file saved in output_dir/merged_pt_model/special_tokens_map.json
```

```
azureuser@mllm:/mnt/resource_nvme/LLaMA-Factory$ ls output_dir/merged_pt_model/
config.json                       model-00002-of-00008.safetensors  model-00005-of-00008.safetensors  model-00008-of-00008.safetensors  tokenizer.model
generation_config.json            model-00003-of-00008.safetensors  model-00006-of-00008.safetensors  model.safetensors.index.json      tokenizer_config.json
model-00001-of-00008.safetensors  model-00004-of-00008.safetensors  model-00007-of-00008.safetensors  special_tokens_map.json
```
Please confirm. I am using LoRA rank 64 and LoRA alpha 128, as I am pretraining on a large non-English corpus.
Even for finetuning with LoRA, I didn't find merge_lora_to_base_model() called anywhere.
This happens due to the following config:

```
lora_modules_to_save: embed_tokens lm_head
```
Without this it works fine. There was a PEFT library issue for this, but it still seems to be present: https://github.com/huggingface/trl/issues/1287
It may be because the embed tokens and lm head are saved in 32-bit precision, leading to an increase in the file size. You can merge the LoRA adapter using the latest code and see whether the file size is smaller or not.
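The size impact of saving these two matrices in 32-bit can be sanity-checked with a quick calculation. A minimal sketch, assuming the expanded vocab of 50k mentioned in this thread and a hidden size of 4096 (typical for a 7B Llama model; both are assumptions, not measured values):

```python
import numpy as np

# Assumed shapes: vocab_size=50_000 (expanded tokenizer), hidden_size=4096.
# Saving embed_tokens and lm_head in fp32 instead of fp16 doubles their
# on-disk footprint, which alone adds well over a gigabyte to the adapter.
vocab, hidden = 50_000, 4096

fp16_bytes = vocab * hidden * np.dtype(np.float16).itemsize
fp32_bytes = vocab * hidden * np.dtype(np.float32).itemsize

# Two such matrices per checkpoint: embed_tokens + lm_head
print(f"fp16: {2 * fp16_bytes / 1e9:.2f} GB, fp32: {2 * fp32_bytes / 1e9:.2f} GB")
# → fp16: 0.82 GB, fp32: 1.64 GB
```

This is only part of the adapter size, of course; the LoRA matrices themselves add more on top.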
The 7 GB adapter is taking too long to merge: almost 15 minutes in and it was still stuck in merging, so I had to kill it. Most likely it generated an FP32 version of the adapter. After pushing the adapter to the repo it was 3.5 GB. Is there any way to select FP16 during the merge, and even while generating the adapter? Not sure why it was generated in FP32 when the base model is in FP16.

```
ls -l output_dir/lora/pretrain
total 6929588
-rw-rw-r-- 1 azureuser azureuser       5089 Apr 18 10:09 README.md
-rw-rw-r-- 1 azureuser azureuser        754 Apr 18 10:09 adapter_config.json
-rw-rw-r-- 1 azureuser azureuser 7091573568 Apr 18 10:09 adapter_model.safetensors
-rw-rw-r-- 1 azureuser azureuser        165 Apr 18 10:09 all_results.json
-rw-rw-r-- 1 azureuser azureuser        636 Apr 18 10:09 special_tokens_map.json
-rw-rw-r-- 1 azureuser azureuser    4241003 Apr 18 10:09 tokenizer.model
-rw-rw-r-- 1 azureuser azureuser      34052 Apr 18 10:09 tokenizer_config.json
-rw-rw-r-- 1 azureuser azureuser        165 Apr 18 10:09 train_results.json
-rw-rw-r-- 1 azureuser azureuser        246 Apr 18 10:09 trainer_log.jsonl
-rw-rw-r-- 1 azureuser azureuser        735 Apr 18 10:09 trainer_state.json
-rw-rw-r-- 1 azureuser azureuser       5048 Apr 18 10:09 training_args.bin
```
Below is the stack trace at the point I killed it:

```
File "./src/export_model.py", line 9, in
```
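For what it's worth, the merge that the export script performs can also be done directly with PEFT, loading everything in FP16 up front. This is a hedged sketch, not the LLaMA-Factory code itself; the paths are the ones from this thread, and `"path/to/base_model"` is a placeholder:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model in fp16 so the merged weights stay fp16.
base = AutoModelForCausalLM.from_pretrained(
    "path/to/base_model", torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(base, "output_dir/lora/pretrain")

merged = model.merge_and_unload()
merged = merged.half()  # guard against fp32 upcasting during the merge

merged.save_pretrained("output_dir/merged_pt_model", safe_serialization=True)
AutoTokenizer.from_pretrained("output_dir/lora/pretrain").save_pretrained(
    "output_dir/merged_pt_model"
)
```

Keeping everything in fp16 halves both the merge memory footprint and the size of the saved shards.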
Merging the PT LoRA adapter into the base model almost doubled its size relative to the original base model.
I am pretraining Llama 2 for a non-English language. For this I expanded the tokenizer vocab from 32k to 50k (it is a slow tokenizer), and the original Meta Llama 2 is augmented with the modified tokenizer. Its model safetensors are approximately 14 GB. After that I do pretraining with LoRA. After merging, the merged model's safetensor files total almost 28 GB (double the base model).
What should the LoRA adapter size ideally be? The LoRA adapter itself looks to be 16 GB+ while the base model is 14 GB. On completing pretraining with LoRA, the output_dir will contain only the LoRA adapter, right? Not the merged model. I am merging the adapter and the base model after pretraining.
Any idea why? Is the modified tokenizer causing this? But others have done the same thing and their model size didn't increase.
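The doubling is consistent with a precision change rather than with the tokenizer expansion. A back-of-envelope check using the parameter count printed in the merge log earlier in this thread (`all params: 6893080576`):

```python
# Parameter count taken from the merge log above ("all params: 6893080576").
# If the merged checkpoint is written in fp32 (4 bytes/param) instead of
# fp16 (2 bytes/param), the sizes line up with the ~14 GB base model and
# ~28 GB merged model reported in this thread.
n_params = 6_893_080_576

fp16_gb = n_params * 2 / 1e9
fp32_gb = n_params * 4 / 1e9

print(f"fp16: {fp16_gb:.1f} GB")  # → fp16: 13.8 GB
print(f"fp32: {fp32_gb:.1f} GB")  # → fp32: 27.6 GB
```

The extra ~18k vocab rows, by contrast, would only add on the order of 0.3 GB in fp16, nowhere near enough to double the checkpoint.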
Also, is there an option to convert a slow tokenizer to a fast one?
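On the slow-to-fast question: transformers can usually convert a SentencePiece-based slow tokenizer to a fast one automatically on load, provided the `tokenizers`, `sentencepiece`, and `protobuf` packages are installed. A sketch under that assumption (the output path is illustrative):

```python
from transformers import AutoTokenizer

# Loading with use_fast=True triggers the slow->fast conversion for
# SentencePiece tokenizers; any directory containing tokenizer.model works.
tok = AutoTokenizer.from_pretrained("output_dir/lora/pretrain", use_fast=True)

# Saving writes a tokenizer.json, so future loads get the fast tokenizer
# directly without re-running the conversion.
tok.save_pretrained("output_dir/fast_tokenizer")
```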