unslothai / unsloth


save_pretrained_merged doesn't merge the model #611

Open neoneye opened 4 weeks ago

neoneye commented 4 weeks ago

Problem

My goal is to save the merged model as a GGUF file, but I'm getting various errors.
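For context, the GGUF export I'm ultimately trying to run is roughly this (a sketch based on the Unsloth notebook; the q8_0 quantization method is just an example choice):

# Example only: the GGUF export I want to reach once merging works.
model.save_pretrained_gguf("model", tokenizer, quantization_method = "q8_0")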

The deeper problem seems to be that merging the LoRA and base model isn't saving a merged file.

I think I successfully merged the LoRA and base model around 7-14 days ago, so maybe something broke recently.

Details

My Google Colab notebook is based on unsloth/llama-3-8b-bnb-4bit and was trained using the Unsloth Colab notebook.

My model neoneye/base64-decode-v2-attempt12 contains the adapter_model.safetensors file, but it does not contain the full merged model.

I can continue training on my model, and it loads the adapter + base model. So loading the LoRA and base model works, training works, and push_to_hub works.
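For reference, this is roughly how I load the adapter + base model to continue training (a sketch; max_seq_length is an assumption from my own setup):

from unsloth import FastLanguageModel

# Loads the base model and applies my LoRA adapter from the Hub. This step works.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "neoneye/base64-decode-v2-attempt12",
    max_seq_length = 2048,  # assumption: value from my own setup
    load_in_4bit = True,
)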

However, merging the LoRA with the base model isn't working.

if True: model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit",)

This is the output from save_pretrained_merged. There are no errors.

Unsloth: You have 1 CPUs. Using `safe_serialization` is 10x slower.
We shall switch to Pytorch saving, which will take 3 minutes and not 30 minutes.
To force `safe_serialization`, set it to `None` instead.
Unsloth: Kaggle/Colab has limited disk space. We need to delete the downloaded
model which will save 4-16GB of disk space, allowing you to save on Kaggle/Colab.
Unsloth: Will remove a cached repo with size 5.7G

Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 6.17 out of 12.67 RAM for saving.

 41%|████      | 13/32 [00:01<00:01, 13.39it/s]We will save to Disk and not RAM now.
100%|██████████| 32/32 [00:42<00:00,  1.33s/it]

Unsloth: Saving tokenizer... Done.
Unsloth: Saving model... This might take 5 minutes for Llama-7b...

/usr/local/lib/python3.10/dist-packages/transformers/integrations/peft.py:399: FutureWarning: The `active_adapter` method is deprecated and will be removed in a future version.
  warnings.warn(

config.json: 100% 1.20k/1.20k [00:00<00:00, 68.7kB/s]

Unsloth: Saving model/adapter_model.bin...
Done.

The biggest file is the LoRA file, 167 MB. It seems there is no merged file. I would expect a file around the same size as the base model or bigger, between 5-10 GB, but there is no such file, and no error about a file not being generated.

/content/model# ls -la
total 172948
drwxr-xr-x 2 root root      4096 Jun  9 15:20 .
drwxr-xr-x 1 root root      4096 Jun  9 15:24 ..
-rw-r--r-- 1 root root       732 Jun  9 15:23 adapter_config.json
-rw-r--r-- 1 root root 167934026 Jun  9 15:23 adapter_model.bin
-rw-r--r-- 1 root root       172 Jun  9 15:23 generation_config.json
-rw-r--r-- 1 root root       464 Jun  9 15:23 special_tokens_map.json
-rw-r--r-- 1 root root     50614 Jun  9 15:23 tokenizer_config.json
-rw-r--r-- 1 root root   9085698 Jun  9 15:23 tokenizer.json

I'm on Google Colab with plenty of disk space.

Connected to
Python 3 Google Compute Engine backend (GPU)
RAM: 2.91 GB/12.67 GB
Disk: 29.36 GB/201.23 GB

Solution ideas

Am I correct that save_pretrained_merged should output a big merged file?

Inside save_pretrained_merged, check whether the output file was actually generated; if there is no file, print an error.
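A minimal sketch of such a check, assuming the output directory is "model" and using an arbitrary 1 GB threshold (a merged fp16 Llama-3-8B should be roughly 16 GB, while the adapter alone is ~167 MB):

import os

# After save_pretrained_merged, verify that a multi-GB weight file actually exists.
weight_files = [f for f in os.listdir("model") if f.endswith((".bin", ".safetensors"))]
largest = max((os.path.getsize(os.path.join("model", f)) for f in weight_files), default = 0)
if largest < 1_000_000_000:  # arbitrary threshold, well above adapter size
    print("Error: no merged weight file was generated, only adapter-sized files.")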

danielhanchen commented 4 weeks ago

Weird hmm let me try it in colab

neph1 commented 4 weeks ago

Sounds like my issue: https://github.com/unslothai/unsloth/pull/609