Problem

My goal is to save the merged model as a GGUF file, but I'm getting various errors. The deeper problem seems to be that merging the LoRA with the base model isn't saving a merged file. I believe I successfully merged the LoRA with the base model around 7-14 days ago, so maybe something has broken recently.

Details

My Google Colab notebook is based on unsloth/llama-3-8b-bnb-4bit and was trained using the Unsloth Colab notebook. My model neoneye/base64-decode-v2-attempt12 contains the adapter_model.safetensors file; it does not contain the full merged model.

I can continue training on my model, and it loads the adapter + base model. So loading the LoRA and the base model works, training works, and push_to_hub works. However, merging the LoRA with the base model isn't working:
if True:
    model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit")
This is the output from save_pretrained_merged. There are no errors.
Unsloth: You have 1 CPUs. Using `safe_serialization` is 10x slower.
We shall switch to Pytorch saving, which will take 3 minutes and not 30 minutes.
To force `safe_serialization`, set it to `None` instead.
Unsloth: Kaggle/Colab has limited disk space. We need to delete the downloaded
model which will save 4-16GB of disk space, allowing you to save on Kaggle/Colab.
Unsloth: Will remove a cached repo with size 5.7G
Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 6.17 out of 12.67 RAM for saving.
41%|████ | 13/32 [00:01<00:01, 13.39it/s]We will save to Disk and not RAM now.
100%|██████████| 32/32 [00:42<00:00, 1.33s/it]
Unsloth: Saving tokenizer... Done.
Unsloth: Saving model... This might take 5 minutes for Llama-7b...
/usr/local/lib/python3.10/dist-packages/transformers/integrations/peft.py:399: FutureWarning: The `active_adapter` method is deprecated and will be removed in a future version.
warnings.warn(
config.json: 100% 1.20k/1.20k [00:00<00:00, 68.7kB/s]
Unsloth: Saving model/adapter_model.bin...
Done.
The biggest file is the LoRA adapter, at 167 MB. There appears to be no merged file. I would expect save_pretrained_merged to generate a file around the same size as the base model or bigger, roughly 5-10 GB, but no such file exists, and no error is printed about a missing file:
/content/model# ls -la
total 172948
drwxr-xr-x 2 root root 4096 Jun 9 15:20 .
drwxr-xr-x 1 root root 4096 Jun 9 15:24 ..
-rw-r--r-- 1 root root 732 Jun 9 15:23 adapter_config.json
-rw-r--r-- 1 root root 167934026 Jun 9 15:23 adapter_model.bin
-rw-r--r-- 1 root root 172 Jun 9 15:23 generation_config.json
-rw-r--r-- 1 root root 464 Jun 9 15:23 special_tokens_map.json
-rw-r--r-- 1 root root 50614 Jun 9 15:23 tokenizer_config.json
-rw-r--r-- 1 root root 9085698 Jun 9 15:23 tokenizer.json
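A minimal sketch of the kind of size check I mean (check_merged_output is a hypothetical helper, not an Unsloth API; the 4 GB threshold is just my guess for a multi-GB 16-bit merge):

import os

def check_merged_output(save_dir: str, min_bytes: int = 4 * 1024**3) -> None:
    # Collect the size of every regular file in the save directory.
    sizes = {
        name: os.path.getsize(os.path.join(save_dir, name))
        for name in os.listdir(save_dir)
        if os.path.isfile(os.path.join(save_dir, name))
    }
    largest = max(sizes.values(), default=0)
    if largest < min_bytes:
        # A merged 16-bit checkpoint should be multiple GB; a ~167 MB
        # file is only the LoRA adapter.
        print(f"WARNING: largest file in {save_dir} is only "
              f"{largest / 1024**2:.0f} MB; no merged weights appear "
              f"to have been written.")

check_merged_output("model")  # on my run this would warn: largest file is the adapter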
I'm on Google Colab with plenty of disk space.
Connected to Python 3 Google Compute Engine backend (GPU)
RAM: 2.91 GB/12.67 GB
Disk: 29.36 GB/201.23 GB
Solution ideas

1. Am I correct that save_pretrained_merged should output a big merged file?
2. Inside save_pretrained_merged, check whether the output file was actually generated; if there is no file, print an error.
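One way I could try to isolate whether this is Unsloth-specific is to merge manually with PEFT. This is only a sketch under assumptions: it loads the 16-bit base (meta-llama/Meta-Llama-3-8B is my guess at the repo matching unsloth/llama-3-8b-bnb-4bit) and needs roughly 16 GB of RAM/disk for the fp16 weights, so it may not fit a standard Colab instance:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the full-precision base model (assumed repo name; the bnb-4bit
# checkpoint itself can't be merged straight into 16-bit weights).
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)

# Apply my trained adapter and fold the LoRA deltas into the base weights.
model = PeftModel.from_pretrained(base, "neoneye/base64-decode-v2-attempt12")
merged = model.merge_and_unload()

# Save the merged model and tokenizer; this should produce multi-GB weight files.
merged.save_pretrained("model_merged")
tokenizer = AutoTokenizer.from_pretrained("neoneye/base64-decode-v2-attempt12")
tokenizer.save_pretrained("model_merged")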