unslothai / unsloth

Finetune Llama 3, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

How to load LoRA from Hugging Face? #596

Open a442509097 opened 1 month ago

a442509097 commented 1 month ago

My Colab runtime is very limited, so I trained the LoRA on Kaggle, uploaded it to Hugging Face, and then loaded the LoRA from Hugging Face in Colab.

model.save_pretrained("/kaggle/working/outputs")  # Local saving
tokenizer.save_pretrained("/kaggle/working/outputs")

But when I load it in Colab, I get the error "Should have a model_type key in its config.json", so I added "model_type": "llama" to config.json. Then I get "Your session crashed after using all available RAM." Which step did I do wrong?

from transformers import AutoModel, AutoTokenizer

model_name = "temp123/lora_model"  
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
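(For reference, the config.json edit described above can be scripted rather than done by hand. A minimal stdlib sketch; the synthetic temp folder stands in for the real adapter download, and "llama" is the value the thread uses:)

```python
import json
import tempfile
from pathlib import Path

# Stand-in for the checkpoint folder downloaded from Kaggle (path is hypothetical).
folder = Path(tempfile.mkdtemp())
cfg_path = folder / "config.json"
cfg_path.write_text(json.dumps({"architectures": ["LlamaForCausalLM"]}))

# Transformers' Auto classes need a model_type key to pick the architecture,
# so add one if it is missing before calling from_pretrained.
cfg = json.loads(cfg_path.read_text())
cfg.setdefault("model_type", "llama")
cfg_path.write_text(json.dumps(cfg, indent=2))

print(json.loads(cfg_path.read_text())["model_type"])  # llama
```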
danielhanchen commented 1 month ago

Oh you need to use model.push_to_hub and not model.save_pretrained

a442509097 commented 1 month ago

Oh you need to use model.push_to_hub and not model.save_pretrained

I downloaded it from Kaggle and manually uploaded it to Hugging Face. The problem I'm facing now is that RAM overflows on "model = AutoModel.from_pretrained(model_name)". Perhaps I could also manually upload files to overwrite the ones in Colab's 'lora_model', but I don't know which point gives 'model = Llama3', which gives 'model = LoRA', and which gives 'model = Llama3 + LoRA'.
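(On telling the three cases apart: as far as I understand, save_pretrained on a PEFT/LoRA model writes only the adapter, i.e. adapter_config.json plus the adapter weights, not the base Llama 3 weights; the base model's name is recorded inside adapter_config.json. A rough stdlib sketch of classifying a checkpoint folder; the file names follow PEFT conventions, and the example folder and base-model string are synthetic:)

```python
import json
import tempfile
from pathlib import Path

def describe(folder: Path) -> str:
    """Roughly classify a checkpoint folder: LoRA adapter vs full model."""
    if (folder / "adapter_config.json").exists():
        # PEFT adapters record their base model here.
        cfg = json.loads((folder / "adapter_config.json").read_text())
        return f"LoRA adapter only (base model: {cfg.get('base_model_name_or_path', '?')})"
    if (folder / "config.json").exists():
        return "full model weights"
    return "unknown"

# Synthetic stand-in for the folder save_pretrained produced.
adapter_dir = Path(tempfile.mkdtemp())
(adapter_dir / "adapter_config.json").write_text(
    json.dumps({"base_model_name_or_path": "unsloth/llama-3-8b-bnb-4bit"}))

print(describe(adapter_dir))  # LoRA adapter only (base model: unsloth/llama-3-8b-bnb-4bit)
```

If the repo is an adapter, plain AutoModel.from_pretrained tries to treat it as a full model; loading it via peft's AutoPeftModelForCausalLM (or Unsloth's FastLanguageModel.from_pretrained) should resolve the base model from adapter_config.json and apply the LoRA on top, i.e. 'Llama3 + LoRA'.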

a442509097 commented 1 month ago

Even if Colab had enough runtime, the disk fills up. I think text-generation-webui with .gguf + LoRA is the fastest solution for now. 😅

main: quantize time = 221050.45 ms
main:    total time = 221050.45 ms
Unsloth: Conversion completed! Output location: ./model-unsloth.Q8_0.gguf
Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 6.72 out of 12.67 RAM for saving.
100%|██████████| 32/32 [00:48<00:00,  1.50s/it]
Unsloth: Saving tokenizer... Done.
Unsloth: Saving model... This might take 5 minutes for Llama-7b...
Unsloth: Saving a442509097/tempMode/pytorch_model-00001-of-00004.bin...
Unsloth: Saving a442509097/tempMode/pytorch_model-00002-of-00004.bin...
Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 6.36 out of 12.67 RAM for saving.
  0%|          | 0/32 [00:01<?, ?it/s]
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/torch/serialization.py in save(obj, f, pickle_module, pickle_protocol, _use_new_zipfile_serialization, _disable_byteorder_record)
    627         with _open_zipfile_writer(f) as opened_zipfile:
--> 628             _save(obj, opened_zipfile, pickle_module, pickle_protocol, _disable_byteorder_record)
    629             return

15 frames
RuntimeError: [enforce fail at inline_container.cc:764] . PytorchStreamWriter failed writing file data/22: file write failed

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
RuntimeError: [enforce fail at inline_container.cc:595] . unexpected pos 704676160 vs 704676048

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
RuntimeError: [enforce fail at inline_container.cc:764] . PytorchStreamWriter failed writing file data/0: file write failed

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/torch/serialization.py in __exit__(self, *args)
    473 
    474     def __exit__(self, *args) -> None:
--> 475         self.file_like.write_end_of_file()
    476         if self.file_stream is not None:
    477             self.file_stream.close()

RuntimeError: [enforce fail at inline_container.cc:595] . unexpected pos 576 vs 470
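(The "PytorchStreamWriter failed writing file" errors above are the usual symptom of the disk filling up mid-save: merging to 16-bit writes roughly 14–16 GB of shards for a 7B/8B model. A small pre-flight check, sketched with the stdlib; the 20 GB threshold is a guess, not an Unsloth number:)

```python
import shutil

def enough_disk(path: str = ".", need_gb: float = 20.0) -> bool:
    """Return True if the filesystem holding `path` has at least `need_gb` GiB free."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return free_gb >= need_gb

# Run this before kicking off the merge / GGUF conversion to fail early
# instead of crashing partway through writing the shards.
if not enough_disk("."):
    raise RuntimeError("Not enough free disk space to merge and save the model")
```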
danielhanchen commented 4 weeks ago

Wait, even Colab runs out of disk space? Ye, GGUF + LoRA can work if that helps