unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

Huggingface false F16 upload #802

Open · JanDupont opened this issue 3 months ago

JanDupont commented 3 months ago

When finetuning llama-3.1-8b or mistral-nemo-12b (I only tried those two; it doesn't seem to depend on the model), unsloth also uploads the F16 result to Hugging Face, even though my script should only upload the Q4_K_M. The script is otherwise very close to the Colab one:

...
# Save to q4_k_m GGUF
if True: model.save_pretrained_gguf(result_dir + "model", tokenizer, quantization_method = "q4_k_m")
# Push only the Q4_K_M GGUF to the Hub (in practice, the F16 GGUF gets uploaded as well)
if True: model.push_to_hub_gguf(hf_orga_name + "/" + hf_repo_name, tokenizer, quantization_method = "q4_k_m", token = "XXXXXXXXXXX")

Running locally on a T4; installation was done via conda using this command, as described in the unsloth README.md:

pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

No big deal, it just slows down the process a bit.
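
In the meantime, a possible workaround is to skip push_to_hub_gguf and upload only the quantized file yourself with huggingface_hub. A minimal sketch, assuming result_dir, hf_orga_name, and hf_repo_name as in the snippet above, and assuming the saved file matches *Q4_K_M*.gguf (the exact filename unsloth writes may differ):

# Workaround sketch: save the GGUF locally, then upload only the Q4_K_M file.
import glob
from huggingface_hub import HfApi

model.save_pretrained_gguf(result_dir + "model", tokenizer, quantization_method = "q4_k_m")

# Find the quantized file (the filename pattern here is an assumption).
gguf_path = glob.glob(result_dir + "model/*Q4_K_M*.gguf")[0]

api = HfApi(token = "XXXXXXXXXXX")
api.create_repo(hf_orga_name + "/" + hf_repo_name, exist_ok = True)
api.upload_file(
    path_or_fileobj = gguf_path,
    path_in_repo = gguf_path.split("/")[-1],  # keep the original filename in the repo
    repo_id = hf_orga_name + "/" + hf_repo_name,
)

huggingface_hub is already pulled in as a dependency of transformers, so this adds no new installs.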

danielhanchen commented 2 months ago

Oh that's a very good point!! I shall remove that - it does make stuff slower

PaolaShultz commented 2 weeks ago

Is there any progress on this? Only a few lines, maybe even just one, need to be deleted or commented out... If I use a rented rig with a slow upload link, it adds 30 minutes for no reason. Isn't one of the main points of unsloth to make the process faster, not slower? :)