unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0
18.21k stars, 1.27k forks

Error: name 'quantization' is not defined in unsloth_save_pretrained_gguf #117

Closed: corticalstack closed this issue 9 months ago

corticalstack commented 10 months ago

Hi,

Saving the pretrained model to GGUF with q4 quantization fails when I follow your example GGUF conversion notebook (https://colab.research.google.com/drive/1Dyauq4kTZoLewQ1cApceUQVNcnnNTzg_?usp=sharing#scrollTo=FqfebeAdT073). I'm running it locally rather than on Colab. The error is as follows:

NameError                                 Traceback (most recent call last)
Cell In[21], line 11
      8 if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "f16", token = "")
     10 # Save to q4_k_m GGUF
---> 11 if True: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
     12 if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "q4_k_m", token = "")

File ~/.conda/envs/genaiplayground/lib/python3.9/site-packages/unsloth/save.py:665, in unsloth_save_pretrained_gguf(self, save_directory, tokenizer, quantization_method, push_to_hub, token, is_main_process, state_dict, save_function, max_shard_size, safe_serialization, variant, save_peft_format, tags, temporary_location, maximum_memory_usage)
    662 for _ in range(3):
    663     gc.collect()
--> 665 file_location = save_to_gguf(new_save_directory, quantization, makefile)
    667 # And save to HF
    668 if push_to_hub:

NameError: name 'quantization' is not defined

Please let me know if you need more info, thanks
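
For readers hitting the same NameError: the signature in the traceback names the parameter quantization_method, but line 665 passes quantization, a name that is never defined, to save_to_gguf. The sketch below reproduces that class of bug and its fix. It uses hypothetical stand-in functions, not Unsloth's real code.

```python
def save_to_gguf_stub(directory, quantization):
    """Hypothetical stand-in for the GGUF conversion step."""
    return f"{directory}-unsloth.{quantization.upper()}.gguf"


def buggy_save(directory, quantization_method="q4_k_m"):
    # Bug: refers to `quantization`, which is not defined in this scope.
    return save_to_gguf_stub(directory, quantization)  # noqa: F821


def fixed_save(directory, quantization_method="q4_k_m"):
    # Fix: pass the parameter that actually exists.
    return save_to_gguf_stub(directory, quantization_method)


try:
    buggy_save("model")
except NameError as err:
    print(err)              # name 'quantization' is not defined

print(fixed_save("model"))  # model-unsloth.Q4_K_M.gguf
```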

danielhanchen commented 10 months ago

@corticalstack My apologies, that was my fault. I fixed this quickly as part of a hotfix. If you can update only Unsloth (no dependency updates), that would be awesome:

pip install --upgrade git+https://github.com/unslothai/unsloth.git

Sorry again!
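
The standard library (importlib.metadata, Python 3.8+) can confirm which version pip actually installed. The helper name below is hypothetical. This check reads the installed package metadata on disk, so an already-running Python session may still have the old module loaded. For a git install, the version string may also stay the same if the project did not bump it.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


for dist in ("unsloth", "transformers"):
    print(dist, installed_version(dist))
```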

tmceld commented 10 months ago

Just wanted to +1 this. My line: model.save_pretrained_gguf(output_dir, tokenizer, quantization_method = "q4_k_m")

danielhanchen commented 10 months ago

@tmceld Ohh, coincidence!!! :)) Yes, updating via pip install --upgrade git+https://github.com/unslothai/unsloth.git will solve it. Again, sorry! (Actually, I just released another hotfix 1 second ago!!)

tmceld commented 10 months ago

> @corticalstack My apologies, that was my fault. I fixed this quickly as part of a hotfix. If you can update only Unsloth (no dependency updates), that would be awesome:
>
> pip install --upgrade git+https://github.com/unslothai/unsloth.git
>
> Sorry again!

Still getting the same error, unfortunately:

{'train_runtime': 72.9422, 'train_samples_per_second': 447.889, 'train_steps_per_second': 55.935, 'train_loss': 0.021644320631144093, 'epoch': 5.0}                                       
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4080/4080 [01:12<00:00, 55.94it/s]
Traceback (most recent call last):
  File "/home/toast/Developer/modelTrain/unsloth_training/step2_prepare_dataset.py", line 94, in <module>
    model.save_pretrained_gguf(output_dir, tokenizer, quantization_method = "q4_k_m")
  File "/home/toast/miniconda3/envs/modelTrain/lib/python3.10/site-packages/unsloth/save.py", line 652, in unsloth_save_pretrained_gguf
    del arguments["quantization"]
KeyError: 'quantization'
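
This KeyError looks like another symptom of the same rename: `del` on a dict key that is not present raises KeyError. The illustration below uses a hypothetical `arguments` dict. `dict.pop(key, None)` is a tolerant alternative, but this is only a sketch and not necessarily how Unsloth fixed it.

```python
arguments = {"save_directory": "model", "quantization_method": "q4_k_m"}

# `del` raises KeyError when the key is absent, as in the traceback above.
try:
    del arguments["quantization"]
except KeyError as err:
    print("KeyError:", err)   # KeyError: 'quantization'

# dict.pop with a default removes the key if present and does nothing otherwise.
arguments.pop("quantization", None)
print(sorted(arguments))      # ['quantization_method', 'save_directory']
```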

tmceld commented 10 months ago

> (Actually, I just released another hotfix 1 second ago!!)

I just grabbed this and I'm still getting it, I'm afraid.

danielhanchen commented 10 months ago

@tmceld Oh my - how about this:

pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git

danielhanchen commented 10 months ago

Wait, I'm very certain I fixed it. I retried it multiple times, e.g. https://huggingface.co/danielhanchen/gguf_4bit

danielhanchen commented 10 months ago

@tmceld Oh wait, you might have to restart Python. Alternatively, you can reload Unsloth:

from importlib import reload
import unsloth
reload(unsloth)

Reload it in your interactive session after you reinstall Unsloth. Again, sorry for the inconvenience!
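
One caveat, offered as a general Python fact rather than anything Unsloth-specific: importlib.reload re-executes only the top-level package module. Submodules such as unsloth.save, where the tracebacks point, stay cached in sys.modules with their old code. That may be why an error persists after a reinstall, and why restarting Python is the safer option. The standard-library sketch below lists what is cached. The helper name is hypothetical, and json is used only as a demo stand-in.

```python
import sys


def cached_modules(package_name):
    """Names of the package and its submodules currently cached in sys.modules.

    importlib.reload(pkg) re-executes only pkg/__init__.py; the other entries
    listed here keep running the code that was loaded before a reinstall.
    """
    prefix = package_name + "."
    return sorted(name for name in sys.modules
                  if name == package_name or name.startswith(prefix))


import json  # demo stand-in; in the notebook you would pass "unsloth"

print(cached_modules("json"))
```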

tmceld commented 10 months ago

No, I really appreciate the help! I have reinstalled Unsloth as per the instructions. I HAVE NOT reloaded Python just yet, but I do have a different error:

NotImplementedError: You are calling save_pretrained on a 4-bit converted model. This is currently not supported

Going to retry now after reloading Python, as per your instructions. Thanks again for your help.

danielhanchen commented 10 months ago

@tmceld Oh, that suggests your Hugging Face transformers version is old. Did you use merged_4bit? If your goal is to upload a 4bit model, you can try upgrading transformers to the latest version:

pip install --upgrade git+https://github.com/huggingface/transformers.git

But I suggest using merged_16bit with older transformers versions. Actually, I might patch save_pretrained to raise a clearer error message.

danielhanchen commented 10 months ago

Also, if you called merge_and_unload, that is probably why it's happening as well, again because of the old transformers version.

Likewise, after upgrading transformers to the latest version, reload it in your interactive session.

tmceld commented 10 months ago

Boom!

llama_model_quantize_internal: model size  =  2098.35 MB
llama_model_quantize_internal: quant size  =   636.18 MB

main: quantize time = 13103.27 ms
main:    total time = 13103.27 ms
Unsloth: Conversion completed! Output location: ./unsloth_outputs-unsloth.Q4_K_M.gguf

Really amazing work, thank you so much @danielhanchen!
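
As a sanity check on the log numbers above, the quantized file is about 30% of the 16-bit size. The calculation below assumes every weight in the input GGUF is stored in 16 bits. That is a simplification, since a few small tensors such as norms are F32. Under that assumption, the result is an average bits-per-weight figure for the q4_k_m output.

```python
model_size_mb = 2098.35  # "model size" from the llama.cpp log (16-bit input, assumed)
quant_size_mb = 636.18   # "quant size" from the same log

ratio = model_size_mb / quant_size_mb
bits_per_weight = 16 * quant_size_mb / model_size_mb  # assumes all input weights are 16-bit

print(f"{ratio:.2f}x smaller, ~{bits_per_weight:.2f} bits per weight")
# 3.30x smaller, ~4.85 bits per weight
```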

danielhanchen commented 10 months ago

OH YAY!!!!!! It works!!! :)))

corticalstack commented 10 months ago

Hey @danielhanchen, thanks for the fix. However, I'm still getting an issue:

[289/291] Writing tensor blk.31.attn_q.weight                   | size   4096 x   4096  | type F16  | T+   7
[290/291] Writing tensor blk.31.attn_v.weight                   | size   1024 x   4096  | type F16  | T+   7
[291/291] Writing tensor output_norm.weight                     | size   4096           | type F32  | T+   7
Wrote model-unsloth.F16.gguf
Unsloth: Conversion completed! Output location: ./model-unsloth.F16.gguf
Unsloth: [2] Converting GGUF 16bit into q4_k_m. This will take 20 minutes...
/bin/sh: 1: ./llama.cpp/quantize: not found
Unsloth: Conversion completed! Output location: ./model-unsloth.Q4_K_M.gguf

model-unsloth.F16.gguf gets created on the file system, as the log above also shows. However, although the log says the Q4_K_M quant was created, it was not. There is no such model on the file system, and the error says the llama.cpp/quantize executable cannot be found. Checking the llama.cpp repo that Unsloth clones, quantize is indeed not there.

@tmceld Do you see the quantize script in llama.cpp that is cloned to your working directory?
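
The log prints "Conversion completed!" even though /bin/sh reported that ./llama.cpp/quantize was not found. That suggests the exit status of that shell command went unchecked. Below is a sketch of a stricter wrapper; the function name is hypothetical. The argument order (input GGUF, output GGUF, quantization type) is assumed from llama.cpp's quantize usage and may differ between llama.cpp versions.

```python
import os
import subprocess


def run_quantize(binary, input_gguf, output_gguf, method="q4_k_m"):
    """Hypothetical wrapper around llama.cpp's quantize that fails loudly."""
    if not (os.path.isfile(binary) and os.access(binary, os.X_OK)):
        raise FileNotFoundError(
            f"{binary} is missing or not executable; "
            "llama.cpp may not have finished compiling"
        )
    # check=True raises CalledProcessError on a non-zero exit status, so a
    # failed run cannot be followed by a success message.
    subprocess.run([binary, input_gguf, output_gguf, method], check=True)
    if not os.path.isfile(output_gguf):
        raise RuntimeError(f"quantize exited cleanly but {output_gguf} was not written")
    return output_gguf
```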

danielhanchen commented 10 months ago

@corticalstack Oh yes, it seems like llama.cpp's quantize file does not exist? That's very, very weird.

tmceld commented 10 months ago

> @tmceld Do you see the quantize script in llama.cpp that is cloned to your working directory?

Under ./llama.cpp/scripts I have a qnt-all.sh. Is this what you mean? I'm on Ubuntu, so there are no .exe files.

danielhanchen commented 10 months ago

I'll check on my side. It's possible my non-blocking calls are not working as expected: I run make on llama.cpp at the same time as saving the model, in order to save time.
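
The overlap described here can be sketched with subprocess.Popen. The traceback further down shows Unsloth calling install_llama_cpp_make_non_blocking() before saving the model. The pattern is to start the build, do the slow model save in the meantime, then wait for the build and check its exit status before using its outputs. If the wait or the check is skipped, or the build fails, the quantize step can run before the binary exists. The helper names are hypothetical, and the build command is a harmless stand-in for make.

```python
import subprocess
import sys


def start_build(cmd):
    """Start a build command without blocking (like a background `make`)."""
    return subprocess.Popen(cmd)


def wait_for_build(proc):
    """Block until the build finishes; raise if it failed."""
    returncode = proc.wait()
    if returncode != 0:
        raise RuntimeError(f"build failed with exit status {returncode}")


# Stand-in command instead of running `make` inside llama.cpp.
build = start_build([sys.executable, "-c", "print('pretend this is make')"])
# ... merge and save the 16-bit model here while the build runs ...
wait_for_build(build)  # only after this is it safe to call the quantize binary
```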

danielhanchen commented 10 months ago

Hm, weird. It seems I do in fact have quantize as an executable.


danielhanchen commented 10 months ago

Oh wait, @corticalstack, could you delete the llama.cpp folder? It might have been compiled incorrectly, and Unsloth will recompile it.
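
The sketch below deletes the folder from Python, for example inside the same notebook. It assumes llama.cpp was cloned into the current working directory, as the ./llama.cpp/quantize path in the log suggests. The helper name is hypothetical, and the demo runs on a throwaway directory rather than a real checkout.

```python
import shutil
import tempfile
from pathlib import Path


def remove_llama_cpp(workdir="."):
    """Delete ./llama.cpp under workdir so it gets cloned and compiled afresh.

    Returns True if a folder was removed, False if there was nothing to remove.
    """
    target = Path(workdir) / "llama.cpp"
    if target.is_dir():
        shutil.rmtree(target)
        return True
    return False


# Demo on a temporary directory, not on a real checkout.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "llama.cpp" / "build").mkdir(parents=True)
    print(remove_llama_cpp(tmp))  # True
    print(remove_llama_cpp(tmp))  # False
```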

corticalstack commented 10 months ago

Unfortunately, it just failed with an error after I deleted llama.cpp and restarted the ipynb kernel:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[12], line 11
      8 if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "f16", token = "")
     10 # Save to q4_k_m GGUF
---> 11 if True: model.save_pretrained_gguf("model_q4_k_m_gguf", tokenizer, quantization_method = "q4_k_m")
     12 if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "q4_k_m", token = "")

File ~/.conda/envs/genaiplayground/lib/python3.9/site-packages/unsloth/save.py:792, in unsloth_save_pretrained_gguf(self, save_directory, tokenizer, quantization_method, push_to_hub, token, is_main_process, state_dict, save_function, max_shard_size, safe_serialization, variant, save_peft_format, tags, temporary_location, maximum_memory_usage)
    790     git_clone.wait()
    791     makefile  = install_llama_cpp_make_non_blocking()
--> 792     new_save_directory = unsloth_save_model(**arguments)
    793     python_install.wait()
    794 else:

File ~/.conda/envs/genaiplayground/lib/python3.9/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File ~/.conda/envs/genaiplayground/lib/python3.9/site-packages/unsloth/save.py:339, in unsloth_save_model(model, tokenizer, save_directory, save_method, push_to_hub, token, is_main_process, state_dict, save_function, max_shard_size, safe_serialization, variant, save_peft_format, use_temp_dir, commit_message, private, create_pr, revision, commit_description, tags, temporary_location, maximum_memory_usage)
    337 proj = eval(f"layer.{item}")
    338 name = f"model.layers.{j}.{item}.weight"
--> 339 W = _merge_lora(proj, name)
    341 if (torch.cuda.memory_allocated() + W.nbytes) < max_vram:
    342     # Save to GPU memory
    343     state_dict[name] = W

File ~/.conda/envs/genaiplayground/lib/python3.9/site-packages/unsloth/save.py:80, in _merge_lora(layer, name)
     78 dtype = quant_state.dtype if type(quant_state) is not list else quant_state[2]
     79 W = fast_dequantize(W, quant_state).to(torch.float32).t()
---> 80 sAB = (A.t().to(torch.float32) @ (s * B.t().to(torch.float32)))
     81 W += sAB
     82 if not torch.isfinite(W).all():

AttributeError: 'NoneType' object has no attribute 't'

The error occurs when executing: if True: model.save_pretrained_gguf("model_q4_k_m_gguf", tokenizer, quantization_method = "q4_k_m")

danielhanchen commented 10 months ago

@corticalstack OHH, I'm assuming no LoRA weights were added with FastLanguageModel.get_peft_model? I'll add a check for that. Thanks!!

As a temporary workaround, call get_peft_model but don't run trainer.train. The LoRA adapters are initialized so that their contribution is zero anyway, so if you just want to convert a base model to GGUF, it won't affect the output.
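
Below is a plain-Python sketch of such a check, with hypothetical helper names; it is not Unsloth's code. It merges a LoRA update as W + s·B·A, which the traceback's _merge_lora computes in transposed form (A.t() @ (s * B.t())). It returns W unchanged when a layer has no adapter factors. The last call assumes the common initialization where B starts at zero. It shows why an untrained adapter leaves the base weights untouched: B·A is then zero.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def merge_lora(W, A, B, s):
    """Return W + s * (B @ A), or a copy of W when the layer has no LoRA factors.

    Assumed shapes: W is out x in, B is out x r, A is r x in.
    """
    if A is None or B is None:
        # Guard: skip layers without adapters instead of calling .t() on None.
        return [row[:] for row in W]
    BA = matmul(B, A)
    return [[w + s * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, BA)]


W = [[1.0, 0.0], [0.0, 1.0]]
print(merge_lora(W, [[3.0, 4.0]], [[1.0], [2.0]], s=0.5))  # [[2.5, 2.0], [3.0, 5.0]]
print(merge_lora(W, None, None, s=0.5))                    # [[1.0, 0.0], [0.0, 1.0]]
print(merge_lora(W, [[3.0, 4.0]], [[0.0], [0.0]], s=0.5))  # [[1.0, 0.0], [0.0, 1.0]]
```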

corticalstack commented 10 months ago

@danielhanchen I got the q4_k_m saved successfully. A layman's insight into the error follows:

Sequence of notebook steps in which the error saving the q4_k_m GGUF occurred:

  1. FastLanguageModel.from_pretrained
  2. FastLanguageModel.get_peft_model
  3. alpaca_prompt =
  4. trainer = SFTTrainer
  5. trainer.train()
  6. inference with model.generate (batch)
  7. inference with TextStreamer
  8. model.save_pretrained("mistral_lora_model")
  9. if True: model.save_pretrained_merged("model", tokenizer, save_method = "merged_4bit",)
  10. if True: model.save_pretrained_gguf("model_q4_k_m_gguf", tokenizer, quantization_method = "q4_k_m")

Sequence of notebook steps in which NO error occurred saving the q4_k_m GGUF:

  1. FastLanguageModel.from_pretrained
  2. FastLanguageModel.get_peft_model
  3. alpaca_prompt =
  4. trainer = SFTTrainer
  5. trainer.train()
  6. if True: model.save_pretrained_gguf("model_q4_k_m_gguf", tokenizer, quantization_method = "q4_k_m")

So in the second sequence, I skip inference and the LoRA/merged saves, going directly to the q4_k_m GGUF save after training. I assume one of the save steps in the first sequence is resetting the LoRA weights?

I assume you can replicate this in your example notebook if you save both the LoRA and merged models before the GGUF?

Looking forward to your more expert insight, thanks!

danielhanchen commented 10 months ago

@corticalstack OHHH, if True: model.save_pretrained_merged("model", tokenizer, save_method = "merged_4bit",) is the culprit: it merges all LoRA weights into 4bit. Oh wait, is this because you wanted a 4bit merge, or did you think it was GGUF 4bit?

But thanks so so much for debugging Unsloth - highly appreciate it!!!
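
The toy model below shows why the order of the saves mattered. It assumes, as the AttributeError on A.t() suggests, that the merged_4bit save folds the LoRA update into the base weights and leaves the layer without adapter factors. Every class here is hypothetical, and the 1x1 "matrices" keep the arithmetic trivial.

```python
class ToyFactor:
    """Hypothetical 1x1 LoRA factor exposing .t(), like a tensor would."""

    def __init__(self, value):
        self.value = value

    def t(self):
        return self  # the transpose of a 1x1 matrix is itself


class ToyLoraLayer:
    """Hypothetical layer: base weight W plus LoRA factors A, B and scale s."""

    def __init__(self, W, A, B, s):
        self.W, self.A, self.B, self.s = W, ToyFactor(A), ToyFactor(B), s

    def merged_weight(self):
        # Like the traceback's _merge_lora: calls .t() without checking for None.
        return self.W + self.A.t().value * self.s * self.B.t().value

    def merge_in_place(self):
        # Assumed effect of a 4-bit merge save: fold the update in, drop the adapters.
        self.W = self.merged_weight()
        self.A = self.B = None


layer = ToyLoraLayer(W=1.0, A=2.0, B=3.0, s=0.5)
layer.merge_in_place()     # stands in for save_method = "merged_4bit"
print(layer.W)             # 4.0
try:
    layer.merged_weight()  # stands in for the later save_pretrained_gguf merge
except AttributeError as err:
    print(err)             # 'NoneType' object has no attribute 't'
```

Under this assumption, skipping the merged_4bit save (as in the second sequence) or presumably running it after the GGUF export would keep the later merge from ever seeing None.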

corticalstack commented 10 months ago

@danielhanchen I was running all the save variants because I wanted to compare file sizes, completion speed in tokens/s, and serving with different inference engines such as vLLM and ooba.

danielhanchen commented 10 months ago

@corticalstack OHHH ok ok!! Interesting! On that note, do you want any other conversions? Some suggested AWQ and GPTQ :)

corticalstack commented 10 months ago

> @corticalstack OHHH ok ok!! Interesting! On that note, do you want any other conversions? Some suggested AWQ and GPTQ :)

AWQ, GPTQ, and EXL2 for GPU inference

danielhanchen commented 9 months ago

@corticalstack I will be adding AWQ and GPTQ! Not sure about EXL2, though.