Closed by corticalstack 9 months ago.

Hi,
Error when trying to save a pretrained model to GGUF, for the q4 quant, as per your example GGUF conversion notebook (https://colab.research.google.com/drive/1Dyauq4kTZoLewQ1cApceUQVNcnnNTzg_?usp=sharing#scrollTo=FqfebeAdT073), which I'm running locally, not on Colab. Error as follows:
Please let me know if you need more info, thanks.
@corticalstack My apologies - my fault! I fixed this quickly as part of a hotfix. If you can update Unsloth only (no dependency updates), that would be awesome:
pip install --upgrade git+https://github.com/unslothai/unsloth.git
Sorry again!
Just wanted to +1 this, my line:
model.save_pretrained_gguf(output_dir, tokenizer, quantization_method = "q4_k_m")
@tmceld Ohh coincidence!!! :)) Yes, updating via
pip install --upgrade git+https://github.com/unslothai/unsloth.git
will solve it - again, sorry! (Actually just released another hotfix 1 second ago!!)
Still getting the same error, unfortunately:
{'train_runtime': 72.9422, 'train_samples_per_second': 447.889, 'train_steps_per_second': 55.935, 'train_loss': 0.021644320631144093, 'epoch': 5.0}
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4080/4080 [01:12<00:00, 55.94it/s]
Traceback (most recent call last):
File "/home/toast/Developer/modelTrain/unsloth_training/step2_prepare_dataset.py", line 94, in <module>
model.save_pretrained_gguf(output_dir, tokenizer, quantization_method = "q4_k_m")
File "/home/toast/miniconda3/envs/modelTrain/lib/python3.10/site-packages/unsloth/save.py", line 652, in unsloth_save_pretrained_gguf
del arguments["quantization"]
KeyError: 'quantization'
> (Actually just released another hotfix 1 second ago!!)
I just grabbed this and I'm still getting it, I'm afraid.
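For context, the failing line is a plain del on a dict key that may be absent; a defensive pop, sketched below, is the shape of the fix (assuming that is roughly what the hotfix does):

# Hypothetical defensive version of the failing line in unsloth/save.py:
arguments.pop("quantization", None)  # no KeyError if the key is missing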
@tmceld Oh my - how about this:
pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git
Wait, I'm very certain I fixed it - I retried it multiple times, e.g. https://huggingface.co/danielhanchen/gguf_4bit
@tmceld Oh wait - you might have to restart Python; if not, you can also reload Unsloth:
from importlib import reload
import unsloth
reload(unsloth)
Reload it in your interactive session after you reinstall Unsloth - again, sorry for the inconvenience!
No, I really appreciate the help - I have re-installed Unsloth as per the instructions. I have NOT reloaded Python just yet, but I do have a different error:
NotImplementedError: You are calling save_pretrained on a 4-bit converted model. This is currently not supported
Going to retry now by reloading Python as per your instructions. Thanks again for your help.
@tmceld Oh, that seems like your Hugging Face version is old - did you do merged_4bit? You can try upgrading HF to the latest if your goal is to upload a 4bit model:
pip install --upgrade git+https://github.com/huggingface/transformers.git
But I suggest using merged_16bit for old transformers versions - actually, I might patch save_pretrained to error out with a better error message.
Also, if you did merge_and_unload, that is probably why it's happening as well, due to the old transformers version.
Likewise, after reinstalling transformers to the latest, reload it in your interactive session.
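For reference, a minimal sketch of the merged_16bit route (the directory and repo names here are illustrative):

# Merge the LoRA weights into a full 16-bit checkpoint - this works on
# older transformers versions, unlike merged_4bit:
model.save_pretrained_merged("model_16bit", tokenizer, save_method = "merged_16bit")
# Or push straight to the Hub (hypothetical repo name; token elided):
model.push_to_hub_merged("hf/model_16bit", tokenizer, save_method = "merged_16bit", token = "")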
Boom!
llama_model_quantize_internal: model size = 2098.35 MB
llama_model_quantize_internal: quant size = 636.18 MB
main: quantize time = 13103.27 ms
main: total time = 13103.27 ms
Unsloth: Conversion completed! Output location: ./unsloth_outputs-unsloth.Q4_K_M.gguf
really amazing work, TY so much @danielhanchen
OH YAY!!!!!! It works!!! :)))
Hey @danielhanchen, thanks for the fix; however, I'm still getting an issue:
[289/291] Writing tensor blk.31.attn_q.weight | size 4096 x 4096 | type F16 | T+ 7
[290/291] Writing tensor blk.31.attn_v.weight | size 1024 x 4096 | type F16 | T+ 7
[291/291] Writing tensor output_norm.weight | size 4096 | type F32 | T+ 7
Wrote model-unsloth.F16.gguf
Unsloth: Conversion completed! Output location: ./model-unsloth.F16.gguf
Unsloth: [2] Converting GGUF 16bit into q4_k_m. This will take 20 minutes...
/bin/sh: 1: ./llama.cpp/quantize: not found
Unsloth: Conversion completed! Output location: ./model-unsloth.Q4_K_M.gguf
model-unsloth.F16.gguf gets created on the file system, as also indicated by the log above. However, despite the log indicating the Q4_K_M quant was created, it was not - there is no model on the file system, and there is an error that the llama.cpp/quantize executable cannot be found. Checking the llama.cpp repo that you clone, quantize is indeed not there.
@corticalstack Oh yeah, it seems like llama.cpp's quantize file does not exist? That's very, very weird.
@tmceld Do you see the quantize script in llama.cpp that is cloned to your working directory?
Under ./llama.cpp/scripts I have a qnt-all.sh - is this what you mean? I'm on Ubuntu, so no .exe files.
I'll check on my side - it's possible my non-blocking calls are not working as expected - i.e. I compile llama.cpp (make) at the same time as saving the model, in order to save time.
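Roughly, the non-blocking scheme described is (a simplified sketch, not Unsloth's actual code):

import subprocess

# Kick off the llama.cpp compile in the background (non-blocking)...
make_proc = subprocess.Popen(["make", "quantize", "-j"], cwd = "llama.cpp")

# ...merge and save the model to disk while the compile runs...

# ...then block until the quantize binary exists before invoking it.
make_proc.wait()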
Hm, weird - it seems like I do in fact have quantize as an executable.
Oh wait, @corticalstack - if you can delete the llama.cpp folder, it might be compiled incorrectly - Unsloth will recompile it.
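i.e. something like (assuming llama.cpp sits in the working directory):

import shutil
# Remove the possibly mis-compiled clone; Unsloth re-clones and recompiles
# llama.cpp on the next save_pretrained_gguf call.
shutil.rmtree("llama.cpp", ignore_errors = True)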
Unfortunately it just failed with an error, after deleting llama.cpp and restarting the ipynb kernel:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[12], line 11
      8 if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "f16", token = "")
     10 # Save to q4_k_m GGUF
---> 11 if True: model.save_pretrained_gguf("model_q4_k_m_gguf", tokenizer, quantization_method = "q4_k_m")
     12 if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "q4_k_m", token = "")

File ~/.conda/envs/genaiplayground/lib/python3.9/site-packages/unsloth/save.py:792, in unsloth_save_pretrained_gguf(self, save_directory, tokenizer, quantization_method, push_to_hub, token, is_main_process, state_dict, save_function, max_shard_size, safe_serialization, variant, save_peft_format, tags, temporary_location, maximum_memory_usage)
    790 git_clone.wait()
    791 makefile = install_llama_cpp_make_non_blocking()
--> 792 new_save_directory = unsloth_save_model(**arguments)
    793 python_install.wait()
    794 else:

File ~/.conda/envs/genaiplayground/lib/python3.9/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File ~/.conda/envs/genaiplayground/lib/python3.9/site-packages/unsloth/save.py:339, in unsloth_save_model(model, tokenizer, save_directory, save_method, push_to_hub, token, is_main_process, state_dict, save_function, max_shard_size, safe_serialization, variant, save_peft_format, use_temp_dir, commit_message, private, create_pr, revision, commit_description, tags, temporary_location, maximum_memory_usage)
    337 proj = eval(f"layer.{item}")
    338 name = f"model.layers.{j}.{item}.weight"
--> 339 W = _merge_lora(proj, name)
    341 if (torch.cuda.memory_allocated() + W.nbytes) < max_vram:
    342     # Save to GPU memory
    343     state_dict[name] = W

File ~/.conda/envs/genaiplayground/lib/python3.9/site-packages/unsloth/save.py:80, in _merge_lora(layer, name)
     78 dtype = quant_state.dtype if type(quant_state) is not list else quant_state[2]
     79 W = fast_dequantize(W, quant_state).to(torch.float32).t()
---> 80 sAB = (A.t().to(torch.float32) @ (s * B.t().to(torch.float32)))
     81 W += sAB
     82 if not torch.isfinite(W).all():

AttributeError: 'NoneType' object has no attribute 't'
Error when executing:
if True: model.save_pretrained_gguf("model_q4_k_m_gguf", tokenizer, quantization_method = "q4_k_m")
@corticalstack OHH - I'm assuming no LoRA weights were added with FastLanguageModel.get_peft_model? But I'll add a check to fix that - thanks!!
As a temporary solution, call get_peft_model, but don't run trainer.train - the LoRA weights are initialized to zeros anyway, so if you're looking to just convert a base model to GGUF, it won't affect the output.
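A minimal sketch of that workaround (the model name and LoRA hyperparameters here are illustrative, not from this thread):

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/mistral-7b-bnb-4bit",  # illustrative base model
    load_in_4bit = True,
)
# Attach LoRA adapters but skip trainer.train(): the adapters start at
# zero, so merging them leaves the base weights unchanged.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)
model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")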
@danielhanchen got the q4_k_m saved successfully. Layman's insight into the error follows:
For the second, successful run I skipped inference and skipped saving the LoRA / merged model, i.e. went direct to the q4_k_m GGUF save after training. I assume one of the save steps in the first sequence is resetting the LoRA weights? The failing sequence is sketched below.
I assume you can replicate this in your example notebook if you save both the LoRA and merged model before the GGUF?
Looking forward to your more expert insight, thanks!
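The failing sequence was roughly (reconstructed from the description above, not the exact script):

# First run: LoRA save and 4bit merge before the GGUF save...
model.save_pretrained("lora_model")  # adapters only - fine
model.save_pretrained_merged("model", tokenizer, save_method = "merged_4bit",)
model.save_pretrained_gguf("model_q4_k_m_gguf", tokenizer, quantization_method = "q4_k_m")  # crashes
# Second run: straight to the GGUF save after training - succeeds.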
@corticalstack OHHH - if True: model.save_pretrained_merged("model", tokenizer, save_method = "merged_4bit",) is the culprit - it merges all the LoRA weights into 4bit. Oh wait, is this because you wanted a 4bit merge, or did you think it was GGUF 4bit?
But thanks so so much for debugging Unsloth - highly appreciate it!!!
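For clarity on the two different "4bit"s in play here:

# merged_4bit: an HF-style checkpoint with the LoRA folded into 4-bit weights.
model.save_pretrained_merged("model_4bit", tokenizer, save_method = "merged_4bit",)
# q4_k_m: a llama.cpp GGUF quantization - an entirely separate format.
model.save_pretrained_gguf("model_gguf", tokenizer, quantization_method = "q4_k_m")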
@danielhanchen I was running all the save variants because I wanted to compare file sizes, tokens/s completion speed, and serving by different inference engines such as vLLM & ooba.
@corticalstack OHHH ok ok!! Interesting - on that note, do you want any other conversions? Some suggested AWQ and GPTQ :)
AWQ, GPTQ, and EXL2 for GPU inference
@corticalstack I will be adding AWQ and GPTQ! Not sure on EXL2 though