unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0
18.29k stars 1.28k forks

[TEMP FIX] Ollama / llama.cpp: cannot find tokenizer merges in model file #1065

Open thackmann opened 1 month ago

thackmann commented 1 month ago

Thank you for developing this useful resource. The Ollama notebook reports

{"error":"llama runner process has terminated: error loading model vocabulary: cannot find tokenizer merges in model file"}

This is the notebook with the error. It is a copy of the original notebook.

This seems similar to the issue reported in #1062.

laoc81 commented 1 month ago

Thank you for the miraculous "unsloth"! It was working very well last week.

Now I am having the same problem as @thackmann:

My notebook -> transformers 4.44.2 (the same as last week).

Error: llama runner process has terminated: error loading model: error loading model vocabulary: cannot find tokenizer merges in model file

xmaayy commented 1 month ago

Same issue!

ThaisBarrosAlvim commented 1 month ago

Same issue!

kingabzpro commented 1 month ago

same issue.

Mukunda-Gogoi commented 1 month ago

Facing a similar issue. Is there a fix? I'm blocked!

Saber120 commented 1 month ago

Same issue with Llama 3.2 3B, any solution please?

shimmyshimmer commented 1 month ago

Hey guys, working on a fix. The new transformers version kind of broke everything.

adampetr commented 1 month ago

Same issue... does anyone have an idea where the problem is located?

kingabzpro commented 1 month ago

> same issue with llama3.2 3B , any solution please

Yes, I tried to work around it using llama.cpp, but it didn't work. The issue arises when we fine-tune and save the model.
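One quick way to check whether an exported model is affected is to look for the BPE merges table in its tokenizer.json before handing it to llama.cpp. This is a minimal sketch; the function name and file path are illustrative, not part of the Unsloth or llama.cpp APIs:

```python
import json

def merges_present(tokenizer_json_path):
    """Return True if the tokenizer file carries a BPE merges table.

    llama.cpp's converter fails with "cannot find tokenizer merges in
    model file" when it cannot locate this table.
    """
    with open(tokenizer_json_path, encoding="utf-8") as f:
        data = json.load(f)
    # In HF tokenizer.json the merges live under the "model" section.
    merges = data.get("model", {}).get("merges")
    return bool(merges)
```

Running this over a fine-tuned checkpoint's tokenizer.json (e.g. `merges_present("my_model/tokenizer.json")`) tells you whether the merges survived the save step at all, or whether the failure is only in how the new format is read downstream.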

williamzebrowskI commented 1 month ago

Same issue. Huge bummer - I literally spent hours fine-tuning and uploading to HF the past couple of days, getting this error and thinking it was me.

Franky-W commented 1 month ago

same issue here.

thank you @shimmyshimmer for working on the fix!

mahiatlinux commented 1 month ago

Hey guys. Yes, this is a current issue, but the team is working to fix it. If you saved the LoRA weights, you might not have to rerun training.

williamzebrowskI commented 1 month ago

There is a workaround that was posted here and it worked for me.

https://github.com/unslothai/unsloth/issues/1062#issuecomment-2379161471

kingabzpro commented 1 month ago

> There is a workaround that was posted here and it worked for me.
>
> https://github.com/unslothai/unsloth/issues/1062#issuecomment-2379161471

This will not work for Llama 3.2 models.

gianmarcoalessio commented 1 month ago

same issue!!

David33706 commented 1 month ago

same issue

FotieMConstant commented 1 month ago

same issue here, any fix anyone?

Here is the error I get after trying to run a fine-tuned model via Ollama:

Error: llama runner process has terminated: error loading model vocabulary: cannot find tokenizer merges in model file

avvRobertoAlma commented 1 month ago

I have the same issue with Llama 3 via llama.cpp: 'error loading model vocabulary: cannot find tokenizer merges in model file'

danielhanchen commented 1 month ago

Apologies guys - I was out for a few days and it's been hectic, so sorry for the delay! Will get to the bottom of this and hopefully can fix it today! Sorry, and thank you all for your patience!

danielhanchen commented 1 month ago

I can reproduce the error - in fact, all of llama.cpp (and thus Ollama etc.) does not work with transformers>=4.45.1. I'll update everyone on a fix - it looks like Hugging Face's update most likely broke something in tokenizer exports.

drsanta-1337 commented 1 month ago

@danielhanchen check this comment out, see if it helps.

https://github.com/huggingface/tokenizers/issues/1553#issuecomment-2243927115

danielhanchen commented 1 month ago

I just communicated with the Hugging Face team - they will upstream updates to llama.cpp later in the week. It seems like tokenizers>=0.20.0 is the culprit.

I re-uploaded all Llama-3.2 models and as a temporary fix, Unsloth will use transformers==4.44.2.

Please try again and see if it works! This unfortunately means you need to re-finetune the model if you did not save the 16-bit merged HF weights or the LoRA weights - extreme apologies. If you saved them, simply update Unsloth, then reload them and convert to GGUF.

Update Unsloth via:

pip uninstall unsloth -y
pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

I will update everyone once the Hugging Face team resolves the issue! Sorry again!

Pinging everyone (and apologies for the issues and inconvenience again!!) @xmaayy @avvRobertoAlma @thackmann @kingabzpro @williamzebrowskI @FotieMConstant @laoc81 @gianmarcoalessio @ThaisBarrosAlvim @Franky-W @Saber120 @adampetr @David33706 @Mukunda-Gogoi

LysandreJik commented 1 month ago

Thanks @danielhanchen, and sorry for the disturbances. To give context as to what is happening here: we updated the format of merges serialization in tokenizers to be much more flexible (this was done in this commit):

[screenshot of the commit]

The change was done to be backwards-compatible: tokenizers, and all libraries that depend on it, will keep the ability to load merge files which were serialized in the old way.

However, it could not be forwards-compatible: if a file is serialized with the new format, older versions of tokenizers will not be able to load it.

This is why we're seeing this issue: new files are serialized using the new version, and these files are not loadable in llama.cpp, yet. We're updating all other codepaths (namely llama.cpp) to adapt to the new version. Once that is shipped, all your trained checkpoints will be directly loadable as usual. We're working with llama.cpp to ship this as fast as possible.
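The compatibility story above can be sketched in a few lines. Assuming the old format stored each merge as a single space-joined string (e.g. "t h") and the new format stores it as a pair (e.g. ["t", "h"]), a backwards-compatible loader might look like this - illustrative only, not the actual tokenizers implementation:

```python
def normalize_merges(merges):
    """Accept both merge serializations and return (left, right) tuples.

    Old format (tokenizers < 0.20): ["t h", "th e", ...]
    New format (tokenizers >= 0.20): [["t", "h"], ["th", "e"], ...]
    """
    normalized = []
    for m in merges:
        if isinstance(m, str):
            left, right = m.split(" ", 1)  # old: single space-joined string
        else:
            left, right = m                # new: already a pair
        normalized.append((left, right))
    return normalized
```

A reader like this is backwards-compatible because it still accepts the old strings, but an old reader that expects only strings cannot parse the new pairs - which is exactly the forwards-compatibility gap llama.cpp hit.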

Thank you!

Issue tracker in llama.cpp: https://github.com/ggerganov/llama.cpp/issues/9692

danielhanchen commented 1 month ago

Sorry for the poor wording! Yep - if anyone has already saved the LoRA or 16-bit weights (before converting to GGUF or Ollama), you can, after updating Unsloth, reload them in Unsloth and then save again, as a temporary solution.

Saber120 commented 1 month ago

> I just communicated with the Hugging Face team - they will upstream updates to llama.cpp later in the week. It seems like tokenizers>=0.20.0 is the culprit. [...]

Thank you for the update! I followed the steps you provided, and I’m happy to report that it worked perfectly on my end. I updated Unsloth, reloaded the saved weights, and successfully converted them to GGUF. Everything is running smoothly now with the transformers==4.44.2 fix.

I appreciate the quick re-upload and the detailed instructions. I’ll keep an eye out for the official update from Hugging Face, but for now, everything seems to be working great.

Thanks again for your efforts!

Best regards,

thackmann commented 1 month ago

Thank you @danielhanchen for the quick fix. The original notebook is now working.

kingabzpro commented 1 month ago

The fix is not working on Kaggle.

FotieMConstant commented 1 month ago

> I just communicated with the Hugging Face team - they will upstream updates to llama.cpp later in the week. It seems like tokenizers>=0.20.0 is the culprit. [...]

[screenshot of the error]

I get this error when I run the Colab after applying the changes; it still seems to be an issue.

danielhanchen commented 1 month ago

@kingabzpro I just updated pypi so pip install unsloth should have the temporary fixes - you might have to restart Kaggle

kingabzpro commented 1 month ago

> @kingabzpro I just updated pypi so pip install unsloth should have the temporary fixes - you might have to restart Kaggle

It is working on Kaggle now. Thank you.