
KeyError: 'EOS_TOKEN' when exporting GGUF with certain templates #868

Open · Yandrik opened 1 month ago

Yandrik commented 1 month ago

I encountered an issue when trying to export a GGUF model file for Mistral Nemo and Mistral 7B finetunes using the unsloth library. The error occurs during the save_pretrained_gguf call, specifically while creating the Ollama Modelfile. A KeyError: '__EOS_TOKEN__' is raised, which crashes the process. The problem happens with the mistral and llama chat templates, but not with llama-3 or phi-3.

main: quantize time = 148980.08 ms
main:    total time = 148980.08 ms
Unsloth: Conversion completed! Output location: ./temptest/unsloth.Q4_K_M.gguf
Traceback (most recent call last):
  File "/<something/mistral-nemo-script/main.py", line 106, in <module>
    main()
  File "/<something>/mistral-nemo-script/main.py", line 81, in main
    save_gguf_quant(model, tokenizer)
  File "/<something>/mistral-nemo-script/fast_inference.py", line 53, in save_gguf_quant
    model.save_pretrained_gguf(name, tokenizer, quantization_method="q4_k_m", maximum_memory_usage=max_mem)
  File "/<something>/.local/lib/python3.9/site-packages/unsloth/save.py", line 1593, in unsloth_save_pretrained_gguf
    modelfile = create_ollama_modelfile(tokenizer, all_file_locations[0])
  File "/<something>/.local/lib/python3.9/site-packages/unsloth/save.py", line 1442, in create_ollama_modelfile
    modelfile = modelfile\
KeyError: '__EOS_TOKEN__'
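For context, the crash looks like a plain str.format mismatch: if the stored Ollama Modelfile template still contains a {__EOS_TOKEN__} placeholder but create_ollama_modelfile formats it without supplying that key, Python raises exactly this KeyError. A minimal sketch of the mechanism (the template string here is hypothetical, not unsloth's actual one):

    # Hypothetical Modelfile template; unsloth's real templates differ per chat template.
    modelfile = 'FROM {__FILE_LOCATION__}\nPARAMETER stop "{__EOS_TOKEN__}"'

    # Formatting without the __EOS_TOKEN__ key reproduces the crash:
    modelfile.format(__FILE_LOCATION__ = "./unsloth.Q4_K_M.gguf")
    # KeyError: '__EOS_TOKEN__'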

Steps to Reproduce

  1. Load a Mistral model.
  2. Set the tokenizer's chat template to mistral (both the mistral and llama templates appear to trigger the error).
  3. Export the model to GGUF with model.save_pretrained_gguf(). (We quantized as well, but the quantization itself completes fine, so that part is probably irrelevant; a minimal sketch follows this list.)
  4. The process crashes with KeyError: '__EOS_TOKEN__' during the create_ollama_modelfile step.
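
A minimal repro sketch of those steps (the model name and output directory are examples, not our exact script):

    from unsloth import FastLanguageModel
    from unsloth.chat_templates import get_chat_template

    # Example base model; we hit this with Mistral Nemo and Mistral 7B finetunes.
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name     = "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
        max_seq_length = 2048,
        load_in_4bit   = True,
    )

    # Step 2: the "mistral" (and "llama") chat templates trigger the error.
    tokenizer = get_chat_template(tokenizer, chat_template = "mistral")

    # Steps 3/4: crashes in create_ollama_modelfile with KeyError: '__EOS_TOKEN__'.
    model.save_pretrained_gguf("temptest", tokenizer, quantization_method = "q4_k_m")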

Workaround

I attempted to set tokenizer._ollama_modelfile to None before saving, but this workaround doesn't work consistently (see the sketch below).
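
For reference, the attempted workaround (a sketch; it pokes a private attribute, so behavior may vary across unsloth versions):

    # Blank out the stored Modelfile template before saving, hoping
    # save_pretrained_gguf skips the create_ollama_modelfile step.
    tokenizer._ollama_modelfile = None
    model.save_pretrained_gguf("temptest", tokenizer, quantization_method = "q4_k_m")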

Environment

Additional Information

The issue does not occur with the llama-3 or phi-3 templates. Quantization to Q4_K_M GGUF completes successfully; the failure happens only when writing the Ollama Modelfile.

danielhanchen commented 1 month ago

Hmm ok so an EOS token is missing - I'll check this - thanks for the report

zs856 commented 3 weeks ago

Last week, I updated unsloth to the latest version. Today I also encountered this problem.

danielhanchen commented 2 weeks ago

Yep working on a fix!

danielhanchen commented 1 week ago

Apologies - hopefully it's fixed now!