turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

(Oobabooga) Can't load GPTQ models anymore with ExLlama-V2 0.0.10 #211

Closed Daviljoe193 closed 10 months ago

Daviljoe193 commented 10 months ago

Pretty much as written in the title. I'm using my own personal Colab notebook to run Oobabooga, and while EXL2 models load just fine, GPTQ models specifically won't load, failing with the error below.

Traceback (most recent call last):
  File "/content/text-generation-webui/server.py", line 224, in <module>
    shared.model, shared.tokenizer = load_model(model_name)
  File "/content/text-generation-webui/modules/models.py", line 85, in load_model
    output = load_func_map[loader](model_name)
  File "/content/text-generation-webui/modules/models.py", line 364, in ExLlamav2_loader
    model, tokenizer = Exllamav2Model.from_pretrained(model_name)
  File "/content/text-generation-webui/modules/exllamav2.py", line 60, in from_pretrained
    tokenizer = ExLlamaV2Tokenizer(config)
  File "/usr/local/lib/python3.10/dist-packages/exllamav2/tokenizer.py", line 97, in __init__
    itp = self.tokenizer.decode([i])
  File "/usr/local/lib/python3.10/dist-packages/exllamav2/tokenizers/spm.py", line 37, in decode
    text = self.spm.decode(ids)
  File "/usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py", line 780, in Decode
    return self._DecodeIds(input)
  File "/usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py", line 337, in _DecodeIds
    return _sentencepiece.SentencePieceProcessor__DecodeIds(self, ids)
IndexError: Out of range: piece id is out of range.

This error doesn't happen if I downgrade to ExLlama-V2 0.0.9, so this is likely a regression between the two versions. My terrible workaround is simply to add !perl -i -pe 's|v0.0.10/exllamav2-0.0.10|v0.0.9/exllamav2-0.0.9|g' temp_requirements.txt to one of the cells before the pip install steps. That line has to be commented out to reproduce the issue without the workaround. Presumably the same thing happens on a local install.
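For anyone trying to reproduce this outside the webui, here is a minimal sketch of loading just the tokenizer the same way the loader in the traceback does, assuming the 0.0.x ExLlamaV2 API; the model path is a placeholder for any affected GPTQ model directory.

    from exllamav2 import ExLlamaV2Config, ExLlamaV2Tokenizer

    # Placeholder path: point this at a GPTQ model that triggers the error
    config = ExLlamaV2Config()
    config.model_dir = "/content/models/some-gptq-model"
    config.prepare()

    # Under 0.0.10 this raises IndexError: "piece id is out of range"
    # for the affected models; under 0.0.9 it succeeds.
    tokenizer = ExLlamaV2Tokenizer(config)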

turboderp commented 10 months ago

This doesn't happen with all models, only those that have tokens which aren't control symbols but still can't be decoded by the tokenizer. That change was necessary to support Deepseek models and to work around a bug in the Tokenizers library. There were some unintended side effects, but they're already fixed; the fix just didn't make it into the 0.0.10 release.
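To illustrate the failure mode (this is not the library's actual patch, just a sketch): the crash happens when SentencePiece is asked to decode a piece id that falls outside its base vocabulary, e.g. a token added on top of the base vocab. A guard along these lines avoids the hard IndexError; the function name and tokenizer path are placeholders.

    import sentencepiece

    # Placeholder path to the model's SentencePiece tokenizer
    sp = sentencepiece.SentencePieceProcessor(model_file="tokenizer.model")

    def decode_piece(piece_id: int) -> str:
        # Ids at or beyond the SentencePiece vocab raise
        # "piece id is out of range", which is the IndexError in the traceback,
        # so fall back to an empty string for those instead of decoding.
        if 0 <= piece_id < sp.get_piece_size():
            return sp.decode([piece_id])
        return ""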

Daviljoe193 commented 10 months ago

I'll keep an eye out for that release and reopen this issue if the problem persists in that version.