turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

Some GPTQ models cannot be loaded anymore #222

Closed sammyf closed 9 months ago

sammyf commented 10 months ago

Caveat: I think this is a problem with exllamav2, but it might be a problem with tabbyAPI.

I updated tabbyAPI and ExLlamaV2, and suddenly older models that used to work return a tokenizer error:

load_model with this payload:

{
  "name": "TheBloke_AlpacaCielo2-7B-8K-GPTQ",
  "max_seq_len": 8192,
  "gpu_split_auto": true,
  "gpu_split": [
    0
  ],
  "rope_scale":  1,
  "rope_alpha": 1,
  "no_flash_attention": false,
  "low_mem": false
}
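
Equivalently, as a short script. The base URL and route here are assumptions about this local tabbyAPI setup, not confirmed in this thread:

import requests

# Same payload as above; base URL and route are assumed, not confirmed.
payload = {
    "name": "TheBloke_AlpacaCielo2-7B-8K-GPTQ",
    "max_seq_len": 8192,
    "gpu_split_auto": True,
    "gpu_split": [0],
    "rope_scale": 1,
    "rope_alpha": 1,
    "no_flash_attention": False,
    "low_mem": False,
}

resp = requests.post("http://127.0.0.1:5000/v1/model/load", json=payload)
print(resp.status_code, resp.text)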

The call returns:


{
   "data": {
      "error": {
         "message": "Out of range: piece id is out of range.",
         "trace": "..."
      }
   }
}

The trace field contains this traceback:

Traceback (most recent call last):
  File "/media/GLIMSPANKY/tabbyAPI/main.py", line 118, in generator
    for (module, modules) in load_status:
  File "/media/GLIMSPANKY/tabbyAPI/model.py", line 187, in load_gen
    self.tokenizer = ExLlamaV2Tokenizer(self.config)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/media/GLIMSPANKY/tabbyAPI/tby/lib/python3.11/site-packages/exllamav2/tokenizer.py", line 97, in __init__
    itp = self.tokenizer.decode([i])
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/media/GLIMSPANKY/tabbyAPI/tby/lib/python3.11/site-packages/exllamav2/tokenizers/spm.py", line 37, in decode
    text = self.spm.decode(ids)
           ^^^^^^^^^^^^^^^^^^^^
  File "/media/GLIMSPANKY/tabbyAPI/tby/lib/python3.11/site-packages/sentencepiece/__init__.py", line 780, in Decode
    return self._DecodeIds(input)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/media/GLIMSPANKY/tabbyAPI/tby/lib/python3.11/site-packages/sentencepiece/__init__.py", line 337, in _DecodeIds
    return _sentencepiece.SentencePieceProcessor__DecodeIds(self, ids)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: Out of range: piece id is out of range.
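
The innermost frame is plain sentencepiece behaviour: decoding a piece id past the end of the vocabulary raises exactly this IndexError. A minimal sketch, assuming a local tokenizer.model file (the path is hypothetical):

import sentencepiece as spm

# Load the model's SentencePiece tokenizer (hypothetical local path).
sp = spm.SentencePieceProcessor(model_file="tokenizer.model")
n = sp.get_piece_size()

print(sp.decode([n - 1]))  # last valid piece id: decodes fine
print(sp.decode([n]))      # one past the end raises
                           # IndexError: Out of range: piece id is out of range.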
Daviljoe193 commented 10 months ago

Basically the same thing happened to me with MythoMax (GPTQ) on Ooba. The problem is specifically ExLlamaV2 0.0.10. It's a known issue and will be fixed in 0.0.11. #211

Quoting @turboderp:

This won't happen with all models, only some that have tokens that aren't control symbols but still can't be decoded by the tokenizer. This was necessary to support Deepseek models and to work around a bug in the Tokenizers library. There were some unintended side effects, but they should already be fixed. The fix just didn't make it into the 0.0.10 release.
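
A sketch of that failure mode and the kind of guard that avoids it, assuming the tokenizer builds a piece table by decoding every id up to the config's vocab size. This is an illustration, not exllamav2's actual fix:

import sentencepiece as spm

def build_piece_table(sp: spm.SentencePieceProcessor, vocab_size: int) -> list[str]:
    # Decode every id up to the configured vocab size. If some ids cannot
    # be decoded by the underlying .model file, decode() raises IndexError,
    # so tolerate those instead of failing the whole model load.
    table = []
    for i in range(vocab_size):
        try:
            table.append(sp.decode([i]))
        except IndexError:
            table.append("")  # undecodable id: placeholder
    return table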

Daviljoe193 commented 9 months ago

Update on this issue: I tried Oobabooga with that same GPTQ quant of MythoMax, but with exllamav2 updated to 0.0.11 (which was released just an hour ago), and I can confirm that the model now loads correctly. I'd recommend retesting on the current version; if your model loads fine, you should be good to close this issue.
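
After upgrading (e.g. pip install -U exllamav2), the installed version can be checked with stdlib metadata, so no package-specific attribute is assumed:

from importlib.metadata import version

print(version("exllamav2"))  # expect "0.0.11" or later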

sammyf commented 9 months ago

Yep, I can confirm everything works again now. Thanks for your work.