turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

About YiTokenizer errors #216

Closed redwoodzero0 closed 6 months ago

redwoodzero0 commented 7 months ago

When I try to quantize and run with Exl2, it won't run due to a YiTokenizer-related error. Are there any plans for exl2 quantization and loader compatibility for models that use YiTokenizer? (screenshot attached)

turboderp commented 7 months ago

ExLlamaV2 supports Yi models, but not the custom YiTokenizer. To the extent that the tokenization can be done by either SentencePiece or the Tokenizers library, it should still be okay, and I have had Yi models running seemingly fine (outside of TGW, at least). There are some reports of possible tokenization issues that I still have to get to.

The error you're seeing there is not a tokenizer error, though; it's likely because you're running a pretty old version of ExLlamaV2. Newer versions recognize the architecture and look for the ln1 and ln2 tensors that Yi models use in place of input_layernorm and post_attention_layernorm, as Llama names them.
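To illustrate the kind of name translation described above, here is a minimal sketch (not ExLlamaV2's actual loader code) that remaps Yi-style layernorm tensor names to their Llama-style equivalents. Only the tensor names ln1/ln2 and input_layernorm/post_attention_layernorm come from the comment; the helper function and example keys are hypothetical.

```python
# Hypothetical sketch: map Yi checkpoint tensor names onto the Llama-style
# names a Llama-oriented loader expects. The name pairs below are from the
# discussion above; the surrounding code is illustrative only.

YI_TO_LLAMA = {
    ".ln1.": ".input_layernorm.",
    ".ln2.": ".post_attention_layernorm.",
}

def remap_yi_key(key: str) -> str:
    """Rewrite a Yi tensor name to its Llama equivalent (no-op otherwise)."""
    for yi_name, llama_name in YI_TO_LLAMA.items():
        key = key.replace(yi_name, llama_name)
    return key

# Example: hypothetical state-dict keys before and after remapping
keys = [
    "model.layers.0.ln1.weight",
    "model.layers.0.ln2.weight",
    "model.layers.0.self_attn.q_proj.weight",  # unchanged
]
remapped = [remap_yi_key(k) for k in keys]
print(remapped)
```

A loader that applies a remap like this while reading the checkpoint can reuse its existing Llama code path unchanged, which is why newer ExLlamaV2 versions can load Yi models without a separate architecture implementation.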

turboderp commented 6 months ago

Cleaning up some stale issues.