turboderp / exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
MIT License

Modify generator.py > generate_simple to accept encode_special_characters? #243

Open zmarty opened 11 months ago

zmarty commented 11 months ago

Hello. I noticed that a couple of recent PRs added the encode_special_characters parameter option to the tokenizer. This is great, because right now I don't think exllama encodes special tokens by default, so it doesn't call LLMs correctly.

However, for this change to be actually useful I think that functions such as generate_simple inside generator.py should also accept this parameter and send it along to the tokenizer, like so:

def generate_simple(self, prompt, max_new_tokens = 128, encode_special_characters=True):

Is this a reasonable request? I can probably also submit a PR if people think it's useful, although I am not a Python programmer. But I did modify the function locally and it works fine.
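The plumbing being requested might look something like the sketch below. The ExLlama classes are stubbed out here (the real ExLlamaTokenizer and ExLlamaGenerator have more state and a real generation loop); only the parameter pass-through from generate_simple to the tokenizer is the point.

```python
# Sketch of threading encode_special_characters from generate_simple
# through to the tokenizer. Both classes are hypothetical stand-ins,
# not ExLlama's actual implementation.

class StubTokenizer:
    # Stand-in for ExLlamaTokenizer.encode; pretend "<s>" is control ID 1
    def encode(self, text, encode_special_characters=False):
        if encode_special_characters:
            return [1] + [ord(c) for c in text.replace("<s>", "")]
        return [ord(c) for c in text]

class StubGenerator:
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def generate_simple(self, prompt, max_new_tokens=128,
                        encode_special_characters=True):
        # Forward the flag instead of hard-coding it in the call
        ids = self.tokenizer.encode(
            prompt, encode_special_characters=encode_special_characters)
        # ... the real generation loop would run here; return IDs for demo
        return ids

gen = StubGenerator(StubTokenizer())
print(gen.generate_simple("<s>hi", encode_special_characters=True))
# → [1, 104, 105]
```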


turboderp commented 11 months ago

Well, there's a whole discussion about the "correct" way to tokenize inputs to language models. According to the SentencePiece (SP) authors, it's not correct to pass control symbols to the encoder, which is why there's all that extra code in ExLlamaTokenizer (and in HF's AutoTokenizer) to thwart their efforts. I've gone back and forth on the best way to deal with it, and currently I'm not sure. Even with that, ExLlama still won't tokenize added tokens (beyond the 32000 in the standard Llama vocabulary), and as far as I know even HF doesn't do it correctly, so it's not a simple matter at all.

At any rate, generate_simple was supposed to be just that, a simple way of getting some output out of the generator, not an omni-tool for handling more advanced cases like this. That's why it isn't used in the web UI and chatbot examples.

But I don't see any harm in adding an option to encode special characters in the prompt. I would let it default to False though, to minimize chances of regression. So I'll get on that later today.