turboderp / exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
MIT License

stop-string support? #257

Open krypterro opened 1 year ago

krypterro commented 1 year ago

I'm using ExLLama with the Oobabooga text-generation UI. With the model: TheBloke_llama2_70b_chat_uncensored-GPTQ

The model works great, but using ExLLama as a loader the model talks to itself, generating its own questions and answering them. This can be addressed with stop-strings, but apparently stop-strings are not supported in ExLLama?

ExLLama is faster and more stable than AutoGPTQ in my testing, but this one little issue is causing all kinds of problems.

Qubitium commented 1 year ago

You can layer a stop-string feature on top of exllama (or any generator, for that matter) by:

  1. Convert your stop string into tokens.
  2. Find the loop in the generate code that appends each new token to the result.
  3. Compare the stop-string tokens against the end of the result tokens and, if they match, stop.
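The suffix-matching step above can be sketched as a small helper (this is a hypothetical illustration, not exllama's API; `result_tokens` and `stop_tokens` stand in for whatever token lists your generation loop maintains):

```python
def ends_with_stop(result_tokens, stop_tokens):
    """Return True if the generated tokens end with the stop sequence."""
    if not stop_tokens or len(result_tokens) < len(stop_tokens):
        return False
    return result_tokens[-len(stop_tokens):] == stop_tokens

# Inside the generation loop, after appending each new token:
#
#     result_tokens.append(new_token)
#     if ends_with_stop(result_tokens, stop_tokens):
#         break
```

One caveat worth knowing: the same string can tokenize differently depending on what precedes it, so a pure token-level match can occasionally miss. A more robust variant decodes the tail of the output and does the suffix check on the text itself.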
krypterro commented 1 year ago

Excellent idea, I'll have to figure out how to stop the generation, but that shouldn't be too hard, thanks.