I'm using ExLLama as the loader in the Oobabooga text-generation UI, with the model TheBloke_llama2_70b_chat_uncensored-GPTQ.
The model works great, but with ExLLama as the loader it talks to itself, generating its own follow-up questions and then answering them. This could normally be addressed with stop strings, but apparently stop strings are not supported in ExLLama?
ExLLama is faster and more stable than AutoGPTQ in my testing, but this one little issue is causing all kinds of problems.
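In case it helps anyone hitting the same thing, a possible workaround while the loader doesn't honor stop strings is to truncate the completion yourself after generation (e.g. if you're reading the raw output through the API or a script). This is only a rough sketch under that assumption; the stop string values depend on your model's prompt template, and the names here (`STOP_STRINGS`, `truncate_at_stop`) are illustrative, not part of ExLLama or the webui:

```python
# Hypothetical post-processing workaround: cut the raw completion at the first
# occurrence of any stop string so the model can't keep "talking to itself".
STOP_STRINGS = ["### HUMAN:", "### Human:", "USER:"]  # adjust to your prompt template

def truncate_at_stop(text: str, stop_strings=STOP_STRINGS) -> str:
    """Return the text up to the earliest stop string, if any appears."""
    cut = len(text)
    for s in stop_strings:
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].rstrip()

# Example: the model answers, then invents its own next question.
raw = "Sure, here is the answer.\n### HUMAN: And what about X?\n### RESPONSE: ..."
print(truncate_at_stop(raw))  # -> "Sure, here is the answer."
```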