Describe the bug
Whenever I load certain GGUFs, I get the warning shown below in the terminal. I have seen it with Bartowski's Q8 quant of Llama 3 70B Instruct (a 3-part file) and with llama-3-70B-Instruct-abliterated-Q6_K-00001-of-00002.gguf.
Is there an existing issue for this?
[x] I have searched the existing issues
Reproduction
I cannot recall the URL of the quant page on Hugging Face, but the files are llama-3-70B-Instruct-abliterated-Q6_K-00001-of-00002.gguf and llama-3-70B-Instruct-abliterated-Q6_K-00002-of-00002.gguf.
Load the model in Oobabooga and send the LLM a message. The following warning then appears in the terminal:
/text-generation-webui/installer_files/env/lib/python3.11/site-packages/llama_cpp_cuda_tensorcores/llama.py:1054: RuntimeWarning: Detected duplicate leading "<|begin_of_text|>" in prompt, this will likely reduce response quality, consider removing it...
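For anyone hitting the same warning, here is a rough sketch of a possible workaround rather than a confirmed fix. It assumes the prompt string already starts with the literal text "<|begin_of_text|>" more than once, or once while the backend also prepends its own BOS token. The helper name collapse_leading_bos and its keep parameter are hypothetical and not part of text-generation-webui or llama-cpp-python.

```python
# Hypothetical helper: not part of text-generation-webui or llama-cpp-python.
BOS = "<|begin_of_text|>"  # token text copied from the warning message


def collapse_leading_bos(prompt: str, bos: str = BOS, keep: int = 1) -> str:
    """Return `prompt` with at most `keep` leading copies of `bos`."""
    if not bos:
        # An empty marker would never be consumed, so leave the prompt alone.
        return prompt
    rest = prompt
    count = 0
    # Peel off every leading copy of the BOS text and count them.
    while rest.startswith(bos):
        rest = rest[len(bos):]
        count += 1
    # Put back no more copies than were there, capped at `keep`.
    return bos * min(count, keep) + rest


if __name__ == "__main__":
    doubled = BOS + BOS + "Hello"
    print(collapse_leading_bos(doubled))
    print(collapse_leading_bos(doubled, keep=0))
```

If the backend adds the BOS token itself during tokenization (an assumption about how the loader is configured), keep=0 removes the textual copy entirely. Otherwise keep=1 leaves exactly one.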
Screenshot
No response
Logs
System Info