Closed: Richard7656 closed this issue 1 year ago.
In the Model tab of oobabooga text-generation-webui, when I choose llamacpp_HF as the loader for a GGML-format language model, I see this message:
"llamacpp_HF is a wrapper that lets you use llama.cpp like a Transformers model, which means it can use the Transformers samplers. To use it, make sure to first download oobabooga/llama-tokenizer under “Download custom model or LoRA”."
So I downloaded oobabooga/llama-tokenizer under “Download custom model or LoRA” and used llamacpp_HF as the loader for the GGML-format language model. It loaded successfully. When I input the specific Chinese characters from the reproduction example, they first appear as "��" and then turn into the normal Chinese characters.
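The "��"-then-correct behavior is consistent with how UTF-8 text behaves when a multi-byte character arrives in pieces. The following is a minimal sketch under assumptions that are not confirmed in this thread: that the Llama tokenizer may emit some CJK characters as byte-fallback tokens (one UTF-8 byte per token), and that a loader decodes the stream either token by token or by re-decoding the accumulated bytes. It uses only Python's standard `codecs` module; the byte splitting is a stand-in for real tokenizer output.

```python
import codecs

# Assumption for illustration: the tokenizer emits "亞" as three byte-fallback
# tokens, one UTF-8 byte each. Real token boundaries may differ.
CHAR = "亞"
TOKEN_BYTES = [bytes([b]) for b in CHAR.encode("utf-8")]  # three one-byte tokens

# 1) Decode each token independently, replacing invalid bytes with U+FFFD.
per_token = [tok.decode("utf-8", errors="replace") for tok in TOKEN_BYTES]

# 2) Re-decode the whole accumulated buffer after every token, as a streaming UI
#    might: partial output shows U+FFFD until the last byte arrives.
snapshots = []
buffer = b""
for tok in TOKEN_BYTES:
    buffer += tok
    snapshots.append(buffer.decode("utf-8", errors="replace"))

# 3) Decode each token independently but silently drop invalid bytes:
#    the character disappears from the output altogether.
dropped = "".join(tok.decode("utf-8", errors="ignore") for tok in TOKEN_BYTES)

# 4) An incremental decoder buffers incomplete sequences and emits the
#    character only once all of its bytes have arrived.
decoder = codecs.getincrementaldecoder("utf-8")()
incremental = [decoder.decode(tok) for tok in TOKEN_BYTES]

if __name__ == "__main__":
    print("per token:  ", per_token)
    print("snapshots:  ", snapshots)
    print("dropped:    ", repr(dropped))
    print("incremental:", incremental)
```

Case 2 resembles the "��" that later turns into the right character. Case 3 would look like the characters that vanish entirely with the llama.cpp loader, while case 4 is the behavior a correct streaming decoder should show.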
Both the llamacpp_HF and ctransformers loaders display these characters correctly, so I suspect the display bug is caused by a setting or configuration of the llama.cpp loader.
I have tested a few language models in the new GGUF format with llama.cpp as the loader, and all of them seem to display these specific Chinese characters correctly in the response. Since GGUF has replaced the old GGML format for llama.cpp, the bug can be considered fixed.
Describe the bug
I use text-generation-webui with llama.cpp as the loader for GGML-format language models.
Some specific Chinese characters display correctly on the input (prompt) side, but the same characters are not displayed correctly on the output (response) side.
I have tried other UIs, such as koboldcpp and gpt4all, with the same GGML-format language model, and they display these characters correctly in the output.
If I change the loader from llama.cpp to ctransformers, the problem does not occur.
I have also used another computer with an Nvidia card and the Transformers loader for the GPTQ version of the same language model, and the problem does not occur there either.
Examples of GGML-format language models I have tried:
Llama-2-13B-GGML, Llama-2-13B-chat-GGML, Llama-2-7B-GGML, Llama-2-7B-Chat-GGML, WizardLM-13B-V1.2-GGML, llama-2-13B-Guanaco-QLoRA-GGML, vicuna-13B-v1.5-GGML, vicuna-7B-v1.5-GGML, StableBeluga-13B-GGML, StableBeluga-7B-GGML, Nous-Hermes-Llama2-GGML, Nous-Hermes-Llama-2-7B-GGML
Reproduction
When my prompt contains certain Chinese characters, those characters are not displayed in the response. (The characters below are only examples; many more characters than these cannot be displayed.)
Prompt
Repeat the follow characters in output:
東亞、主權、印度、尼泊尔、不丹、緬甸、老撻、朝鮮、歷史、超過、統治、秦朝、經濟、企業、農業、生產、基礎、設施、開發、藝術、兵馬俑、丝绸之路、文化遗产、哲學、音樂、資源
Assistant
Repeat the follow characters in output:
東、主、度、、不、、老、朝、史、超、治、朝、經、業、業、生、基、設、開、術、兵馬、之路、文化、學、音、源
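To see exactly which characters were dropped, the prompt and the response can be compared term by term. The sketch below assumes the model kept the 、-separated term order, which holds for the output above (26 terms on each side, with empty strings where a whole term vanished); `missing_by_term` is a hypothetical helper, not part of text-generation-webui.

```python
PROMPT = "東亞、主權、印度、尼泊尔、不丹、緬甸、老撻、朝鮮、歷史、超過、統治、秦朝、經濟、企業、農業、生產、基礎、設施、開發、藝術、兵馬俑、丝绸之路、文化遗产、哲學、音樂、資源"
RESPONSE = "東、主、度、、不、、老、朝、史、超、治、朝、經、業、業、生、基、設、開、術、兵馬、之路、文化、學、音、源"


def missing_by_term(prompt: str, response: str, sep: str = "、") -> list[tuple[str, str]]:
    """Pair prompt and response terms by position and report dropped characters."""
    prompt_terms = prompt.split(sep)
    # str.split keeps empty strings, so "度、、不" marks a fully dropped term.
    response_terms = response.split(sep)
    if len(prompt_terms) != len(response_terms):
        raise ValueError("prompt and response have different numbers of terms")
    result = []
    for p, r in zip(prompt_terms, response_terms):
        dropped = "".join(c for c in p if c not in r)
        result.append((p, dropped))
    return result


if __name__ == "__main__":
    for term, dropped in missing_by_term(PROMPT, RESPONSE):
        print(f"{term}: missing {dropped or '(none)'}")
```

Every one of the 26 terms loses at least one character, and terms such as 尼泊尔 and 緬甸 disappear completely. The English-to-Chinese output further down is identical to this one, so the same check applies to it.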
However, if you ask the model to translate these Chinese characters into English, it translates them correctly:
Prompt
Translate the following Chinese into English:
東亞、主權、印度、尼泊尔、不丹、緬甸、老撻、朝鮮、歷史、超過、統治、秦朝、經濟、企業、農業、生產、基礎、設施、開發、藝術、兵馬俑、丝绸之路、文化遗产、哲學、音樂、資源
Assistant
Translate the following Chinese into English:
East Asia, sovereignty, India, Nepal, Bhutan, Myanmar, Laos, Korea, history, exceeded, governance, Qin Dynasty, economy, enterprise, agriculture, production, basis, facilities, development, art, terracotta warriors, Silk Road, cultural heritage, philosophy, music, resources
If you ask it to translate the English above back into Chinese, these characters are still not displayed:
Prompt
Translate the following English into Chinese:
East Asia, sovereignty, India, Nepal, Bhutan, Myanmar, Laos, Korea, history, exceeded, governance, Qin Dynasty, economy, enterprise, agriculture, production, basis, facilities, development, art, terracotta warriors, Silk Road, cultural heritage, philosophy, music, resources
Assistant
Translate the following English into Chinese:
東、主、度、、不、、老、朝、史、超、治、朝、經、業、業、生、基、設、開、術、兵馬、之路、文化、學、音、源
Screenshot
The problem as displayed in oobabooga text-generation-webui
Comparison with koboldcpp
Comparison with gpt4all