lucasjinreal opened 1 year ago
Hi, such cases are already handled, so this shouldn't happen. Can you please share the code, the prompt, and a link to the model you are using?
@marella Sure, this is the code:
```python
from ctransformers import AutoModelForCausalLM

# `m_f`, `args`, and `get_default_conv_template` come from the
# surrounding script (not shown here).
llm = AutoModelForCausalLM.from_pretrained(m_f, gpu_layers=150)
conv = get_default_conv_template(args.conv_template)
history = []
while True:
    qs = input('> ')
    conv.append_message_single_turn(qs)
    prompt = conv.get_prompt()
    if args.debug:
        print(prompt)
    outputs = ''
    # Stream tokens and accumulate the full response.
    for text in llm(prompt, stream=True):
        print(text, end="", flush=True)
        outputs += text
    print()
    if not args.bare:
        if args.multi_turn:
            history.append({"input": qs, "output": outputs})
        else:
            conv.clear()
```
Regardless of how the template is composed, only Chinese and Japanese characters fail to decode in the output.
For instance, with the prompt: 背诵古诗静夜思 ("recite the classic poem Quiet Night Thoughts").
Can you help figure out what the issue is?
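To make the failure mode concrete, here is a minimal repro sketch in plain Python (my own illustration, not ctransformers code) of what happens when a 3-byte character's bytes get split across stream chunks and each chunk is decoded independently with errors="ignore":

```python
# '静' (U+9759) is three bytes in UTF-8: b'\xe9\x9d\x99'.
char_bytes = "静".encode("utf-8")

# Suppose streaming hands us those bytes split across two chunks.
chunk_a, chunk_b = char_bytes[:2], char_bytes[2:]

# Decoding each chunk on its own silently drops the partial bytes:
print(repr(chunk_a.decode("utf-8", errors="ignore")))  # '' -- character lost
print(repr(chunk_b.decode("utf-8", errors="ignore")))  # '' -- lost again

# Decoding the reassembled buffer recovers the character:
print((chunk_a + chunk_b).decode("utf-8"))  # 静
```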
After looking at your code:
```python
# Handle incomplete UTF-8 multi-byte characters.
incomplete += self.detokenize([token], decode=False)
complete, incomplete = utf8_split_incomplete(incomplete)
text += complete.decode(errors="ignore")
```
I don't think this is a Chinese-character issue in the llama tokenizer itself. For Chinese and Japanese, some characters need 2 or more tokens to decode into the right string bytes, so your handling might not actually cover this situation. What do you think?
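For what it's worth, here is a rough sketch of what I understand a helper like utf8_split_incomplete to be doing, reconstructed from the UTF-8 lead-byte patterns (my own illustration, not the library's actual implementation):

```python
def utf8_split_incomplete(data: bytes) -> tuple[bytes, bytes]:
    """Split a buffer into (complete, incomplete) UTF-8 parts."""
    # Scan backwards over at most the last 4 bytes (the maximum UTF-8
    # character length) looking for the lead byte of the final character.
    for i in range(1, min(4, len(data)) + 1):
        b = data[-i]
        if b & 0b1100_0000 == 0b1000_0000:
            continue  # 10xxxxxx: continuation byte, keep scanning back
        # b is a lead byte; how many bytes should this character have?
        if b < 0b1000_0000:
            expected = 1  # 0xxxxxxx: ASCII
        elif b >= 0b1111_0000:
            expected = 4  # 11110xxx
        elif b >= 0b1110_0000:
            expected = 3  # 1110xxxx (covers most CJK)
        else:
            expected = 2  # 110xxxxx
        if expected <= i:
            return data, b""  # the final character is complete
        return data[:-i], data[-i:]  # hold back the partial character
    return b"", data  # no lead byte found; treat everything as pending
```

If the incomplete tail is carried into the next token's bytes correctly, the character should eventually complete, so the real question is whether that carry survives across streaming iterations.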
For reference: in UTF-8, an ASCII character occupies only one byte, which saves space, but non-ASCII characters take more bytes,
especially for Chinese, Japanese, and Korean (CJK); in running text, most CJK characters require three bytes.
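A quick illustration of those byte counts in plain Python:

```python
print(len("A".encode("utf-8")))   # 1 -- ASCII fits in one byte
print(len("é".encode("utf-8")))   # 2 -- Latin letters with diacritics
print(len("静".encode("utf-8")))  # 3 -- typical CJK character
print(len("🙂".encode("utf-8")))  # 4 -- emoji / supplementary plane
```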
Hello, when llama decodes Chinese or Japanese characters, a single character might need 2 or more tokens, so when streaming, the chunk returned from decoding a single token is wrong.
Is there a way to resolve this?
llama.cpp doesn't actually have this issue.
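If the raw token bytes are accessible on the consumer side (e.g., via something like the detokenize(decode=False) call quoted above), one standard fix is Python's built-in incremental UTF-8 decoder, which automatically buffers an incomplete tail across chunks. A sketch, assuming byte chunks rather than pre-decoded strings:

```python
import codecs

decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")

def stream_text(byte_chunks):
    """Yield only complete characters, buffering split multi-byte tails."""
    for chunk in byte_chunks:
        text = decoder.decode(chunk)  # holds back an incomplete tail
        if text:
            yield text
    tail = decoder.decode(b"", final=True)  # flush whatever remains at EOF
    if tail:
        yield tail

# '静夜' is b'\xe9\x9d\x99' + b'\xe5\xa4\x9c'; split mid-character on purpose.
chunks = [b"\xe9\x9d", b"\x99\xe5\xa4\x9c"]
print("".join(stream_text(chunks)))  # -> 静夜
```

This is presumably close to what llama.cpp does internally by holding bytes back until a full character is available.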