Closed: violet17 closed this issue 1 year ago.
Hi, I have reproduced this issue. It is caused by `do_sample` defaulting to `True` in `model.stream_chat`: when the model is float16, `do_sample=True` produces gibberish output, which is a known issue.
There are two ways to fix it:

1. Pass `do_sample=False`, i.e. change

   ```python
   for response, history in model.stream_chat(tokenizer, prompt, history=[], max_length=64):
   ```

   to

   ```python
   for response, history in model.stream_chat(tokenizer, prompt, history=[], max_length=64, do_sample=False):
   ```

2. Keep the model in float32, i.e. change

   ```python
   model = model.half().to('xpu')
   ```

   to

   ```python
   model = model.to('xpu')
   ```
In our test, using the fp32 model also brings faster generation speed.
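As a side note (not from the thread, just an illustrative toy sketch in plain NumPy): the `do_sample` flag switches between greedy decoding, which deterministically picks the highest-logit token, and sampling, which draws from the softmax distribution. Any float16 distortion of the logits therefore feeds directly into the sampled tokens, while greedy decoding is far more tolerant of it. The `softmax`/`decode_step` helpers and the example logits below are made up for illustration; they are not the ChatGLM2 implementation.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax
    x = x - np.max(x)
    e = np.exp(x)
    return e / e.sum()

def decode_step(logits, do_sample, rng):
    # do_sample=False -> greedy argmax (deterministic);
    # do_sample=True  -> draw a token from the softmax distribution
    if do_sample:
        return int(rng.choice(len(logits), p=softmax(logits)))
    return int(np.argmax(logits))

logits = np.array([1.0, 0.9, 0.8, -2.0])  # float64 toy logits
rng = np.random.default_rng(0)

greedy = [decode_step(logits, do_sample=False, rng=rng) for _ in range(5)]
sampled = [decode_step(logits, do_sample=True, rng=rng) for _ in range(200)]

print(greedy)                 # greedy always returns the same top token
print(len(set(sampled)) > 1)  # sampling returns a mix of tokens -> True

# float16 also has coarse resolution at larger magnitudes: logits that
# differ by less than the local spacing (0.015625 near 16) collapse to
# the same value, which further distorts the sampling distribution.
print(np.float16(16.004) == np.float16(16.0))  # True
```

This is one plausible way to picture why sampling in half precision can go off the rails while `do_sample=False` stays stable; the actual failure inside the model may involve additional fp16 effects.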
@rnwang04 Thank you very much.
You are welcome : )
Hi, I tested ChatGLM2 with transformers INT4 weights using `model.stream_chat` on an Arc A770 and got gibberish output.
test code:
output:
newer version output: