Open LuRenJiasWorld opened 3 weeks ago
It appeared again when I tried to translate this issue XD
It seems the corruption often occurs at the first Chinese character after a non-Chinese character. Could the tokenizer be the cause?
Hi @LuRenJiasWorld, sorry for the delayed response. Would you mind providing a Python script that runs with MLCEngine and that we can use to reproduce the issue? That would be very helpful for identifying the problem.
Facing the same issue.
My best guess is that tokenizer.decode inside tvm has some issues.
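For anyone following along, here is a minimal sketch of one common way streamed decoding can corrupt CJK output. It is not taken from MLC LLM or tvm, and nothing in this thread confirms it is the actual cause. It only assumes that a multi-byte UTF-8 character may be split across two streamed chunks; the helper names `decode_per_chunk` and `decode_incrementally` are made up for illustration.

```python
import codecs

def decode_per_chunk(chunks):
    """Decode every chunk on its own; a character split across chunks is lost."""
    return "".join(c.decode("utf-8", errors="replace") for c in chunks)

def decode_incrementally(chunks):
    """Buffer incomplete byte sequences until the rest of the character arrives."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    pieces = [decoder.decode(c) for c in chunks]
    pieces.append(decoder.decode(b"", final=True))  # flush any leftover bytes
    return "".join(pieces)

text = "Hello 世界"
raw = text.encode("utf-8")

# "Hello " is 6 bytes and each Chinese character here is 3 bytes, so cutting
# at byte 7 splits the first Chinese character across the two chunks.
chunks = [raw[:7], raw[7:]]

naive = decode_per_chunk(chunks)
incremental = decode_incrementally(chunks)

print(repr(naive))        # contains U+FFFD replacement characters
print(repr(incremental))  # matches the original text
```

If the corruption in this issue has the same origin, the broken character would tend to be the first multi-byte character after ASCII text, which would match the pattern described above. That is still a guess until someone checks the actual decode path.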
🐛 Bug
To Reproduce
Steps to reproduce the behavior:
On a Linux server with an NVIDIA L20 GPU, using the same model, application, and prompt, I can also reproduce this issue, though less frequently than on the MacBook.
Expected behavior
There should be no corrupted Unicode characters when outputting Chinese. The corruption is frustrating because I frequently have to guess what the word should be.