Open j0h0k0i0m opened 1 month ago
Hi! Just to clarify, is the Android app generating different tokens than the Python example, or is it decoding the tokens to a string incorrectly?
Dear @vraspar
Hello.
It's an issue with token decoding. Unlike English, Korean has only a few tokens in the Phi vocabulary, so producing a single Korean character often requires a combination of several tokens.
When the string decoded from a token came back empty, nothing was output. I worked around this by storing that token and combining it with the next one before outputting. This is only a temporary workaround, though.
In MainActivity.java, I checked String tok using isEmpty() to decide whether to store or output it. I hope this can be helpful as a reference.
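For reference, the buffering idea can be sketched at the byte level rather than the string level. This is only an illustration, not the code from MainActivity.java: the class name Utf8StreamBuffer and its methods are hypothetical, and it assumes that the three IDs quoted in the original report ([238, 136, 152]) are Llama-style byte-fallback tokens whose ID is the byte value plus 3. Under that assumption they map to the bytes 0xEB 0x85 0x95, which is the UTF-8 encoding of 녕. The sketch uses the standard java.nio CharsetDecoder, which holds back an incomplete multi-byte sequence until the remaining bytes arrive.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: collects raw bytes and emits text only once a
// complete UTF-8 character is available.
class Utf8StreamBuffer {
    // ASSUMPTION: byte-fallback tokens <0x00>..<0xFF> occupy IDs 3..258.
    static final int BYTE_TOKEN_OFFSET = 3;

    private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);

    // UTF-8 needs at most 4 bytes per character, so 16 is ample.
    private final ByteBuffer pending = ByteBuffer.allocate(16);

    // Feeds one raw byte. Returns the decoded text, or "" while the
    // current character is still incomplete.
    String push(int b) {
        pending.put((byte) b);
        pending.flip();
        CharBuffer out = CharBuffer.allocate(8);
        // endOfInput = false: a trailing partial sequence is left unread.
        decoder.decode(pending, out, false);
        // Move any unread bytes to the front for the next call.
        pending.compact();
        out.flip();
        return out.toString();
    }

    // Feeds one byte-fallback token ID (see the assumption above).
    String pushByteToken(int tokenId) {
        return push(tokenId - BYTE_TOKEN_OFFSET);
    }
}
```

Feeding 238 and 136 through pushByteToken returns an empty string each time, and the complete character is returned only when 152 arrives. That matches the isEmpty() check described above, but the incomplete bytes are kept inside the decoder instead of in application code.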
@j0h0k0i0m Could you share the prompt you're using with the phi3.5 model? And what do you expect the returned decoded string to be? cc @wenbingl
Hello. I am trying to run phi-3.5 ONNX on Android. I'm reaching out because I'm not sure how to resolve an issue with token-to-string conversion. It occurs when I instruct the model to output in a different language, and I have confirmed through the logs that the output is not being generated.
When I run the phi 3.5 tokenizer in Python, the output is 안녕하세요!, but the Android output is 안하세요!. I want to decode the three tokens ([238, 136, 152]) to obtain the correct result. I would appreciate any guidance on how to achieve this. Thank you.