Closed nigelzzz closed 3 weeks ago
This is expected. SentencePiece doesn't have the knowledge that next_token is a word and output_tokens are the sentence. The whitespace between words is preserved in the decoded output.
Hi @taku910 , I got it, thanks. I asked because I've seen other LLM applications decode tokens one by one. If I need to implement that, do you have any suggestions?
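One common workaround is to decode the growing prefix of ids at each step and emit only the newly produced suffix, so word-boundary spaces survive. Below is a minimal sketch of that diffing approach; the `decode` function here is a self-contained stand-in that mimics how SentencePiece joins pieces (the `▁` marker becomes a space), where real code would load a model and call `SentencePieceProcessor.decode` on the id prefix instead.

```python
# Sketch: incremental detokenization by diffing successive decodes.
# `decode` is a stand-in for sentencepiece.SentencePieceProcessor.decode;
# real code would call sp.decode(ids[:i]) on the token-id prefix.

def decode(pieces):
    # Mimic SentencePiece: "\u2581" marks a word boundary (leading space),
    # and the leading space of the whole string is stripped.
    return "".join(pieces).replace("\u2581", " ").lstrip(" ")

def stream_decode(pieces):
    """Yield only the text newly produced by each token."""
    prev = ""
    for i in range(1, len(pieces) + 1):
        cur = decode(pieces[:i])   # decode the whole prefix each step
        yield cur[len(prev):]      # emit only the new suffix
        prev = cur

pieces = ["\u2581Dear", "\u2581[", "Name", "],", "\u2581I", "\u2581hope"]
print("".join(stream_decode(pieces)))  # Dear [Name], I hope
```

Re-decoding the full prefix each step is O(n²) in the worst case but is simple and robust; for long generations you can decode only a trailing window of ids instead.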
Hi, when I decode tokens one by one, the spaces don't show, but when I decode the whole token-id vector at once, the spaces show correctly.
The output looks like:
Dear[Name],Ihopethisemailfindsyouwell.
but when I append each token id to a vector and then decode the vector once, it shows:
Dear [Name], I hope this email finds you well.