xenova / transformers.js

State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!
https://huggingface.co/docs/transformers.js

Result is wrong when decoding tokens one by one #853

Open zcbenz opened 1 month ago

zcbenz commented 1 month ago

System Info

Node.js 22.4.0
@xenova/transformers 2.17.2

Environment/Platform

Node.js

Description

When decoding tokens that represent a multi-byte string, the result is wrong if the tokens are decoded one by one.

import {StringDecoder} from 'node:string_decoder'
import {AutoTokenizer} from '@xenova/transformers'

const tokenizer = await AutoTokenizer.from_pretrained('Qwen/Qwen2-0.5B')

// Decodes to "A. 单发"; the bytes of "单" are split across tokens 66521 and 243.
const tokens = [32, 13, 66521, 243, 28291]
console.log('Correct string:', tokenizer.decode(tokens))
console.log('Correct bytes:', Buffer.from(tokenizer.decode(tokens)))

// Decode the same tokens one at a time and reassemble the output.
const decoder = new StringDecoder('utf8')
const allBytes = []
process.stdout.write('\nWrong string: ')
for (const token of tokens) {
  const bytes = Buffer.from(tokenizer.decode([token]))
  allBytes.push(bytes)
  process.stdout.write(decoder.write(bytes))
}
process.stdout.write('\n')
console.log('Wrong bytes:', Buffer.concat(allBytes))

Reproduction

Run the above script with Node and you will see:

Correct string: A. 单发
Correct bytes: <Buffer 41 2e 20 e5 8d 95 e5 8f 91>

Wrong string: A. ��发
Wrong bytes: <Buffer 41 2e 20 ef bf bd ef bf bd e5 8f 91>

I expect the bytes to be the same whether the tokens are decoded in one call or one by one.

This is probably intended behavior, since a single token may decode to a partial Unicode character. However, it makes it impossible to implement a correct streaming interface for LLMs, which is what I'm doing in my llm.js module.
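
To make the failure mode concrete: by the time decode() returns a JavaScript string, the incomplete UTF-8 bytes have already been replaced with U+FFFD, so no byte-level post-processing can recover them. A minimal demonstration, assuming the same Qwen/Qwen2-0.5B tokenizer from the script above (the buffer contents follow from the "Wrong bytes" output):

// decode() returns a JavaScript string, so a token carrying an incomplete
// UTF-8 sequence comes back as U+FFFD (ef bf bd); the original bytes are
// already gone before the caller can try to reassemble them.
const partial = Buffer.from(tokenizer.decode([66521]))
console.log(partial) // contains ef bf bd, matching the "Wrong bytes" above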

zcbenz commented 1 month ago

I have found a workaround by detecting the replacement char \uFFFD in the decoded string: https://github.com/frost-beta/llm.js/commit/6e816b0bdfe2c161d82bbf4f2324fc32815e1fb3.
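
For anyone else hitting this, here is a condensed sketch of that idea. It assumes tokens are buffered while the decoded text still ends with \uFFFD, and that the model never legitimately emits a literal U+FFFD; the helper name is mine, not from llm.js.

import {AutoTokenizer} from '@xenova/transformers'

const tokenizer = await AutoTokenizer.from_pretrained('Qwen/Qwen2-0.5B')

let pending = []

// Decode one token at a time; returns '' while the buffered tokens end
// mid-character, then the completed text once the sequence resolves.
function decodeStreaming(token) {
  pending.push(token)
  const text = tokenizer.decode(pending)
  // A trailing \uFFFD means the last token stops inside a multi-byte
  // character, so hold the output back until the next token completes it.
  if (text.endsWith('\uFFFD')) return ''
  pending = []
  return text
}

for (const token of [32, 13, 66521, 243, 28291]) {
  process.stdout.write(decodeStreaming(token))
}
process.stdout.write('\n') // prints "A. 单发"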