Closed Ubospica closed 6 months ago
Let us call id_to_token directly; see the related APIs.
This would avoid the post-processing done by the decode pipeline.
For the Rust binding, we can store the result string in the wrapper and reuse https://github.com/mlc-ai/tokenizers-cpp/blob/main/include/tokenizers_c.h#L31
std::string IdToToken(int32_t token_id);
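To illustrate the difference between a raw vocabulary lookup and a full decode, here is a minimal sketch using a toy tokenizer. It is not the tokenizers-cpp implementation: ToyTokenizer, ToyHandle, toy_id_to_token, the "▁" space marker, and the empty-string result for unknown ids are all assumptions made for the example. The handle shows one way a C-style wrapper could keep the result string alive so the caller only receives a pointer and a length.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Assumed SentencePiece-style word-boundary marker (U+2581), UTF-8 encoded.
const std::string kSpaceMarker = "\xE2\x96\x81";

// Hypothetical toy tokenizer; NOT the tokenizers-cpp implementation.
class ToyTokenizer {
 public:
  explicit ToyTokenizer(std::vector<std::string> vocab)
      : vocab_(std::move(vocab)) {}

  // Raw vocabulary lookup with no post-processing.
  // Returning "" for an out-of-range id is an assumption of this sketch.
  std::string IdToToken(int32_t token_id) const {
    if (token_id < 0 ||
        static_cast<std::size_t>(token_id) >= vocab_.size()) {
      return "";
    }
    return vocab_[static_cast<std::size_t>(token_id)];
  }

  // Decode with a simple post-processing step: every marker becomes a
  // space and one leading space is stripped.
  std::string Decode(const std::vector<int32_t>& ids) const {
    std::string text;
    for (int32_t id : ids) text += IdToToken(id);

    std::string out;
    std::size_t pos = 0;
    while (pos < text.size()) {
      if (text.compare(pos, kSpaceMarker.size(), kSpaceMarker) == 0) {
        out += ' ';
        pos += kSpaceMarker.size();
      } else {
        out += text[pos];
        ++pos;
      }
    }
    if (!out.empty() && out.front() == ' ') out.erase(0, 1);
    return out;
  }

 private:
  std::vector<std::string> vocab_;
};

// Hypothetical C-style wrapper: the handle owns the last result so the
// returned pointer stays valid until the next call on the same handle.
struct ToyHandle {
  ToyTokenizer tok;
  std::string result;
};

void toy_id_to_token(ToyHandle* handle, int32_t token_id,
                     const char** data, std::size_t* len) {
  handle->result = handle->tok.IdToToken(token_id);
  *data = handle->result.data();
  *len = handle->result.size();
}
```

With a vocabulary of "<s>", "▁Hello", "▁world", IdToToken(1) returns the stored entry including the marker, while Decode({1, 2}) returns the post-processed text.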
cc @tqchen
This PR adds these methods to the Tokenizer class to support querying the vocabulary from the tokenizer. This supports downstream uses such as stop-string checking, grammar checking, etc.
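As a rough sketch of the stop-string use case, the function below scans a vocabulary and collects the ids of tokens whose raw text contains a given stop string. TokensContaining is a hypothetical helper, and the vocabulary is assumed to be available as a vector of raw token strings indexed by token id; neither is taken from the PR itself.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: returns the ids of all tokens whose raw text
// contains `stop`. Assumes vocab[i] is the raw token string for id i.
std::vector<int32_t> TokensContaining(const std::vector<std::string>& vocab,
                                      const std::string& stop) {
  std::vector<int32_t> ids;
  if (stop.empty()) return ids;  // An empty stop string matches nothing here.
  for (std::size_t i = 0; i < vocab.size(); ++i) {
    if (vocab[i].find(stop) != std::string::npos) {
      ids.push_back(static_cast<int32_t>(i));
    }
  }
  return ids;
}
```

A real stop-string checker would also need to handle stop strings that span several tokens; this sketch covers only the single-token case.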
Tokenizer build time: