davidkoski opened 8 months ago
`eos`: +1, I saw your use in mlx-swift and it makes sense. I'll open a PR soon.

Opened https://github.com/huggingface/swift-transformers/pull/70 and https://github.com/ml-explore/mlx-swift-examples/pull/28 to address the first topic.
I have a couple of suggestions for the tokenizer API -- things that I have needed to work around here: https://github.com/ml-explore/mlx-swift-examples/blob/main/Libraries/LLM/Tokenizer.swift
- add `eosToken` / `eosTokenId` to the `Tokenizer` protocol
  - `Tokenizer` already has `unknownToken` and `bosToken`
- have a way to add to `TokenizerModel/knownTokenizers`, or otherwise handle unknown tokenizers
  - e.g. `"PreTrainedTokenizer": BPETokenizer.self`
  - but `TokenizerModel` is internal, as are the various classes like `BPETokenizer`
  - alternatively, a name mapping like `"Qwen2Tokenizer": "PreTrainedTokenizer"`, which is perhaps the right level -- not exposing too much of the implementation

If these fit in with the vision for the tokenizer API, please consider them!
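To illustrate the `eosToken` / `eosTokenId` suggestion, here is a minimal sketch. It assumes a cut-down protocol named `SketchTokenizer` and a toy `ToyTokenizer`. Both are hypothetical and are not the swift-transformers `Tokenizer` protocol. The sketch shows why the end-of-sequence id is useful on the protocol: generation code can stop at EOS without knowing the concrete tokenizer type.

```swift
// Hypothetical cut-down protocol; NOT the swift-transformers `Tokenizer`.
protocol SketchTokenizer {
    var bosToken: String? { get }
    var bosTokenId: Int? { get }
    var unknownToken: String? { get }
    var unknownTokenId: Int? { get }
    // Proposed additions, mirroring the existing bos/unknown pairs.
    var eosToken: String? { get }
    var eosTokenId: Int? { get }
}

// Toy vocabulary-backed tokenizer, only here to exercise the protocol.
struct ToyTokenizer: SketchTokenizer {
    let vocab: [String: Int]
    let bosToken: String? = "<s>"
    let unknownToken: String? = "<unk>"
    let eosToken: String? = "</s>"

    // Each id is looked up from its token string; nil if it is not in vocab.
    var bosTokenId: Int? { bosToken.flatMap { vocab[$0] } }
    var unknownTokenId: Int? { unknownToken.flatMap { vocab[$0] } }
    var eosTokenId: Int? { eosToken.flatMap { vocab[$0] } }
}

// Generation-side helper: cut the output at the first EOS id.
// Works for any conforming tokenizer without knowing its concrete type.
func truncateAtEOS<T: SketchTokenizer>(_ ids: [Int], tokenizer: T) -> [Int] {
    guard let eos = tokenizer.eosTokenId,
          let end = ids.firstIndex(of: eos) else { return ids }
    return Array(ids[..<end])
}

let tok = ToyTokenizer(vocab: ["<unk>": 0, "<s>": 1, "</s>": 2, "hello": 5, "world": 6])
print(truncateAtEOS([5, 6, 2, 9], tokenizer: tok))  // [5, 6]
```

Without `eosTokenId` on the protocol, a helper like `truncateAtEOS` would need either a downcast to a concrete tokenizer or a separately configured id.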
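The name-mapping idea for unknown tokenizers can be sketched as follows. This sketch assumes string-to-string aliases, as in `"Qwen2Tokenizer": "PreTrainedTokenizer"`. Every name in it is hypothetical, including `TokenizingModelSketch`, `TokenizerRegistrySketch`, and `resolve(_:aliases:)`; none of them is swift-transformers API. The implementation table stays private, in the same way that `TokenizerModel` and `BPETokenizer` are internal. Callers only supply class-name strings.

```swift
// Hypothetical stand-ins for internal model classes such as BPETokenizer.
protocol TokenizingModelSketch {
    static var name: String { get }
}
struct BPEModelSketch: TokenizingModelSketch { static let name = "BPE" }
struct WordPieceModelSketch: TokenizingModelSketch { static let name = "WordPiece" }

enum TokenizerRegistrySketch {
    // Private table from tokenizer class name to implementation type,
    // hidden from callers like the internal TokenizerModel.
    private static var knownTokenizers: [String: any TokenizingModelSketch.Type] {
        [
            "BertTokenizer": WordPieceModelSketch.self,
            "PreTrainedTokenizer": BPEModelSketch.self,
        ]
    }

    // Public entry point: an unknown class name can be redirected to a
    // known one by string, without exposing the implementation types.
    static func resolve(
        _ className: String,
        aliases: [String: String] = [:]
    ) -> (any TokenizingModelSketch.Type)? {
        if let model = knownTokenizers[className] { return model }
        guard let target = aliases[className] else { return nil }
        return knownTokenizers[target]
    }
}

let qwenAliases = ["Qwen2Tokenizer": "PreTrainedTokenizer"]
print(TokenizerRegistrySketch.resolve("Qwen2Tokenizer", aliases: qwenAliases)?.name ?? "unknown")  // BPE
print(TokenizerRegistrySketch.resolve("Qwen2Tokenizer")?.name ?? "unknown")  // unknown
```

This sketch passes the aliases as a parameter instead of keeping a mutable shared registry. That avoids global mutable state, at the cost of callers carrying the mapping around. A registration call on a shared table would be the other obvious design.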
Thanks