mmoskal opened 4 months ago
Some tokenizers contain several distinct tokens that decode to the same bytes. For example, the Llama tokenizer has both "<0x20>" (ID 35) and "▁" (space, ID 29871), as well as "<0x21>" (ID 36) and "!" (ID 29991), etc.
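A minimal sketch of how such duplicate groups could be found, assuming the vocabulary has already been decoded to raw bytes per token ID. The function name `find_duplicates` and the dict-based representation are hypothetical, not the project's actual API. Only the four Llama IDs come from the example. The entry `1000: b"xyz"` is invented so that a non-duplicated token is also shown.

```python
from __future__ import annotations

from collections import defaultdict


def find_duplicates(token_bytes: dict[int, bytes]) -> dict[int, list[int]]:
    """Hypothetical helper: map each token id to the other ids that decode
    to exactly the same bytes. Tokens with a unique byte string are omitted."""
    by_bytes: dict[bytes, list[int]] = defaultdict(list)
    for tok_id, raw in sorted(token_bytes.items()):
        by_bytes[raw].append(tok_id)
    dups: dict[int, list[int]] = {}
    for ids in by_bytes.values():
        if len(ids) > 1:  # only groups with two or more ids are duplicates
            for tok_id in ids:
                dups[tok_id] = [other for other in ids if other != tok_id]
    return dups


# Excerpt of the Llama vocabulary, decoded to raw bytes:
# "<0x20>" (35) and "▁" (29871) both decode to b" ",
# "<0x21>" (36) and "!" (29991) both decode to b"!".
vocab = {
    35: b" ",
    29871: b" ",
    36: b"!",
    29991: b"!",
    1000: b"xyz",  # invented non-duplicated entry, for illustration only
}
dups = find_duplicates(vocab)
print(dups)  # {35: [29871], 29871: [35], 36: [29991], 29991: [36]}
```

Storing the full "other ids" list for every member makes later lookups a single dict access, at the cost of some redundancy.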
We need to:
- `TokenSet`: mostly done; still need to call `apply_duplicates()` in more places, in particular somewhere around `return_logit_bias()`, and possibly after any user-level update to a token set.
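One possible reading of what `apply_duplicates()` should do to a token set is sketched below. It assumes union semantics: if any ID in a duplicate group is allowed, the whole group becomes allowed. A plain Python `set` stands in for `TokenSet`, and the signature is hypothetical. The second half shows why re-applying after a user-level update matters: the update can break the invariant again.

```python
from __future__ import annotations


def apply_duplicates(allowed: set[int], dups: dict[int, list[int]]) -> set[int]:
    """Hypothetical sketch: close a token set under duplicates, i.e. if any id
    of a duplicate group is allowed, every id in that group becomes allowed."""
    closed = set(allowed)
    for tok_id in allowed:
        # dups maps an id to all *other* ids of its group, so one pass suffices
        closed.update(dups.get(tok_id, ()))
    return closed


# Duplicate groups for the Llama ids 35/29871 (" ") and 36/29991 ("!").
dups = {35: [29871], 29871: [35], 36: [29991], 29991: [36]}

allowed = {35}  # e.g. a constraint only marked the byte token "<0x20>"
allowed = apply_duplicates(allowed, dups)
print(sorted(allowed))  # [35, 29871]

# A user-level update adds only "!", so the set must be closed again.
allowed.add(29991)
allowed = apply_duplicates(allowed, dups)
print(sorted(allowed))  # [35, 36, 29871, 29991]
```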
Call `apply_duplicates()` after `compute_bias()`, etc.
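The ordering can be sketched as a small pipeline: compute the allowed set, close it under duplicates, and only then turn it into a logit bias. In this sketch, `compute_bias()` and `return_logit_bias()` are hypothetical stand-ins whose signatures are invented for illustration. The bias convention (0.0 for allowed IDs, -inf for the rest) is also an assumption. The vocabulary size of 32000 is the Llama (1/2) tokenizer's.

```python
from __future__ import annotations

import math


def apply_duplicates(allowed: set[int], dups: dict[int, list[int]]) -> set[int]:
    """Hypothetical: if any id of a duplicate group is allowed, allow the group."""
    closed = set(allowed)
    for tok_id in allowed:
        closed.update(dups.get(tok_id, ()))
    return closed


def compute_bias() -> set[int]:
    """Hypothetical stand-in for a constraint that yields allowed token ids."""
    return {36}  # pretend only "<0x21>" (the byte form of "!") was allowed


def return_logit_bias(allowed: set[int], vocab_size: int) -> list[float]:
    """Hypothetical: 0.0 for allowed ids, -inf (never sampled) for all others."""
    return [0.0 if i in allowed else -math.inf for i in range(vocab_size)]


dups = {35: [29871], 29871: [35], 36: [29991], 29991: [36]}
VOCAB_SIZE = 32000  # Llama (1/2) vocabulary size

# Duplicates are applied after compute_bias() and before building the bias,
# so "!" (29991) is not masked out just because only "<0x21>" (36) was allowed.
allowed = apply_duplicates(compute_bias(), dups)
bias = return_logit_bias(allowed, VOCAB_SIZE)
print(bias[36], bias[29991], bias[35])  # 0.0 0.0 -inf
```

Applying the closure on the set rather than on the final bias vector keeps the bias step a pure conversion, so any other producer of token sets can reuse the same post-processing.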