Closed EricLBuehler closed 8 months ago
It uses the template processing saved in the tokenizer file to add special tokens at the beginning / end, depending on the function used. A good example is in codellama
Ok, thanks! Does this apply to the Rust API, too? I am developing candle_llm_dataset for Candle, and so I need to know this for a from_iter method.
Pretty sure it does, yes 😉 See this flag. It's just not implemented the same way for slow tokenizers in transformers, but that should not be a problem. See the docs on template processors
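To make the behaviour concrete, here is a toy model in plain Python of what a template post-processor does with the flag. It is only a sketch and not the library's implementation: the `<s>` / `</s>` tokens, the `"<s> $A </s>"` template string, the whitespace "tokenizer", and the helper names `apply_template` and `encode` are all assumptions made for illustration.

```python
from typing import List


def apply_template(tokens: List[str], template: str, add_special_tokens: bool) -> List[str]:
    """Toy model of a template post-processor (hypothetical helper).

    `$A` stands for the encoded input sequence; every other piece of the
    template is treated as a special token and is emitted only when
    `add_special_tokens` is True.
    """
    out: List[str] = []
    for piece in template.split():
        if piece == "$A":
            out.extend(tokens)
        elif add_special_tokens:
            out.append(piece)
    return out


def encode(text: str, add_special_tokens: bool = True,
           template: str = "<s> $A </s>") -> List[str]:
    """Whitespace split as a stand-in for real tokenization (assumption)."""
    tokens = text.split()
    return apply_template(tokens, template, add_special_tokens)


with_special = encode("hello world", add_special_tokens=True)
without_special = encode("hello world", add_special_tokens=False)
bos_only = encode("hello world", add_special_tokens=True, template="<s> $A")

print(with_special)     # ['<s>', 'hello', 'world', '</s>']
print(without_special)  # ['hello', 'world']
print(bos_only)         # ['<s>', 'hello', 'world']
```

So whether you get a bos token, an eos token, both, or neither depends on the template stored with the tokenizer; the flag only switches that template on or off.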
Ok, great. Thanks!
As stated above, what does the parameter add_special_tokens do? Does it add bos/eos tokens? Thanks!