mlc-ai / tokenizers-cpp

Universal cross-platform tokenizers binding to HF and sentencepiece
Apache License 2.0
211 stars 47 forks

Allow Hugging Face tokenizer to pass arguments for add/skip special tokens #26

Closed: Abhishek8394 closed this 3 months ago

Abhishek8394 commented 3 months ago

Thank you for this wrapper! I would like to propose the following changes to the API, and I am contributing the implementation too:

These changes would be backwards compatible. Users can opt in by explicitly initializing an HFTokenizer object, or by casting a Tokenizer* to HFTokenizer*, assuming it is indeed an HFTokenizer.

These changes will leave the Tokenizer interface untouched.
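
A rough sketch of the casting approach, for illustration only. The `Tokenizer` and `HFTokenizer` classes below are toy stand-ins written for this example, not the library's real declarations; the two-argument `Encode` overload, the byte-per-token encoding, the `kToyBosId` value, and the `EncodeWithSpecialTokens` helper are all assumptions. It also assumes the `HFTokenizer` declaration is visible to the caller and that RTTI is enabled so `dynamic_cast` can check the concrete type.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical id used as a "beginning of sequence" special token in this toy.
constexpr int32_t kToyBosId = 256;

// Toy stand-in for the generic interface, which the proposal leaves untouched.
class Tokenizer {
 public:
  virtual ~Tokenizer() = default;
  virtual std::vector<int32_t> Encode(const std::string& text) = 0;
};

// Toy stand-in for the HF-backed tokenizer with an extra, opt-in overload.
class HFTokenizer : public Tokenizer {
 public:
  // Existing entry point: behavior unchanged (assumed here: no special tokens).
  std::vector<int32_t> Encode(const std::string& text) override {
    return Encode(text, /*add_special_tokens=*/false);
  }

  // Assumed new overload: the caller decides whether special tokens are added.
  std::vector<int32_t> Encode(const std::string& text, bool add_special_tokens) {
    std::vector<int32_t> ids;
    if (add_special_tokens) ids.push_back(kToyBosId);
    // Toy encoding: one id per byte of input.
    for (unsigned char c : text) ids.push_back(static_cast<int32_t>(c));
    return ids;
  }
};

// Hypothetical helper: use the extended API when the object really is an
// HFTokenizer, otherwise fall back to the generic interface.
std::vector<int32_t> EncodeWithSpecialTokens(Tokenizer* tok,
                                             const std::string& text) {
  assert(tok != nullptr);
  if (auto* hf = dynamic_cast<HFTokenizer*>(tok)) {
    return hf->Encode(text, /*add_special_tokens=*/true);
  }
  return tok->Encode(text);
}
```

Using `dynamic_cast` rather than `static_cast` means a pointer to some other tokenizer type yields `nullptr` instead of undefined behavior, so existing callers that only know about `Tokenizer` keep working as before.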

DreamGenX commented 1 month ago

As far as I can see, the HFTokenizer declaration is not exposed in the include headers.