segment-any-text / wtpsplit

Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.
MIT License

KeyError: 'xlm-token' #137

Closed: amurtadha closed this issue 1 month ago

amurtadha commented 1 month ago

Why is model_type set to xlm-token in config.json?

Which tokenizer files should be used? Those of xlm-roberta?

markus583 commented 1 month ago

We modify the XLM-R architecture slightly (internally, the modified architecture is called xlm-token), but our models are built on XLM-R, so the tokenizer is the same.
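
To make the answer concrete, here is a minimal sketch of how one might resolve the tokenizer from a config.json that declares model_type xlm-token. It is an illustration under stated assumptions, not wtpsplit's own loading code: the checkpoint name "xlm-roberta-base", the lookup table, and both helper functions are hypothetical, and the thread only confirms that the tokenizer is the same as XLM-R's. The KeyError in the title presumably comes from a loader that looks up model_type in a registry that has no xlm-token entry, so the sketch maps the name explicitly instead.

```python
import json

# Assumption: the public "xlm-roberta-base" checkpoint is a suitable source for
# the XLM-R tokenizer. The thread only states that the tokenizer matches XLM-R.
XLMR_TOKENIZER_NAME = "xlm-roberta-base"

# Hypothetical lookup table: model_type value -> tokenizer checkpoint to load.
TOKENIZER_FOR_MODEL_TYPE = {
    "xlm-token": XLMR_TOKENIZER_NAME,    # modified XLM-R architecture
    "xlm-roberta": XLMR_TOKENIZER_NAME,  # stock XLM-R
}


def tokenizer_name_from_config(config_text: str) -> str:
    """Return the tokenizer checkpoint name for the given config.json text."""
    config = json.loads(config_text)
    model_type = config.get("model_type")
    if model_type not in TOKENIZER_FOR_MODEL_TYPE:
        # Fail with a clear message instead of a bare KeyError.
        raise ValueError(f"no tokenizer mapping for model_type={model_type!r}")
    return TOKENIZER_FOR_MODEL_TYPE[model_type]


def load_tokenizer(config_text: str):
    """Load the tokenizer via Hugging Face transformers (needs network or a cache)."""
    # Imported lazily so the name resolution above works without transformers.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(tokenizer_name_from_config(config_text))


# Minimal stand-in for the relevant field of a model's config.json.
example_config = '{"model_type": "xlm-token"}'
resolved = tokenizer_name_from_config(example_config)
print(resolved)
```

Calling load_tokenizer(example_config) would then fetch the XLM-R tokenizer, provided transformers is installed and the checkpoint is reachable. Keeping the name resolution separate from the download makes the mapping easy to check offline.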