xenova / transformers.js

State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!
Apache License 2.0

Mistral Tokenizer V2 #779

Open · DreamGenX opened this issue 4 weeks ago


Model description

This is a request for a Mistral Tokenizer V2, similar to your existing V1 and V3 tokenizer repos [1], but built from the V2 tokenizer data: https://github.com/mistralai/mistral-common/blob/main/src/mistral_common/data/mistral_instruct_tokenizer_240216.model.v2

This is the tokenizer used by mistral-small-latest and mistral-large-latest.


Additional information


Your contribution

I am not familiar with the necessary conversion, and it would be great to have this tokenizer published in the official "Xenova" repo.