mlc-ai / tokenizers-cpp

Universal cross-platform tokenizers binding to HF and sentencepiece
Apache License 2.0

Tokenizer for Triton Inference Server #35

Open · geraldstanje opened this issue 1 month ago


Hi,

Can this be used with Triton Inference Server for a Hugging Face SetFit model (https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)?

Here is what I currently do in Python:

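```python
from transformers import AutoTokenizer

# Load the tokenizer from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')

# Example input sentences
sentences = ['This is an example sentence', 'Each sentence is converted']

# Tokenize: pad to the longest sentence, truncate to the model's max length, return PyTorch tensors
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
```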
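For context on what `padding=True` produces: the tokenizer returns padded `input_ids` together with an `attention_mask` (and, for BERT-style tokenizers like this one, `token_type_ids`). If a tokenizer binding on the C++ side only returns one list of token IDs per sentence (an assumption here; check the binding's actual API), that padding step would have to be reproduced before the tensors are sent to Triton. The pure-Python sketch below shows the right-side, pad-to-longest behavior. `pad_batch` is a hypothetical helper, `pad_id=0` assumes the BERT uncased `[PAD]` ID (verify against `tokenizer.pad_token_id`), and truncation is not covered.

```python
from typing import List, Tuple


def pad_batch(batch_ids: List[List[int]], pad_id: int = 0) -> Tuple[List[List[int]], List[List[int]]]:
    """Hypothetical helper: right-pad every sequence to the longest one in the batch.

    Returns (input_ids, attention_mask); the mask is 1 for real tokens
    and 0 for padding positions. pad_id=0 is an assumed [PAD] token ID.
    """
    longest = max((len(ids) for ids in batch_ids), default=0)
    input_ids: List[List[int]] = []
    attention_mask: List[List[int]] = []
    for ids in batch_ids:
        n_pad = longest - len(ids)
        # Append padding on the right, as BERT-style tokenizers do by default
        input_ids.append(list(ids) + [pad_id] * n_pad)
        # Real tokens are attended to (1); padding positions are masked out (0)
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return input_ids, attention_mask


# Usage with made-up ID sequences of lengths 3 and 4
ids, mask = pad_batch([[101, 7592, 102], [101, 7592, 2088, 102]])
print(ids)   # [[101, 7592, 102, 0], [101, 7592, 2088, 102]]
print(mask)  # [[1, 1, 1, 0], [1, 1, 1, 1]]
```

The shorter sequence gets one trailing pad ID and a mask of `[1, 1, 1, 0]`, so the model ignores the padded position.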

Thanks, Gerald