openai / CLIP

CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
MIT License

Faster text tokenization #407

Open michael-p opened 9 months ago

michael-p commented 9 months ago

We've been working on a re-implementation of the text tokenizer in Rust, with bindings for Python, called instant-clip-tokenizer. In our benchmarks it is around 70x faster than the current Python implementation.

Are you interested in switching to this library instead of the tokenizer included in this repository? If so, I'm happy to send in a PR!
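For anyone who wants to check a speedup figure like the one above on their own inputs, here is a minimal timing sketch. It is not the benchmark behind the 70x number. The helper names `time_tokenizer` and `speedup` are hypothetical, and the two tokenizers are stand-ins written only so the script runs without either library installed.

```python
import timeit
from typing import Callable, Sequence


def time_tokenizer(tokenize: Callable[[str], object],
                   texts: Sequence[str],
                   repeat: int = 5) -> float:
    """Best-of-`repeat` wall time in seconds to tokenize every text once."""
    def run() -> None:
        for text in texts:
            tokenize(text)
    # Taking the minimum reduces noise from other processes.
    return min(timeit.repeat(run, number=1, repeat=repeat))


def speedup(baseline: Callable[[str], object],
            candidate: Callable[[str], object],
            texts: Sequence[str],
            repeat: int = 5) -> float:
    """How many times faster `candidate` is than `baseline` on `texts`."""
    return (time_tokenizer(baseline, texts, repeat)
            / time_tokenizer(candidate, texts, repeat))


# Stand-in tokenizers (assumptions for illustration only, not real CLIP BPE).
def slow_tokenize(text: str) -> list:
    # Deliberately wasteful: splits on whitespace, then walks every character.
    return [ord(ch) for word in text.lower().split() for ch in word]


def fast_tokenize(text: str) -> list:
    return text.lower().split()


sample_texts = ["a photo of a cat sitting on a sofa"] * 2000
ratio = speedup(slow_tokenize, fast_tokenize, sample_texts)
print(f"stand-in speedup: {ratio:.1f}x")
```

To compare the real implementations, one would replace the stand-ins with `SimpleTokenizer().encode` from `clip.simple_tokenizer` and the equivalent encode function from the Rust-backed library, and use a set of captions representative of the actual workload.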