xlang-ai / instructor-embedding

[ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings
Apache License 2.0

Runtime Optimization #86

Open aditya-y47 opened 1 year ago

aditya-y47 commented 1 year ago

Hey, first of all, thank you for building and open-sourcing such a great piece of work. I have been using INSTRUCTOR for some time now and I absolutely love it.

I'm planning to generate embeddings for a large corpus of texts (on the scale of millions), and I intend to schedule the embedding-generation job as an async, MQ-based execution. Based on my initial estimates, the runtime is a bit on the higher side, so I was hoping certain methods could be used to optimize embedding generation, including:

  1. Inference on TensorRT
  2. Compile the underlying PyTorch model
    • I see that you folks use a Sentence-Transformers-like implementation, so I am unsure how torch.compile would work with it
  3. Using kernel fusion / custom kernels, etc.

Are there any generally prescribed guidelines that would help me achieve these? Is anyone here working on such optimizations?
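To make point 2 concrete, here is a rough sketch of what compiling the model might look like. The toy module below is only a stand-in for the actual INSTRUCTOR encoder (the layer sizes are illustrative), and the `aot_eager` backend is chosen just so the sketch runs without a C++/Triton toolchain; on a real GPU box the default `inductor` backend is what you'd want:

```python
import torch
import torch.nn as nn

# Toy stand-in for the transformer encoder inside INSTRUCTOR
# (layer sizes are illustrative, not the real architecture).
encoder = nn.Sequential(
    nn.Linear(768, 768),
    nn.GELU(),
    nn.Linear(768, 768),
).eval()

# torch.compile (PyTorch >= 2.0) captures and optimizes the forward pass.
# backend="aot_eager" keeps this sketch toolchain-free; use the default
# "inductor" backend in production for actual kernel fusion.
compiled = torch.compile(encoder, backend="aot_eager")

x = torch.randn(4, 768)
with torch.no_grad():
    eager_out = encoder(x)
    compiled_out = compiled(x)

# The compiled module should be numerically equivalent to eager mode.
print(torch.allclose(eager_out, compiled_out, atol=1e-5))
```

I'd expect the same pattern to apply to the underlying Hugging Face transformer that the Sentence-Transformers-style wrapper calls into, but I haven't verified that end to end.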

hongjin-su commented 9 months ago

Yeah, INSTRUCTOR is highly similar to Sentence-Transformers models in terms of model architecture. Therefore, any optimization that applies to sentence-transformer models should also be applicable to the INSTRUCTOR models.

Recently, there have been some efforts in model quantization, which you may take as references:

https://www.sbert.net/examples/training/distillation/README.html#quantization
https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/distillation/model_quantization.py
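As a minimal sketch of the dynamic-quantization technique those links describe (the toy module below stands in for the INSTRUCTOR encoder; its layer sizes are illustrative, and the linked `model_quantization.py` shows the real sentence-transformers workflow):

```python
import torch
import torch.nn as nn

# Toy stand-in for the INSTRUCTOR encoder (illustrative sizes only).
encoder = nn.Sequential(
    nn.Linear(768, 3072),
    nn.ReLU(),
    nn.Linear(3072, 768),
).eval()

# Dynamic quantization stores Linear weights as int8 and quantizes
# activations on the fly at inference time (CPU only), trading a small
# amount of accuracy for lower memory use and faster matmuls.
quantized = torch.ao.quantization.quantize_dynamic(
    encoder, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(2, 768)
with torch.no_grad():
    out = quantized(x)
print(out.shape)  # embeddings keep their original dimensionality
```

Note that dynamic quantization is a CPU-inference optimization; for GPU throughput, batching and fp16 inference are usually the first things to try.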

Hope this helps!