xlang-ai / instructor-embedding

[ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings
Apache License 2.0

Runtime Optimization #86

Open aditya-y47 opened 1 year ago

aditya-y47 commented 1 year ago

Hey, first of all, thank you for building and open-sourcing such a great piece of work. I have been using INSTRUCTOR for some time now and I absolutely love it.

I'm planning to generate embeddings for a large corpus of texts (on the scale of millions), and I intend to schedule the embedding-generation job as an async, MQ-based execution. Based on my initial estimates, the runtime is a bit on the higher side, so I was hoping certain methods could be used to optimize embedding generation, including:

  1. Inference on TensorRT
  2. Compile the underlying PyTorch model
    • I see that you folks use a Sentence-Transformers-like implementation, so I am unsure how torch.compile would work with it
  3. Using kernel fusion / custom kernels, etc.

Are there any generally prescribed guidelines that would help me achieve these? Is anyone here working on such optimizations?
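To make point 2 concrete, here is a rough sketch of what compiling the model might look like. The toy module below is only a stand-in for the actual INSTRUCTOR encoder (the layer sizes are illustrative), and the `aot_eager` backend is chosen just so the sketch runs without a C++/Triton toolchain; on a real GPU box the default `inductor` backend is what you'd want:

```python
import torch
import torch.nn as nn

# Toy stand-in for the transformer encoder inside INSTRUCTOR
# (layer sizes are illustrative, not the real architecture).
encoder = nn.Sequential(
    nn.Linear(768, 768),
    nn.GELU(),
    nn.Linear(768, 768),
).eval()

# torch.compile (PyTorch >= 2.0) captures and optimizes the forward pass.
# backend="aot_eager" keeps this sketch toolchain-free; use the default
# "inductor" backend in production for actual kernel fusion.
compiled = torch.compile(encoder, backend="aot_eager")

x = torch.randn(4, 768)
with torch.no_grad():
    eager_out = encoder(x)
    compiled_out = compiled(x)

# The compiled module should be numerically equivalent to eager mode.
print(torch.allclose(eager_out, compiled_out, atol=1e-5))
```

I'd expect the same pattern to apply to the underlying Hugging Face transformer that the Sentence-Transformers-style wrapper calls into, but I haven't verified that end to end.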

hongjin-su commented 9 months ago

Yeah, INSTRUCTOR is highly similar to Sentence-Transformers models in terms of model architecture. Therefore, any optimization that applies to sentence-transformer models should also be applicable to the INSTRUCTOR models.

Recently, there have been some efforts in model quantization, which you may take as references:

https://www.sbert.net/examples/training/distillation/README.html#quantization
https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/distillation/model_quantization.py
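As a minimal sketch of the dynamic-quantization technique those links describe (the toy module below stands in for the INSTRUCTOR encoder; its layer sizes are illustrative, and the linked `model_quantization.py` shows the real sentence-transformers workflow):

```python
import torch
import torch.nn as nn

# Toy stand-in for the INSTRUCTOR encoder (illustrative sizes only).
encoder = nn.Sequential(
    nn.Linear(768, 3072),
    nn.ReLU(),
    nn.Linear(3072, 768),
).eval()

# Dynamic quantization stores Linear weights as int8 and quantizes
# activations on the fly at inference time (CPU only), trading a small
# amount of accuracy for lower memory use and faster matmuls.
quantized = torch.ao.quantization.quantize_dynamic(
    encoder, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(2, 768)
with torch.no_grad():
    out = quantized(x)
print(out.shape)  # embeddings keep their original dimensionality
```

Note that dynamic quantization is a CPU-inference optimization; for GPU throughput, batching and fp16 inference are usually the first things to try.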

Hope this helps!