xlang-ai / instructor-embedding

[ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings
Apache License 2.0

Update _load_sbert_model Parameters and Fix Tokenize Padding #122

Open pascalhuerten opened 4 months ago


Summary

This pull request introduces two updates to the InstructorEmbedding functionality: one restores compatibility with sentence-transformers version 3.0.1, and the other improves encoding performance.

Changes Introduced:

  1. Refactor: Added Missing Parameters to _load_sbert_model for Enhanced Compatibility

    • Parameters added:
      • local_files_only=False
      • model_kwargs=None
      • tokenizer_kwargs=None
      • config_kwargs=None
    • The override's signature was missing these parameters, which the base class in sentence-transformers 3.0.1 now passes; they are included so that model loading works with that version.
  2. Refactor: Updated tokenize Method's Padding Parameter Back to True

    • Changed the padding parameter from "max_length" back to True.
    • With "max_length", every sequence is padded to the model's maximum input length, inflating the work done in the softmax and linear layers; with True, sequences are padded only to the longest item in each batch, which removes a significant bottleneck observed during encoding.
    • The performance profile comparison below underscores the benefit of this change.
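For reference, a minimal sketch of the updated `_load_sbert_model` override signature (the parameter names follow sentence-transformers 3.0.1; the placeholder base class and the elided body are illustrative only, not the actual library code):

```python
from typing import Optional


class SentenceTransformer:
    # Placeholder standing in for sentence_transformers.SentenceTransformer,
    # so this sketch is self-contained.
    pass


class INSTRUCTOR(SentenceTransformer):
    # Sketch only: the real method loads the module pipeline; body elided.
    def _load_sbert_model(
        self,
        model_name_or_path: str,
        token: Optional[str] = None,
        cache_folder: Optional[str] = None,
        revision: Optional[str] = None,
        trust_remote_code: bool = False,
        local_files_only: bool = False,           # added in this PR
        model_kwargs: Optional[dict] = None,      # added in this PR
        tokenizer_kwargs: Optional[dict] = None,  # added in this PR
        config_kwargs: Optional[dict] = None,     # added in this PR
    ):
        raise NotImplementedError("sketch only")
```

Matching the base-class signature matters because sentence-transformers 3.x forwards these keyword arguments to `_load_sbert_model`, and an override with the old 2.x signature rejects them.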
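To illustrate the padding change, here is a standalone sketch that mimics the two padding modes of a Hugging Face tokenizer (this is not the library's actual code; `pad_batch`, the token IDs, and the 512-token maximum are illustrative assumptions):

```python
def pad_batch(token_batch, padding, max_length=512, pad_id=0):
    """Mimic the two padding strategies of the tokenize call."""
    if padding == "max_length":
        # Pad every sequence to the model's maximum input length.
        target = max_length
    elif padding is True:
        # Pad only to the longest sequence in this batch.
        target = max(len(seq) for seq in token_batch)
    else:
        raise ValueError("unsupported padding mode")
    return [seq + [pad_id] * (target - len(seq)) for seq in token_batch]


# A toy batch of two tokenized sentences (lengths 3 and 5).
batch = [[101, 7592, 102], [101, 7592, 2088, 999, 102]]

longest = pad_batch(batch, padding=True)         # rows of length 5
full = pad_batch(batch, padding="max_length")    # rows of length 512

# With padding=True the downstream linear/softmax layers process
# 2 x 5 token positions instead of 2 x 512.
print(len(longest[0]), len(full[0]))  # 5 512
```

For short inputs, padding to the batch maximum rather than the model maximum shrinks the tensors flowing through every subsequent layer, which is where the observed speedup comes from.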

Performance Comparison:

Additional Note

Testing:

Please review the changes and let me know if any further adjustments are necessary.

Thank you for considering this request. I look forward to your feedback.