This pull request makes two changes to InstructorEmbedding to improve compatibility and performance with sentence-transformers version 3.0.1.
Changes Introduced:
Refactor: Added Missing Parameters to _load_sbert_model for Enhanced Compatibility
Parameters added:
local_files_only=False
model_kwargs=None
tokenizer_kwargs=None
config_kwargs=None
These parameters were missing and are now included to ensure compatibility with sentence-transformers 3.0.1.
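To illustrate why the missing parameters matter, here is a minimal, self-contained sketch. It assumes the incompatibility arises because the parent class passes these arguments by keyword to the overridden loader; the PR text only states that the parameters were missing. UpstreamBase, OldOverride, FixedOverride and try_load are hypothetical stand-ins, not the real sentence-transformers or InstructorEmbedding classes.

```python
class UpstreamBase:
    """Hypothetical stand-in for the parent class (not the real library code)."""

    def __init__(self, name):
        # Assumption: the newer parent forwards the extra keywords to the loader hook.
        self.modules = self._load_sbert_model(
            name,
            local_files_only=False,
            model_kwargs=None,
            tokenizer_kwargs=None,
            config_kwargs=None,
        )

    def _load_sbert_model(self, name, local_files_only=False, model_kwargs=None,
                          tokenizer_kwargs=None, config_kwargs=None):
        return [name]


class OldOverride(UpstreamBase):
    # Override written against the older hook signature: the new keywords are absent.
    def _load_sbert_model(self, name):
        return [name]


class FixedOverride(UpstreamBase):
    # Override that accepts the four parameters listed in this PR.
    def _load_sbert_model(self, name, local_files_only=False, model_kwargs=None,
                          tokenizer_kwargs=None, config_kwargs=None):
        return [name, {"local_files_only": local_files_only}]


def try_load(cls):
    """Instantiate cls and report whether the loader call succeeded."""
    try:
        cls("some-model")
        return "ok"
    except TypeError as exc:
        return f"TypeError: {exc}"


if __name__ == "__main__":
    print("old override:  ", try_load(OldOverride))
    print("fixed override:", try_load(FixedOverride))
```

In this sketch the old override fails with a TypeError about an unexpected keyword argument, while the override that accepts the new parameters loads normally.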
Refactor: Updated tokenize Method's Padding Parameter Back to True
Changed the padding parameter from "max_length" back to True.
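This change removes a significant performance bottleneck in encoding. With padding="max_length", every input is padded to the maximum sequence length regardless of its actual length, so the softmax and linear layers spend most of their time on padding tokens.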
The performance profile comparison underscores the benefit of this change.
There may have been a valid reason to choose padding="max_length", such as better performance during parallelization. I did not test that case and cannot comment on it. If that is the reason, the padding strategy should at least be exposed as a parameter.
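The difference between the two padding strategies, and the idea of exposing the strategy as a parameter, can be sketched in plain Python. This is an illustration only: pad_batch is a hypothetical helper, the token ids are made up, and the default max_length of 512 is an assumption rather than a value taken from this PR. It does not call the real tokenizer.

```python
def pad_batch(batch, padding=True, max_length=512, pad_id=0):
    """Pad a batch of token-id lists.

    padding=True         -> pad to the longest sequence in the batch
    padding="max_length" -> pad every sequence to max_length
    """
    # Truncate first so no sequence exceeds max_length.
    truncated = [ids[:max_length] for ids in batch]

    if padding == "max_length":
        target = max_length
    elif padding is True:
        target = max(len(ids) for ids in truncated)
    else:
        raise ValueError(f"unsupported padding strategy: {padding!r}")

    return [ids + [pad_id] * (target - len(ids)) for ids in truncated]


if __name__ == "__main__":
    # A single short sentence, matching the batch-size-1 test setup.
    batch = [[101, 7592, 2088, 102]]  # made-up token ids

    dynamic = pad_batch(batch, padding=True)
    fixed = pad_batch(batch, padding="max_length")

    print("padding=True         ->", len(dynamic[0]), "tokens per sequence")
    print('padding="max_length" ->', len(fixed[0]), "tokens per sequence")
```

For a single short sentence, padding=True leaves the sequence at its own length, while padding="max_length" inflates it to the full maximum, and every later layer has to process those extra positions.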
Testing:
The updated methods were tested with sentence-transformers 3.0.1 on a CPU, encoding a single sentence with a batch size of 1, to validate both the functional and the performance improvements.
Please review the changes and let me know if any further adjustments are necessary.
Thank you for considering this request. I look forward to your feedback.