xlang-ai / instructor-embedding

[ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings
Apache License 2.0

Update _load_sbert_model Parameters and Fix Tokenize Padding #122

Open pascalhuerten opened 4 months ago


Summary

This pull request introduces two updates to the InstructorEmbedding functionality: one restores compatibility with sentence-transformers version 3.0.1, and the other improves encoding performance.

Changes Introduced:

  1. Refactor: Added Missing Parameters to _load_sbert_model for Enhanced Compatibility

    • Parameters added:
      • local_files_only=False
      • model_kwargs=None
      • tokenizer_kwargs=None
      • config_kwargs=None
    • The override's signature was missing these parameters, which the base class in sentence-transformers 3.0.1 now passes; they are included so that model loading works with that version.
  2. Refactor: Updated tokenize Method's Padding Parameter Back to True

    • Changed the padding parameter from "max_length" back to True.
    • With "max_length", every sequence is padded to the model's maximum input length, inflating the work done in the softmax and linear layers; with True, sequences are padded only to the longest item in each batch, which removes a significant bottleneck observed during encoding.
    • The performance profile comparison below underscores the benefit of this change.
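For reference, a minimal sketch of the updated `_load_sbert_model` override signature (the parameter names follow sentence-transformers 3.0.1; the placeholder base class and the elided body are illustrative only, not the actual library code):

```python
from typing import Optional


class SentenceTransformer:
    # Placeholder standing in for sentence_transformers.SentenceTransformer,
    # so this sketch is self-contained.
    pass


class INSTRUCTOR(SentenceTransformer):
    # Sketch only: the real method loads the module pipeline; body elided.
    def _load_sbert_model(
        self,
        model_name_or_path: str,
        token: Optional[str] = None,
        cache_folder: Optional[str] = None,
        revision: Optional[str] = None,
        trust_remote_code: bool = False,
        local_files_only: bool = False,           # added in this PR
        model_kwargs: Optional[dict] = None,      # added in this PR
        tokenizer_kwargs: Optional[dict] = None,  # added in this PR
        config_kwargs: Optional[dict] = None,     # added in this PR
    ):
        raise NotImplementedError("sketch only")
```

Matching the base-class signature matters because sentence-transformers 3.x forwards these keyword arguments to `_load_sbert_model`, and an override with the old 2.x signature rejects them.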
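To illustrate the padding change, here is a standalone sketch that mimics the two padding modes of a Hugging Face tokenizer (this is not the library's actual code; `pad_batch`, the token IDs, and the 512-token maximum are illustrative assumptions):

```python
def pad_batch(token_batch, padding, max_length=512, pad_id=0):
    """Mimic the two padding strategies of the tokenize call."""
    if padding == "max_length":
        # Pad every sequence to the model's maximum input length.
        target = max_length
    elif padding is True:
        # Pad only to the longest sequence in this batch.
        target = max(len(seq) for seq in token_batch)
    else:
        raise ValueError("unsupported padding mode")
    return [seq + [pad_id] * (target - len(seq)) for seq in token_batch]


# A toy batch of two tokenized sentences (lengths 3 and 5).
batch = [[101, 7592, 102], [101, 7592, 2088, 999, 102]]

longest = pad_batch(batch, padding=True)         # rows of length 5
full = pad_batch(batch, padding="max_length")    # rows of length 512

# With padding=True the downstream linear/softmax layers process
# 2 x 5 token positions instead of 2 x 512.
print(len(longest[0]), len(full[0]))  # 5 512
```

For short inputs, padding to the batch maximum rather than the model maximum shrinks the tensors flowing through every subsequent layer, which is where the observed speedup comes from.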

Performance Comparison:

Additional Note

Testing:

Please review the changes and let me know if any further adjustments are necessary.

Thank you for considering this request. I look forward to your feedback.