xlang-ai / instructor-embedding

[ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings
Apache License 2.0
1.87k stars 135 forks source link

Redundant code & Tokenize issue #118

Open JonathanZha47 opened 5 months ago

JonathanZha47 commented 5 months ago
  1. Seems like there's a redundant code here. image
  2. Tokenize method not working well for creating correct instruction_mask column image
JonathanZha47 commented 5 months ago

For the second problem, it appeared when I was using the 2.7.0 version of the sentence_transformers.

JonathanZha47 commented 5 months ago

If I'm using the 3.0.0 version of the sentence_transformers, then the local file issue still exists even if I change my code according to the #115 or #113

BBC-Esq commented 3 months ago

After you pip install the normal instructor package, try replacing the instructor.py file in the site-packages folder with the one in my fork here...let me know if this fixes the issue:

https://github.com/BBC-Esq/instructor-embedding