Closed bitnom closed 1 year ago
Hi, Thanks a lot for your interest in the INSTRUCTOR model!
Theoretically, we can increase the maximum length to a large number, but we have not tried the INSTRUCTOR model for documents with several thousand tokens. More use cases may be posted here for further discussion!
@hongjin-su How much can we increase the maximum sequence length? Can we increase it without re-training?
@jlia0 Thanks a lot for your comments!
You may increase the maximum sequence length a little bit without re-training, e.g., to 768. However, if the sequence length is too long, you may experience low efficiency (as transformer self-attention has O(n^2) time complexity) and a slight performance drop.
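To make the efficiency point above concrete, here is a minimal sketch of why cost grows quadratically with sequence length (the token counts are illustrative, not measurements of INSTRUCTOR itself):

```python
def attention_cost(n_tokens: int) -> int:
    """Relative cost of self-attention: every token attends to every token,
    so time and memory scale with n_tokens squared."""
    return n_tokens * n_tokens

# Doubling the sequence length from 512 to 1024 roughly quadruples the cost.
print(attention_cost(1024) // attention_cost(512))  # → 4
```

So going from 512 tokens to several thousand multiplies the attention cost many times over, which is why very long documents are slow even when the model technically accepts them.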
Feel free to add any further questions or comments!
Feel free to re-open the issue if you have any further questions or comments!
Do you have any data on performance across a range of input lengths? I'm working on neural search, and I came across instructor-xl as a potential replacement for text-embedding-ada-002, which has a context window of 8,191 tokens. Can instructor-xl handle that length without degrading? Any longer?
Issue 12 touched on this but didn't provide many details.
My immediate use is cosine similarity for search, but I also need clustering and categorization. Any info you can provide on context length in relation to these use cases would be greatly appreciated.
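For what it's worth, the cosine-similarity step of the search use case is independent of which embedding model produces the vectors. A minimal plain-Python sketch (the vectors here are made-up stand-ins for model embeddings):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Identical directions score 1.0; orthogonal directions score 0.0.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # → 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # → 0.0
```

The context-length question still matters upstream, though: any tokens truncated before embedding simply never contribute to the similarity score.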
For anyone else reading this trying to compare the model to ada, here's a bit of discussion: https://github.com/UKPLab/sentence-transformers/issues/1897
and related benchmarking: https://huggingface.co/spaces/mteb/leaderboard