xlang-ai / instructor-embedding

[ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings
Apache License 2.0
1.87k stars 135 forks source link

Updated to work with the latest version of sentence transformers #112

Closed SilasMarvin closed 7 months ago

SilasMarvin commented 8 months ago

The latest version of sentence transformers changed the arguments to the _load_sbert_model and changed the download system. This is a small update to make it compatible.

tomaarsen commented 7 months ago

cc @hongjin-su I haven't tested this myself, but this seems quite important to prevent the critical failures currently encountered when installing, e.g. #106.

For others, I would recommend the following:

pip install git+https://github.com/SilasMarvin/instructor-embedding.git@silas-update-for-newer-sentence-transformers

After which your code should work with the most recent Sentence Transformers version again.

BBC-Esq commented 7 months ago

Hello @SilasMarvin

First, thanks for this fork and thanks to @tomaarsen for pointing it out. I'd been trying to get instructor models to work with sentence-transformers library ever since they started updating it in January after not issuing a new release in a few years. Have you gotten response from the maintainers of instructorembedding? I've found them to be unresponsive and/or unhelpful. Apparently, their library was created as part of some college project in Washington State (my home state BTW) and presumably they've moved onto real jobs. ;-)

Regardless of the reason, I've asked sentence-transformers to update their code instead and perhaps you could help? This is not my expertise... Here's a link to our discussion. Again, thanks for this fix...

https://github.com/UKPLab/sentence-transformers/issues/2567

SilasMarvin commented 7 months ago

Hey! @BBC-Esq I have not gotten any response from the maintainers. This honestly is not the best fix. The new version of sentence-transformers has some nice utilities for loading in files and managing different versions which I did not implement, but this should work as a hot fix for people having issues. I'll take a look at the discussion!

BBC-Esq commented 7 months ago

Hey! @BBC-Esq I have not gotten any response from the maintainers. This honestly is not the best fix. The new version of sentence-transformers has some nice utilities for loading in files and managing different versions which I did not implement, but this should work as a hot fix for people having issues. I'll take a look at the discussion!

Please do take a look, and thanks again. I think you'll understand my challenge.

racinmat commented 7 months ago

This is great, I hope it could get merged.

hongjin-su commented 7 months ago

Thanks for all your help! I will merge it after conflicts are resolved.

SilasMarvin commented 7 months ago

Thanks for all your help! I will merge it after conflicts are resolved.

Awesome! Conflicts resolved

mjsonofharry commented 7 months ago

@hongjin-su will there be a new PyPi release soon?

Also for anyone who needs this fix in the meantime, you can pin this:

git+https://github.com/xlang-ai/instructor-embedding.git@5cca65eb0ed78ab354b086a5386fb2c528809caa
BBC-Esq commented 7 months ago

Please a new pypi release. These maintainers are slow to act.

racinmat commented 7 months ago

@SilasMarvin I just realized one thing: the https://github.com/UKPLab/sentence-transformers/blob/66e0ee30843dd411c64f37f65447bb38c7bf857a/sentence_transformers/util.py#L544 from which you took the code includes the https://github.com/UKPLab/sentence-transformers/blob/66e0ee30843dd411c64f37f65447bb38c7bf857a/sentence_transformers/util.py#L552-L554 too which lets you use the model locally without the connection to the internet, is there any reason why it's not included or why you haven't used the load_dir_path directly?

BBC-Esq commented 7 months ago

Is this why when I specify a path on my computer it gives an error saying it's not a correct repository ID on huggingface or what not?

BBC-Esq commented 7 months ago

It's not working, so i did this pull request to try and make it work. TO BE CLEAR, however, I don't think this addresses @racinmat 's message:

Here's my pull request:

https://github.com/xlang-ai/instructor-embedding/pull/113

BBC-Esq commented 7 months ago

My most recent pull request solves it

racinmat commented 7 months ago

yes, exactly, that's why it complains about the repository ID, because it went straight to the validation without prior check for local drive