xlang-ai / instructor-embedding

[ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings
Apache License 2.0
1.78k stars 131 forks source link

please accept this pull request ASAP PLEASE! #113

Open BBC-Esq opened 2 months ago

BBC-Esq commented 2 months ago

First change

Apparently during the last few pull requests the capitalization of a few things as well as the "_" character being omitted were made. This prevents it from working. For example...

The class was initially named INSTRUCTOR_Pooling

In a pull request it was changed to lowercase instead, and that "_" was removed in the same pull request or possibly another, I can't remember.

This pull request corrects the most recent pull request and classes are correctly named throughout the script. Here's a picture of the lines changed:

image

Second change

I corrected it to use the local path of specified...not sure if this is the PERFECT solution since the sentence-transformers library has a util.py script that does something similar, but THIS WORKS after many hours. Please merge:

image

racinmat commented 2 months ago

@BBC-Esq did you try to use the load_dir_path function from https://github.com/UKPLab/sentence-transformers/blob/66e0ee30843dd411c64f37f65447bb38c7bf857a/sentence_transformers/util.py#L552-L554 ? Because it's been there from version 2.3.0 and looks exactly like what is needed here.

racinmat commented 2 months ago

Now I understand the problem, the files https://huggingface.co/hkunlp/instructor-large/blob/main/config_sentence_transformers.json etc. need to be downloaded and the https://github.com/UKPLab/sentence-transformers/blob/66e0ee30843dd411c64f37f65447bb38c7bf857a/sentence_transformers/util.py#L559 does not let you match files in the root directory.

racinmat commented 2 months ago

I made PR https://github.com/BBC-Esq/instructor-embedding/pull/1 which mimicks the logic of load_dir_path while working well with instructor. I managed to make it run for both huggingface and local models.

BBC-Esq commented 2 months ago

@BBC-Esq did you try to use the load_dir_path function from https://github.com/UKPLab/sentence-transformers/blob/66e0ee30843dd411c64f37f65447bb38c7bf857a/sentence_transformers/util.py#L552-L554 ? Because it's been there from version 2.3.0 and looks exactly like what is needed here.

Couldn't figure out how to since I'm not a programmer by trade.

BBC-Esq commented 2 months ago

I made PR BBC-Esq#1 which mimicks the logic of load_dir_path while working well with instructor. I managed to make it run for both huggingface and local models.

Is this different than my pull request?

racinmat commented 2 months ago

It's a pull request to your pull request, extending the logic you added.

racinmat commented 2 months ago

looking at the changes you made, if you want to name the classes as they were in the released version in https://pypi.org/project/InstructorEmbedding/1.0.1/ which means most probably the commit https://github.com/xlang-ai/instructor-embedding/blob/619dda2bfbab22b3e623cb5893472eacae19f4a1/InstructorEmbedding/instructor.py then the name should be INSTRUCTOR_Pooling, not INSTRUCTORPooling, see https://github.com/xlang-ai/instructor-embedding/blob/619dda2bfbab22b3e623cb5893472eacae19f4a1/InstructorEmbedding/instructor.py#L23 and just for completeness: the change which renamed these classes, which I think is quite big change and should not have been merged, is https://github.com/xlang-ai/instructor-embedding/pull/92 classes should not be renamed such easily without some larger discussion I think. And that change to INSTRUCTOR_Pooling is in my PR to this PR.

BBC-Esq commented 2 months ago

Please take a few minutes to review this pull request and address @racinmat 's questions and modifications as well. If you can't find the time to do this, please transfer ownership of the repo to someone who has the motivation to do the bare minimum in maintaining it and publishing a new package to pypi. Thanks.

racinmat commented 2 months ago

@hongjin-su could you please merge this PR? or #115 ?