Open BBC-Esq opened 2 months ago
@BBC-Esq did you try to use the load_dir_path
function from https://github.com/UKPLab/sentence-transformers/blob/66e0ee30843dd411c64f37f65447bb38c7bf857a/sentence_transformers/util.py#L552-L554 ?
Because it's been there from version 2.3.0 and looks exactly like what is needed here.
Now I understand the problem, the files https://huggingface.co/hkunlp/instructor-large/blob/main/config_sentence_transformers.json
etc. need to be downloaded and the https://github.com/UKPLab/sentence-transformers/blob/66e0ee30843dd411c64f37f65447bb38c7bf857a/sentence_transformers/util.py#L559 does not let you match files in the root directory.
I made PR https://github.com/BBC-Esq/instructor-embedding/pull/1 which mimicks the logic of load_dir_path
while working well with instructor. I managed to make it run for both huggingface and local models.
@BBC-Esq did you try to use the
load_dir_path
function from https://github.com/UKPLab/sentence-transformers/blob/66e0ee30843dd411c64f37f65447bb38c7bf857a/sentence_transformers/util.py#L552-L554 ? Because it's been there from version 2.3.0 and looks exactly like what is needed here.
Couldn't figure out how to since I'm not a programmer by trade.
I made PR BBC-Esq#1 which mimicks the logic of
load_dir_path
while working well with instructor. I managed to make it run for both huggingface and local models.
Is this different than my pull request?
It's a pull request to your pull request, extending the logic you added.
looking at the changes you made, if you want to name the classes as they were in the released version in https://pypi.org/project/InstructorEmbedding/1.0.1/ which means most probably the commit https://github.com/xlang-ai/instructor-embedding/blob/619dda2bfbab22b3e623cb5893472eacae19f4a1/InstructorEmbedding/instructor.py then the name should be INSTRUCTOR_Pooling
, not INSTRUCTORPooling
, see https://github.com/xlang-ai/instructor-embedding/blob/619dda2bfbab22b3e623cb5893472eacae19f4a1/InstructorEmbedding/instructor.py#L23
and just for completeness: the change which renamed these classes, which I think is quite big change and should not have been merged, is https://github.com/xlang-ai/instructor-embedding/pull/92
classes should not be renamed such easily without some larger discussion I think.
And that change to INSTRUCTOR_Pooling
is in my PR to this PR.
Please take a few minutes to review this pull request and address @racinmat 's questions and modifications as well. If you can't find the time to do this, please transfer ownership of the repo to someone who has the motivation to do the bare minimum in maintaining it and publishing a new package to pypi. Thanks.
@hongjin-su could you please merge this PR? or #115 ?
First change
Apparently during the last few pull requests the capitalization of a few things as well as the "_" character being omitted were made. This prevents it from working. For example...
The class was initially named
INSTRUCTOR_Pooling
In a pull request it was changed to lowercase instead, and that "_" was removed in the same pull request or possibly another, I can't remember.
This pull request corrects the most recent pull request and classes are correctly named throughout the script. Here's a picture of the lines changed:
Second change
I corrected it to use the local path of specified...not sure if this is the PERFECT solution since the
sentence-transformers
library has autil.py
script that does something similar, but THIS WORKS after many hours. Please merge: