Closed greenpau closed 1 year ago
Hi, you can cache the downloaded model in a local path ahead of time and load it later without Internet access:
from InstructorEmbedding import INSTRUCTOR
model = INSTRUCTOR('hkunlp/instructor-large', cache_folder="local_path")
The download only happens the first time you run the script; from the second run onward it does not require Internet access.
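To make the two phases explicit, here is a minimal sketch. The `cache_folder` argument comes from the snippet above; the helper names `enable_offline_mode` and `load_instructor` are hypothetical, and the `HF_HUB_OFFLINE` / `TRANSFORMERS_OFFLINE` environment variables are an addition not mentioned in this thread. They are the Hugging Face offline switches, and this assumes INSTRUCTOR respects them through the libraries it builds on.

```python
import os


def enable_offline_mode() -> None:
    """Ask the Hugging Face libraries not to touch the network."""
    # These flags are read when the libraries are imported, so set them
    # before importing InstructorEmbedding or transformers.
    os.environ["HF_HUB_OFFLINE"] = "1"
    os.environ["TRANSFORMERS_OFFLINE"] = "1"


def load_instructor(cache_folder: str, offline: bool = False):
    """Load hkunlp/instructor-large, reusing files stored in cache_folder."""
    if offline:
        enable_offline_mode()
    # Imported lazily so the offline flags above are already in place.
    from InstructorEmbedding import INSTRUCTOR
    return INSTRUCTOR("hkunlp/instructor-large", cache_folder=cache_folder)
```

Run `load_instructor("local_path")` once on a machine with Internet access, then call `load_instructor("local_path", offline=True)` in the restricted environment. If the libraries were already imported earlier in the same process, the flags may not take effect.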
Feel free to add any questions or comments!
I want to download it beforehand, since the cache might get deleted.
Hi, you can set the cache_folder parameter to any local path that will not be deleted.
Is there any way to load the model from a git clone? For instance, with AutoModel or AutoTokenizer I can do this:
$ git clone https://huggingface.co/hkunlp/instructor-large
Then in the code:
from transformers import AutoTokenizer
AutoTokenizer.from_pretrained('./instructor-large')
But this doesn't seem to work with the INSTRUCTOR class. Is there any way to get this to work?
Hi, could you share your code? The following works for me:
from InstructorEmbedding import INSTRUCTOR
model = INSTRUCTOR('./instructor-large')
Strange. That's what I tried too. But I get this error:
load INSTRUCTOR_Transformer
Traceback (most recent call last):
File "/opt/homebrew/lib/python3.11/site-packages/transformers/modeling_utils.py", line 415, in load_state_dict
return torch.load(checkpoint_file, map_location="cpu")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/torch/serialization.py", line 815, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/torch/serialization.py", line 1033, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_pickle.UnpicklingError: invalid load key, 'v'.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/homebrew/lib/python3.11/site-packages/sentence_transformers/SentenceTransformer.py", line 94, in __init__
modules = self._load_sbert_model(model_path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/InstructorEmbedding/instructor.py", line 474, in _load_sbert_model
module = module_class.load(os.path.join(model_path, module_config['path']))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/InstructorEmbedding/instructor.py", line 306, in load
return INSTRUCTOR_Transformer(model_name_or_path=input_path, **config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/InstructorEmbedding/instructor.py", line 240, in __init__
self._load_model(self.model_name_or_path, config, cache_dir, **model_args)
File "/opt/homebrew/lib/python3.11/site-packages/sentence_transformers/models/Transformer.py", line 47, in _load_model
self._load_t5_model(model_name_or_path, config, cache_dir)
File "/opt/homebrew/lib/python3.11/site-packages/sentence_transformers/models/Transformer.py", line 55, in _load_t5_model
self.auto_model = T5EncoderModel.from_pretrained(model_name_or_path, config=config, cache_dir=cache_dir)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2429, in from_pretrained
state_dict = load_state_dict(resolved_archive_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/transformers/modeling_utils.py", line 420, in load_state_dict
raise OSError(
OSError: You seem to have cloned a repository without having git-lfs installed. Please install git-lfs and run `git lfs install` followed by `git lfs pull` in the folder you cloned.
I'm on Mac btw.
Have you installed git-lfs? You could also try loading the state dict directly to see whether it raises an error.
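One quick way to check the clone: without git-lfs, large files are replaced by small text stubs that begin with `version https://git-lfs.github.com/spec/v1`, which would be consistent with the `invalid load key, 'v'` in the traceback. The sketch below scans a directory for such stubs; the helper names `is_lfs_pointer` and `find_lfs_pointers` are hypothetical, not part of any library.

```python
from pathlib import Path

# First line of every Git LFS pointer file.
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path) -> bool:
    """Return True if the file starts with the Git LFS pointer header."""
    with open(path, "rb") as f:
        return f.read(len(LFS_HEADER)) == LFS_HEADER


def find_lfs_pointers(model_dir):
    """List files under model_dir that are still LFS pointer stubs."""
    return sorted(
        p for p in Path(model_dir).rglob("*")
        # Skip git's own metadata; only the checked-out files matter here.
        if p.is_file() and ".git" not in p.parts and is_lfs_pointer(p)
    )
```

If `find_lfs_pointers("./instructor-large")` returns any paths, most likely the weight files, the real contents were never fetched and the model cannot be loaded from that folder as it stands.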
I don't think I have git-lfs. Just have the usual git client. Is git-lfs a hard requirement for this use case?
When first importing the package, I noticed a number of downloads happening.
I am deploying the package in an environment where I don't have Internet access.
Is there a way to download all the required files ahead of time and tell INSTRUCTOR to use those files instead of downloading them at runtime?
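One possible approach, sketched under assumptions: fetch the whole model repository into a folder with `snapshot_download` from the `huggingface_hub` package (its `local_dir` argument needs a reasonably recent release), copy that folder to the offline machine, and pass the folder path to INSTRUCTOR as in the `./instructor-large` example earlier in this thread. The function names `download_model` and `load_local_model` are hypothetical.

```python
from pathlib import Path


def download_model(repo_id: str = "hkunlp/instructor-large",
                   target_dir: str = "./instructor-large") -> str:
    """Fetch every file of the model repo into target_dir (needs Internet)."""
    # Lazy import: huggingface_hub is only needed on the downloading machine.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=repo_id, local_dir=target_dir)


def load_local_model(model_dir: str = "./instructor-large"):
    """Load INSTRUCTOR from a directory on disk."""
    if not Path(model_dir).is_dir():
        raise FileNotFoundError(f"model directory not found: {model_dir}")
    from InstructorEmbedding import INSTRUCTOR
    return INSTRUCTOR(model_dir)
```

Because `snapshot_download` retrieves files over HTTP, it does not depend on git or git-lfs. Whether this covers every download triggered when the package is first imported is not confirmed here, so it is worth doing a trial run with networking disabled before deploying.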