urchade / GLiNER

Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024
https://arxiv.org/abs/2311.08526
Apache License 2.0

How to load GLiNER model locally #40

Closed KevinEsh closed 8 months ago

KevinEsh commented 8 months ago

Hi guys, you did an incredible job with this model. It's impressive, and I hope to see further improvements over time. However, I noticed some issues which I'll document soon.

First of all, how do I load a model locally? I'm working in a Docker dev container which I run every time I want to do a NER task, so downloading the GLiNER weights on every run is not optimal. This is what I'm doing right now:

from gliner import GLiNER

model = GLiNER.from_pretrained("urchade/gliner_multi")
model.save_pretrained("../models/gliner_multi")

Then when loading it as below, it raises an HTTPError

model.from_pretrained("../models/gliner_multi")
HTTPError                                 Traceback (most recent call last)
[...]

HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/models/gliner_multi/resolve/main/pytorch_model.bin

The above exception was the direct cause of the following exception:

RepositoryNotFoundError                   Traceback (most recent call last)
Cell In[55], line 1
----> 1 model.from_pretrained("models/gliner_multi")

[....]

Repository Not Found for url: https://huggingface.co/models/gliner_multi/resolve/main/pytorch_model.bin.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.

It seems like it looks the model up on the web before checking the local path. So I wonder if I'm doing this the proper way, as there isn't much documentation out there. Is it possible to download the model and then load it locally?
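For reference, here is a quick way to check what save_pretrained actually wrote to disk before retrying the local load (a minimal sketch using only the standard library; the directory is the one from the snippet above):

from pathlib import Path

saved_dir = Path("../models/gliner_multi")  # same directory passed to save_pretrained above
for f in sorted(saved_dir.iterdir()):
    print(f.name, f.stat().st_size, "bytes")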

Also this doesn't work:

model = GLiNER.from_pretrained("../models/gliner_multi", local_files_only=True)
config.json not found in /workspaces/cs-lcs-l3-mexico-sideprojects/models/gliner_multi
---------------------------------------------------------------------------
HFValidationError                         Traceback (most recent call last)
Cell In[60], line 1
----> 1 model.from_pretrained("../models/gliner_multi", local_files_only=True)

File /opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py:118, in validate_hf_hub_args.<locals>._inner_fn(*args, **kwargs)
    115 if check_use_auth_token:
    116     kwargs = smoothly_deprecate_use_auth_token(fn_name=fn.__name__, has_token=has_token, kwargs=kwargs)
--> 118 return fn(*args, **kwargs)

File /opt/conda/lib/python3.10/site-packages/huggingface_hub/hub_mixin.py:277, in ModelHubMixin.from_pretrained(cls, pretrained_model_name_or_path, force_download, resume_download, proxies, token, cache_dir, local_files_only, revision, **model_kwargs)
    273     elif any(param.kind == inspect.Parameter.VAR_KEYWORD for param in init_parameters.values()):
    274         # If __init__ accepts **kwargs, let's forward the config as well (as a dict)
    275         model_kwargs["config"] = config
--> 277 instance = cls._from_pretrained(
    278     model_id=str(model_id),
    279     revision=revision,
    280     cache_dir=cache_dir,
    281     force_download=force_download,
    282     proxies=proxies,
    283     resume_download=resume_download,
    284     local_files_only=local_files_only,
    285     token=token,
    286     **model_kwargs,
    287 )
    289 # Implicitly set the config as instance attribute if not already set by the class
    290 # This way `config` will be available when calling `save_pretrained` or `push_to_hub`.
    291 if config is not None and instance.config is None:

File /opt/conda/lib/python3.10/site-packages/gliner/model.py:354, in GLiNER._from_pretrained(cls, model_id, revision, cache_dir, force_download, proxies, resume_download, local_files_only, token, map_location, strict, **model_kwargs)
    352 if not model_file.exists():
    353     try:
--> 354         model_file = hf_hub_download(
    355             repo_id=model_id,
    356             filename=filename,
    357             revision=revision,
    358             cache_dir=cache_dir,
    359             force_download=force_download,
    360             proxies=proxies,
    361             resume_download=resume_download,
    362             token=token,
    363             local_files_only=local_files_only,
    364         )
    365     except HfHubHTTPError:
    366         continue

File /opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py:110, in validate_hf_hub_args.<locals>._inner_fn(*args, **kwargs)
    105 for arg_name, arg_value in chain(
    106     zip(signature.parameters, args),  # Args values
    107     kwargs.items(),  # Kwargs values
    108 ):
    109     if arg_name in ["repo_id", "from_id", "to_id"]:
--> 110         validate_repo_id(arg_value)
    112     elif arg_name == "token" and arg_value is not None:
    113         has_token = True

File /opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py:158, in validate_repo_id(repo_id)
    155     raise HFValidationError(f"Repo id must be a string, not {type(repo_id)}: '{repo_id}'.")
    157 if repo_id.count("/") > 1:
--> 158     raise HFValidationError(
    159         "Repo id must be in the form 'repo_name' or 'namespace/repo_name':"
    160         f" '{repo_id}'. Use `repo_type` argument if needed."
    161     )
    163 if not REPO_ID_REGEX.match(repo_id):
    164     raise HFValidationError(
    165         "Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are"
    166         " forbidden, '-' and '.' cannot start or end the name, max length is 96:"
    167         f" '{repo_id}'."
    168     )

HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '../models/gliner_multi'. Use `repo_type` argument if needed.
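For what it's worth, the validation failure can be reproduced directly with the validate_repo_id helper shown in the traceback (a minimal sketch, assuming huggingface_hub 0.21.x):

from huggingface_hub.utils import HFValidationError, validate_repo_id

try:
    validate_repo_id("../models/gliner_multi")  # more than one "/" -> rejected as a repo id
except HFValidationError as err:
    print(err)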

The weights are stored locally as expected (see attached screenshot, 2024-03-22 17-39-35).

GLiNER version = '0.1.3'

urchade commented 8 months ago

I really don't know where this issue comes from; some people have it and others do not. Maybe try updating your transformers version.

KevinEsh commented 8 months ago

I'm using the latest sentence-transformers, version 2.5.1. Could you please share the package versions of your entire environment?

urchade commented 8 months ago

torch = ">=2.0.0"
transformers = "^4.38.2"
huggingface-hub = "^0.21.4"
flair = "^0.13.1"
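A quick way to compare a local environment against these pins (a sketch using only the Python standard library):

from importlib.metadata import version

for pkg in ("gliner", "torch", "transformers", "huggingface-hub", "flair"):
    print(pkg, version(pkg))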

KevinEsh commented 8 months ago

I've identified the issue. It seems like HFValidationError is not being caught properly. I have corrected the mistake in a new branch, but I don't have permission to push it. The change is minimal.

tomaarsen commented 8 months ago

@KevinEsh I believe that #42 should solve this issue as well. I assume your solution is similar?

KevinEsh commented 8 months ago

Hi @tomaarsen, thanks for the suggestion. Yes, this was my problem, although my solution was a little different:

[...]
from huggingface_hub.utils import HfHubHTTPError, HFValidationError

def _from_pretrained(
    [...]
):
    [...]
            if not model_file.exists():
                try:
                    model_file = hf_hub_download(
                        [...]
                        token=token,
                        local_files_only=local_files_only,
                    )
                except (HfHubHTTPError, HFValidationError):
                    continue
[...]

This way I make sure to catch the right exceptions without masking hidden bugs. Could I get an ETA for when this fix will be released in v0.1.4? This change is important for my team; otherwise we'll have to modify the package's source code in every team member's dev environment.

Thanks in advance!

PS: Love your content, keep up the good work.

chiranjeevbitm commented 8 months ago

PATH = ".\finetuned_999"
model = GLiNER.from_pretrained(PATH, local_files_only=True)
model.eval()

This worked for me.
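For completeness, the full local save/load flow discussed in this thread would look roughly as follows (a sketch; it assumes either a GLiNER release that includes the exception-handling fix from #42 or a local directory that already contains all the files from_pretrained expects):

from gliner import GLiNER

# one-time download while online (e.g. during the Docker image build)
model = GLiNER.from_pretrained("urchade/gliner_multi")
model.save_pretrained("models/gliner_multi")

# later, load from disk without contacting the Hub
model = GLiNER.from_pretrained("models/gliner_multi", local_files_only=True)
model.eval()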