urchade / GLiNER

Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024
https://arxiv.org/abs/2311.08526
Apache License 2.0
1.47k stars, 126 forks

Using a fine tuned model #35

Closed: IldebrandoSimeoni closed this issue 7 months ago

IldebrandoSimeoni commented 8 months ago

Hello, I was trying to use a model that I fine-tuned with the provided notebook. Unfortunately, when I pass GLiNER.from_pretrained() the local path of the folder containing the gliner_config.json and pytorch_model.bin produced by training, an HFValidationError is raised, and I don't understand how to reference the local folder so that it can be loaded. A new cell in the fine-tuning notebook showing how to load and use a fine-tuned model would be helpful. Thanks.

urchade commented 8 months ago

Hi, GLiNER.from_pretrained works well on my side. Can you show your error message?

chiranjeevbitm commented 8 months ago

Hi, I am also facing the same issue.

Code:

PATH = 'pytorch_model.bin'
model = GLiNER.from_pretrained(PATH)
model.eval()

Error:

HTTPError                                 Traceback (most recent call last)
File c:\Users\chiranjeev.kumar.conda\envs\sancus_prod_env\lib\site-packages\huggingface_hub\utils\_errors.py:304, in hf_raise_for_status(response, endpoint_name)
    303 try:
--> 304     response.raise_for_status()
    305 except HTTPError as e:

File c:\Users\chiranjeev.kumar.conda\envs\sancus_prod_env\lib\site-packages\requests\models.py:1021, in Response.raise_for_status(self)
   1020 if http_error_msg:
-> 1021     raise HTTPError(http_error_msg, response=self)

HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/pytorch_model.bin/resolve/main/pytorch_model.bin

The above exception was the direct cause of the following exception:

RepositoryNotFoundError                   Traceback (most recent call last)
Cell In[37], line 2
      1 PATH = 'pytorch_model.bin'
----> 2 model = GLiNER.from_pretrained(PATH)
      3 model.eval()

File c:\Users\chiranjeev.kumar.conda\envs\sancus_prod_env\lib\site-packages\huggingface_hub\utils\_validators.py:118, in validate_hf_hub_args.<locals>._inner_fn(*args, **kwargs)
    115 if check_use_auth_token:
    116     kwargs = smoothly_deprecate_use_auth_token(fn_name=fn.__name__, has_token=has_token, kwargs=kwargs)
--> 118 return fn(*args, **kwargs)
...

Repository Not Found for url: https://huggingface.co/pytorch_model.bin/resolve/main/pytorch_model.bin.
Please make sure you specified the correct repo_id and repo_type.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.

IldebrandoSimeoni commented 8 months ago

The main issue is the following. In my case, running the fine-tuning notebook generates n folders (Finetuned_99, etc.). If I pass the path to one of these local folders (e.g. Finetuned_99), which contains both the JSON config and the PyTorch model, a repo_id error is raised, and I don't know how the local path should be passed to the .from_pretrained method.

chiranjeevbitm commented 8 months ago

Same for me.

urchade commented 8 months ago

Try using a path in the following format: './log/model'

urchade commented 8 months ago

The path should point to the directory containing pytorch_model.bin, not to the file itself.
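A minimal sketch of that advice, using only the standard library. The helper `resolve_checkpoint_dir` is hypothetical (it is not part of GLiNER): it accepts either the checkpoint folder or a file inside it and returns the folder, after checking that the two files mentioned in this thread, `pytorch_model.bin` and `gliner_config.json`, are present. The folder name in the usage comment is only an illustration.

```python
from pathlib import Path

# File names taken from this thread; other GLiNER versions may save different files.
EXPECTED_FILES = ("pytorch_model.bin", "gliner_config.json")


def resolve_checkpoint_dir(path):
    """Return the checkpoint directory for `path` (hypothetical helper, not a GLiNER API).

    If `path` points at a file such as pytorch_model.bin, its parent
    directory is used instead.
    """
    p = Path(path).expanduser().resolve()
    if p.is_file():
        p = p.parent
    if not p.is_dir():
        raise FileNotFoundError(f"No such directory: {p}")
    missing = [name for name in EXPECTED_FILES if not (p / name).is_file()]
    if missing:
        raise FileNotFoundError(f"{p} is missing: {', '.join(missing)}")
    return p


# Usage (assumes the gliner package is installed and the folder exists):
# from gliner import GLiNER
# checkpoint = resolve_checkpoint_dir("./training_logs/finetuned_99")
# model = GLiNER.from_pretrained(str(checkpoint))
```

This only rules out the file-versus-folder mistake; whether `from_pretrained` then accepts the folder may depend on the installed library versions.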

chiranjeevbitm commented 8 months ago

Still getting the error (image attached).

urchade commented 8 months ago

Try '.\training_logs\finetuned_299'

urchade commented 8 months ago

GLiNER.from_pretrained(PATH, local_files_only=True)

chiranjeevbitm commented 8 months ago

Tried it, still the same error.

urchade commented 8 months ago

Then I don't know. Maybe @tomaarsen can help here

IldebrandoSimeoni commented 8 months ago

This is the error I keep receiving. I'm running on Colab, in case that is useful. It seems to me that even when a local folder is passed, the validation defined for Hub repositories is still applied, so a local path cannot pass it.

HFValidationError                         Traceback (most recent call last)
in <cell line: 3>()
      3 if os.path.isdir("./training_logs/finetuned_99"):
----> 4     model = GLiNER.from_pretrained("./training_logs/finetuned_99", local_files_only = True)
      5 else:
      6     print("Repository does not exist")

4 frames
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py in validate_repo_id(repo_id)
    156
    157     if repo_id.count("/") > 1:
--> 158         raise HFValidationError(
    159             "Repo id must be in the form 'repo_name' or 'namespace/repo_name':"
    160             f" '{repo_id}'. Use repo_type argument if needed."

HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': './training_logs/finetuned_99'. Use repo_type argument if needed.
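The traceback above quotes the condition that rejects the path: `repo_id.count("/") > 1`. The sketch below reproduces only that one condition to show why a relative path containing two slashes is refused once it is treated as a Hub repo id. It is not the `huggingface_hub` implementation, whose `validate_repo_id` performs further checks that are omitted here, and the function name `has_too_many_slashes` is made up for illustration.

```python
def has_too_many_slashes(repo_id: str) -> bool:
    """Mirror the single condition quoted in the traceback above.

    Illustrative only: this is not the huggingface_hub implementation.
    """
    return repo_id.count("/") > 1


# The first two forms come from the error message; the third is the local path from the traceback.
for candidate in ("repo_name", "namespace/repo_name", "./training_logs/finetuned_99"):
    verdict = "rejected" if has_too_many_slashes(candidate) else "passes this check"
    print(f"{candidate!r}: {verdict}")
```

Under this reading, the two forms named in the error message pass the check, while the local path is rejected before any file on disk is looked at.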

MuhammadNafishZaldinanda commented 8 months ago

@IldebrandoSimeoni To load a fine-tuned checkpoint and use it for prediction:

import torch
from gliner import GLiNER
from gliner.model import load_config_as_namespace

map_location = "cpu"
strict = False

# Paths to the weights and config saved in the local fine-tuned checkpoint
model_file = "/kaggle/working/logs/finetuned_2999/pytorch_model.bin"
config_file = "/kaggle/working/logs/finetuned_2999/gliner_config.json"

# Build the model from the saved config, then load the fine-tuned weights into it
config = load_config_as_namespace(config_file)
model = GLiNER(config)
state_dict = torch.load(model_file, map_location=torch.device(map_location))
model.load_state_dict(state_dict, strict=strict, assign=True)
model.to(map_location)

text = """ """

labels = ["Facility", "Organization", "Person", "Location", "Brand", "Disaster", "Disease", "Regulation", "Policy", "Event", "Food", "Drink", "Movie", "Complaint", "Book", "Music", "Date", "Time", "Equipment"]

# Run prediction and print each entity with its label
entities = model.predict_entities(text, labels, threshold=0.5)
print("Text", text)
for entity in entities:
    print(entity["text"], "=>", entity["label"])

This worked for me.