unitaryai / detoxify

Trained models & code to predict toxic comments on all 3 Jigsaw Toxic Comment Challenges. Built using ⚡ PyTorch Lightning and 🤗 Transformers. For access to our API, please email us at contact@unitary.ai.

OSError: None is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models' #99

Closed. khushalt10 closed this issue 8 months ago.

khushalt10 commented 10 months ago

I've been facing this error recently and it seems quite random: it goes away after re-running the same code multiple times.

code:

from detoxify import Detoxify

results = Detoxify('original').predict('example text')

results = Detoxify('unbiased').predict(['example text 1','example text 2'])

results = Detoxify('multilingual').predict(['example text','exemple de texte','texto de ejemplo','testo di esempio','texto de exemplo','örnek metin','пример текста'])

model = Detoxify('original', device='cpu')

import pandas as pd

print(pd.DataFrame(results).round(5))

full error:

HTTPError                                 Traceback (most recent call last)
File /opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py:270, in hf_raise_for_status(response, endpoint_name)
    269 try:
--> 270     response.raise_for_status()
    271 except HTTPError as e:

File /opt/conda/lib/python3.10/site-packages/requests/models.py:1021, in Response.raise_for_status(self)
   1020 if http_error_msg:
-> 1021     raise HTTPError(http_error_msg, response=self)

HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/None/resolve/main/config.json

The above exception was the direct cause of the following exception:

RepositoryNotFoundError                   Traceback (most recent call last)
File /opt/conda/lib/python3.10/site-packages/transformers/utils/hub.py:430, in cached_file(path_or_repo_id, filename, cache_dir, force_download, resume_download, proxies, token, revision, local_files_only, subfolder, repo_type, user_agent, _raise_exceptions_for_missing_entries, _raise_exceptions_for_connection_errors, _commit_hash, **deprecated_kwargs)
    428 try:
    429     # Load from URL or cache if already cached
--> 430     resolved_file = hf_hub_download(
    431         path_or_repo_id,
    432         filename,
    433         subfolder=None if len(subfolder) == 0 else subfolder,
    434         repo_type=repo_type,
    435         revision=revision,
    436         cache_dir=cache_dir,
    437         user_agent=user_agent,
    438         force_download=force_download,
    439         proxies=proxies,
    440         resume_download=resume_download,
    441         token=token,
    442         local_files_only=local_files_only,
    443     )
    444 except GatedRepoError as e:

File /opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py:118, in validate_hf_hub_args.<locals>._inner_fn(*args, **kwargs)
    116     kwargs = smoothly_deprecate_use_auth_token(fn_name=fn.__name__, has_token=has_token, kwargs=kwargs)
--> 118 return fn(*args, **kwargs)

File /opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py:1374, in hf_hub_download(repo_id, filename, subfolder, repo_type, revision, library_name, library_version, cache_dir, local_dir, local_dir_use_symlinks, user_agent, force_download, force_filename, proxies, etag_timeout, resume_download, token, local_files_only, legacy_cache_layout, endpoint)
   1372 elif isinstance(head_call_error, RepositoryNotFoundError) or isinstance(head_call_error, GatedRepoError):
   1373     # Repo not found => let's raise the actual error
-> 1374     raise head_call_error
   1375 else:
   1376     # Otherwise: most likely a connection issue or Hub downtime => let's warn the user

File /opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py:1247, in hf_hub_download(repo_id, filename, subfolder, repo_type, revision, library_name, library_version, cache_dir, local_dir, local_dir_use_symlinks, user_agent, force_download, force_filename, proxies, etag_timeout, resume_download, token, local_files_only, legacy_cache_layout, endpoint)
   1246 try:
-> 1247     metadata = get_hf_file_metadata(
   1248         url=url,
   1249         token=token,
   1250         proxies=proxies,
   1251         timeout=etag_timeout,
   1252     )
   1253 except EntryNotFoundError as http_error:
   1254     # Cache the non-existence of the file and raise

File /opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py:118, in validate_hf_hub_args.<locals>._inner_fn(*args, **kwargs)
    116     kwargs = smoothly_deprecate_use_auth_token(fn_name=fn.__name__, has_token=has_token, kwargs=kwargs)
--> 118 return fn(*args, **kwargs)

File /opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py:1624, in get_hf_file_metadata(url, token, proxies, timeout)
   1623 # Retrieve metadata
-> 1624 r = _request_wrapper(
   1625     method="HEAD",
   1626     url=url,
   1627     headers=headers,
   1628     allow_redirects=False,
   1629     follow_relative_redirects=True,
   1630     proxies=proxies,
   1631     timeout=timeout,
   1632 )
   1633 hf_raise_for_status(r)

File /opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py:402, in _request_wrapper(method, url, follow_relative_redirects, **params)
    401 if follow_relative_redirects:
--> 402     response = _request_wrapper(
    403         method=method,
    404         url=url,
    405         follow_relative_redirects=False,
    406         **params,
    407     )
    409     # If redirection, we redirect only relative paths.
    410     # This is useful in case of a renamed repository.

File /opt/conda/lib/python3.10/site-packages/huggingface_hub/file_download.py:426, in _request_wrapper(method, url, follow_relative_redirects, **params)
    425 response = get_session().request(method=method, url=url, **params)
--> 426 hf_raise_for_status(response)
    427 return response

File /opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py:320, in hf_raise_for_status(response, endpoint_name)
    312     message = (
    313         f"{response.status_code} Client Error."
    314         + "\n\n"
   (...)
    318         " make sure you are authenticated."
    319     )
--> 320     raise RepositoryNotFoundError(message, response) from e
    322 elif response.status_code == 400:

RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-656f04ec-61da344f2ca8006b4e4c9ab3;3b28f972-833b-4663-9d45-4c4183a00127)

Repository Not Found for url: https://huggingface.co/None/resolve/main/config.json.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.

The above exception was the direct cause of the following exception:

OSError                                   Traceback (most recent call last)
Cell In[43], line 5
      1 from detoxify import Detoxify
      3 # each model takes in either a string or a list of strings
----> 5 results = Detoxify('original').predict('example text')
      7 results = Detoxify('unbiased').predict(['example text 1','example text 2'])
      9 results = Detoxify('multilingual').predict(['example text','exemple de texte','texto de ejemplo','testo di esempio','texto de exemplo','örnek metin','пример текста'])

File /opt/conda/lib/python3.10/site-packages/detoxify/detoxify.py:103, in Detoxify.__init__(self, model_type, checkpoint, device, huggingface_config_path)
    101 def __init__(self, model_type="original", checkpoint=PRETRAINED_MODEL, device="cpu", huggingface_config_path=None):
    102     super().__init__()
--> 103     self.model, self.tokenizer, self.class_names = load_checkpoint(
    104         model_type=model_type,
    105         checkpoint=checkpoint,
    106         device=device,
    107         huggingface_config_path=huggingface_config_path,
    108     )
    109     self.device = device
    110     self.model.to(self.device)

File /opt/conda/lib/python3.10/site-packages/detoxify/detoxify.py:56, in load_checkpoint(model_type, checkpoint, device, huggingface_config_path)
     50 change_names = {
     51     "toxic": "toxicity",
     52     "identity_hate": "identity_attack",
     53     "severe_toxic": "severe_toxicity",
     54 }
     55 class_names = [change_names.get(cl, cl) for cl in class_names]
---> 56 model, tokenizer = get_model_and_tokenizer(
     57     **loaded["config"]["arch"]["args"],
     58     state_dict=loaded["state_dict"],
     59     huggingface_config_path=huggingface_config_path,
     60 )
     62 return model, tokenizer, class_names

File /opt/conda/lib/python3.10/site-packages/detoxify/detoxify.py:20, in get_model_and_tokenizer(model_type, model_name, tokenizer_name, num_classes, state_dict, huggingface_config_path)
     16 def get_model_and_tokenizer(
     17     model_type, model_name, tokenizer_name, num_classes, state_dict, huggingface_config_path=None
     18 ):
     19     model_class = getattr(transformers, model_name)
---> 20     model = model_class.from_pretrained(
     21         pretrained_model_name_or_path=None,
     22         config=huggingface_config_path or model_type,
     23         num_labels=num_classes,
     24         state_dict=state_dict,
     25         local_files_only=huggingface_config_path is not None,
     26     )
     27     tokenizer = getattr(transformers, tokenizer_name).from_pretrained(
     28         huggingface_config_path or model_type,
     29         local_files_only=huggingface_config_path is not None,
     30         # TODO: may be needed to let it work with Kaggle competition
     31         # model_max_length=512,
     32     )
     34     return model, tokenizer

File /opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py:2600, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
   2597 if commit_hash is None:
   2598     if not isinstance(config, PretrainedConfig):
   2599         # We make a call to the config file first (which may be absent) to get the commit hash as soon as possible
-> 2600         resolved_config_file = cached_file(
   2601             pretrained_model_name_or_path,
   2602             CONFIG_NAME,
   2603             cache_dir=cache_dir,
   2604             force_download=force_download,
   2605             resume_download=resume_download,
   2606             proxies=proxies,
   2607             local_files_only=local_files_only,
   2608             token=token,
   2609             revision=revision,
   2610             subfolder=subfolder,
   2611             _raise_exceptions_for_missing_entries=False,
   2612             _raise_exceptions_for_connection_errors=False,
   2613         )
   2614         commit_hash = extract_commit_hash(resolved_config_file, commit_hash)
   2615     else:

File /opt/conda/lib/python3.10/site-packages/transformers/utils/hub.py:451, in cached_file(path_or_repo_id, filename, cache_dir, force_download, resume_download, proxies, token, revision, local_files_only, subfolder, repo_type, user_agent, _raise_exceptions_for_missing_entries, _raise_exceptions_for_connection_errors, _commit_hash, **deprecated_kwargs)
    445     raise EnvironmentError(
    446         "You are trying to access a gated repo.\nMake sure to request access at "
    447         f"https://huggingface.co/{path_or_repo_id} and pass a token having permission to this repo either "
    448         "by logging in with `huggingface-cli login` or by passing `token=<your_token>`."
    449     ) from e
    450 except RepositoryNotFoundError as e:
--> 451     raise EnvironmentError(
    452         f"{path_or_repo_id} is not a local folder and is not a valid model identifier "
    453         "listed on 'https://huggingface.co/models'\nIf this is a private repository, make sure to pass a token "
    454         "having permission to this repo either by logging in with `huggingface-cli login` or by passing "
    455         "`token=<your_token>`"
    456     ) from e
    457 except RevisionNotFoundError as e:
    458     raise EnvironmentError(
    459         f"{revision} is not a valid git identifier (branch name, tag name or commit id) that exists "
    460         "for this model name. Check the model page at "
    461         f"'https://huggingface.co/{path_or_repo_id}' for available revisions."
    462     ) from e

OSError: None is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo either by logging in with `huggingface-cli login` or by passing `token=<your_token>`

Python version: 3.10. Thanks in advance!
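
Edit: as a temporary workaround, since re-running eventually succeeds, I wrap model construction in a small retry loop. This is only a sketch (the helper name, attempt count, and sleep are mine, not part of detoxify):

import time

from detoxify import Detoxify

def load_detoxify_with_retry(model_type="original", attempts=5, wait_seconds=5):
    # Hypothetical helper: retry construction, since the Hub request only fails intermittently.
    last_error = None
    for _ in range(attempts):
        try:
            return Detoxify(model_type, device="cpu")
        except OSError as error:
            last_error = error
            time.sleep(wait_seconds)
    raise last_error

model = load_detoxify_with_retry()
print(model.predict("example text"))

Obviously this just papers over the intermittent failure rather than fixing it.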

ckalla-otto commented 9 months ago

I can confirm this error; we are encountering the same issue.

laurahanu commented 8 months ago

This should be fixed now with the latest transformers version!

manueltonneau commented 6 months ago

Hi @laurahanu! I'm still getting the same error, both with the latest transformers version at the time of your post (4.37.2) and with the latest version as of today (4.39.2). Please let me know how to fix this, thanks!
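
For reference, this is how I check the versions inside the failing environment (plain stdlib plus transformers, nothing detoxify-specific):

from importlib.metadata import version

import transformers

print("transformers:", transformers.__version__)  # 4.37.2 and 4.39.2 in the two environments I tried
print("detoxify:", version("detoxify"))           # version of the installed detoxify package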