salesforce / CodeT5

Home of CodeT5: Open Code LLMs for Code Understanding and Generation
https://arxiv.org/abs/2305.07922
BSD 3-Clause "New" or "Revised" License

Is it possible to download the model from Hugging Face and use it locally? #26

Closed: lyriccoder closed this issue 2 years ago

lyriccoder commented 2 years ago

Thanks for uploading the latest model for code summarization (https://huggingface.co/Salesforce/codet5-base-multi-sum). I need to download the model with wget and then use it as a local cache.

When I try to load the tokenizer with tokenizer = RobertaTokenizer.from_pretrained('Salesforce/codet5-base-multi-sum'), I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/transformers/tokenization_utils_base.py", line 1654, in from_pretrained
    fast_tokenizer_file = get_fast_tokenizer_file(
  File "/usr/local/lib/python3.8/dist-packages/transformers/tokenization_utils_base.py", line 3486, in get_fast_tokenizer_file
    all_files = get_list_of_files(
  File "/usr/local/lib/python3.8/dist-packages/transformers/file_utils.py", line 2103, in get_list_of_files
    return list_repo_files(path_or_repo, revision=revision, token=token)
  File "/usr/local/lib/python3.8/dist-packages/huggingface_hub/hf_api.py", line 602, in list_repo_files
    info = self.model_info(
  File "/usr/local/lib/python3.8/dist-packages/huggingface_hub/hf_api.py", line 585, in model_info
    r = requests.get(path, headers=headers, timeout=timeout)
  File "/usr/lib/python3/dist-packages/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 514, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /api/models/Salesforce/codet5-base-multi-sum (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1131)')))

I can't load it that way because I'm behind a proxy: only curl and wget work, and every request made from inside Python is blocked.

Could you please tell me how to download the model and tokenizer and then load them from a local copy?
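A minimal sketch of the loading side, assuming the files have already been fetched into a local directory with wget or curl. `from_pretrained` accepts a directory path in place of a hub ID such as 'Salesforce/codet5-base-multi-sum', and `local_files_only=True` tells it not to try contacting huggingface.co. The helper names `missing_files` and `load_local_codet5` are hypothetical, the file list is an assumption (compare it with the repository's file listing), and `T5ForConditionalGeneration` is assumed to be the matching model class for this checkpoint.

```python
import os

# Assumed file list (check the repo's file listing): vocab.json and merges.txt
# are what a RoBERTa-style BPE tokenizer reads; config.json and
# pytorch_model.bin hold the model configuration and weights.
TOKENIZER_FILES = ["vocab.json", "merges.txt"]
MODEL_FILES = ["config.json", "pytorch_model.bin"]


def missing_files(local_dir, filenames):
    """Return the names in `filenames` that are not regular files in `local_dir`."""
    return [name for name in filenames
            if not os.path.isfile(os.path.join(local_dir, name))]


def load_local_codet5(local_dir):
    """Hypothetical helper: load tokenizer and model from a pre-filled directory.

    Passing a filesystem path (not a hub ID) makes from_pretrained read the
    files from disk; local_files_only=True tells it not to attempt downloads.
    """
    missing = missing_files(local_dir, TOKENIZER_FILES + MODEL_FILES)
    if missing:
        raise FileNotFoundError(f"missing in {local_dir}: {', '.join(missing)}")
    # Imported lazily so the file check above works without transformers installed.
    from transformers import RobertaTokenizer, T5ForConditionalGeneration
    tokenizer = RobertaTokenizer.from_pretrained(local_dir, local_files_only=True)
    model = T5ForConditionalGeneration.from_pretrained(local_dir, local_files_only=True)
    return tokenizer, model
```

Checking for missing files first gives a clear error message up front instead of a lookup failure deep inside `from_pretrained`. If the repository also ships files such as special_tokens_map.json or tokenizer_config.json, download them into the same directory too.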

yuewang-cuhk commented 2 years ago

Hi, you can download the tokenizer and model files from here: https://huggingface.co/Salesforce/codet5-base-multi-sum/tree/main. For example, you can fetch the model checkpoint with wget https://huggingface.co/Salesforce/codet5-base-multi-sum/resolve/main/pytorch_model.bin (note the /resolve/ path: a /blob/ URL returns the HTML page, not the file itself).
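To fetch every file rather than only the checkpoint, the raw-file URLs can be generated in a loop. This is a sketch, assuming the `https://huggingface.co/<repo>/resolve/<revision>/<file>` pattern for raw downloads; the function names and the file list are hypothetical, so check the repository's file listing before running the printed commands.

```python
import shlex

BASE = "https://huggingface.co"


def resolve_url(repo_id, filename, revision="main"):
    """Raw-file download URL for `filename` in `repo_id` at `revision`."""
    return f"{BASE}/{repo_id}/resolve/{revision}/{filename}"


def wget_commands(repo_id, filenames, out_dir, revision="main"):
    """Shell commands that download each file into `out_dir` with wget."""
    return [
        "wget -O " + shlex.quote(f"{out_dir}/{name}") + " "
        + shlex.quote(resolve_url(repo_id, name, revision))
        for name in filenames
    ]


if __name__ == "__main__":
    # Hypothetical file list; compare it with the repository's file listing.
    files = ["config.json", "pytorch_model.bin", "vocab.json", "merges.txt",
             "special_tokens_map.json", "tokenizer_config.json"]
    out_dir = "codet5-base-multi-sum"
    print("mkdir -p " + shlex.quote(out_dir))
    for cmd in wget_commands("Salesforce/codet5-base-multi-sum", files, out_dir):
        print(cmd)
```

Swapping `wget -O` for `curl -L -o` gives the curl equivalent. Once the directory is filled, pass its path (instead of the hub ID) to `RobertaTokenizer.from_pretrained` and `T5ForConditionalGeneration.from_pretrained`.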