huggingface / huggingface_hub

The official Python client for the Huggingface Hub.
https://huggingface.co/docs/huggingface_hub
Apache License 2.0
1.82k stars, 470 forks

It seems that proxies are not passed to the model info request when using snapshot_download #2343

Open chinchilla-forest opened 2 weeks ago

chinchilla-forest commented 2 weeks ago

Describe the bug

My code is:

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="meta-llama/Meta-Llama-3-8B",
    repo_type=None,
    local_dir=r"D:\model\qwen",
    token="xxx",
    proxies={"https": "http://127.0.0.1:7890"},
    max_workers=8,
)

For some reason, my network cannot connect to huggingface.co directly, but my proxy can. When I run this code to download a model, I get this error:

huggingface_hub.utils._errors.LocalEntryNotFoundError: An error happened while trying to locate the files on the Hub and we cannot find the appropriate snapshot folder for the specified revision on the local disk. Please check your internet connection and try again.

From the code, I think that when HfApi.repo_info is called, it calls HfApi.model_info, which finally calls requests.Session.get. The code shows that the proxies are not passed to that request:

headers = self._build_hf_headers(token=token)
path = (
    f"{self.endpoint}/api/models/{repo_id}"
    if revision is None
    else (f"{self.endpoint}/api/models/{repo_id}/revision/{quote(revision, safe='')}")
)
params = {}
if securityStatus:
    params["securityStatus"] = True
if files_metadata:
    params["blobs"] = True
r = get_session().get(path, headers=headers, timeout=timeout, params=params)

After I passed the proxies to this request when getting the model info, it worked. Could you fix this problem, or may I submit my own fix?
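The change described above can be sketched roughly as follows. This is a minimal illustration, not the library's actual code: `RecordingSession` is a hypothetical stand-in for the session returned by `get_session()`, and `fetch_model_info` is a hypothetical simplification of `HfApi.model_info` (headers and query parameters are left out). It only shows the `proxies` value being forwarded to the `get` call, which `requests.Session.get` accepts as a keyword argument.

```python
from typing import Any, Dict, List, Optional
from urllib.parse import quote

ENDPOINT = "https://huggingface.co"


class RecordingSession:
    """Hypothetical stand-in for a requests.Session: records calls, sends nothing."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def get(self, url: str, **kwargs: Any) -> None:
        self.calls.append({"url": url, **kwargs})


def fetch_model_info(
    session: RecordingSession,
    repo_id: str,
    revision: Optional[str] = None,
    proxies: Optional[Dict[str, str]] = None,
) -> None:
    """Hypothetical simplification of the model-info request, with proxies forwarded."""
    path = (
        f"{ENDPOINT}/api/models/{repo_id}"
        if revision is None
        else f"{ENDPOINT}/api/models/{repo_id}/revision/{quote(revision, safe='')}"
    )
    # The difference from the excerpt above: proxies is passed along to get().
    session.get(path, timeout=10, proxies=proxies)


session = RecordingSession()
fetch_model_info(
    session,
    "meta-llama/Meta-Llama-3-8B",
    proxies={"https": "http://127.0.0.1:7890"},
)
print(session.calls[0]["proxies"])
```

With a real `requests.Session`, the same keyword argument would route only that one request through the proxy; other calls made without it would still go out directly.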

Reproduction

1. Use a network that cannot connect to huggingface.co.
2. Run a proxy server that can connect to huggingface.co.
3. Run the code below to download anything:

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="meta-llama/Meta-Llama-3-8B",
    repo_type=None,
    local_dir=r"D:\model\qwen",
    token="xxx",
    proxies={"https": "http://127.0.0.1:7890"},
    max_workers=8,
)

Logs

raise LocalEntryNotFoundError(
huggingface_hub.utils._errors.LocalEntryNotFoundError: An error happened while trying to locate the files on the Hub and we cannot find the appropriate snapshot folder for the specified revision on the local disk. Please check your internet connection and try again.

System info

Copy-and-paste the text below in your GitHub issue.

- huggingface_hub version: 0.23.3
- Platform: Windows-11-10.0.22631-SP0
- Python version: 3.12.3
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Token path ?: C:\Users\lin\.cache\huggingface\token
- Has saved token ?: False
- Configured git credential helpers: manager
- FastAI: N/A
- Tensorflow: N/A
- Torch: 2.3.1+cu121
- Jinja2: 3.1.3
- Graphviz: N/A
- keras: N/A
- Pydot: N/A
- Pillow: 10.2.0
- hf_transfer: N/A
- gradio: N/A
- tensorboard: N/A
- numpy: 1.26.3
- pydantic: 2.7.4
- aiohttp: 3.9.5
- ENDPOINT: https://huggingface.co
- HF_HUB_CACHE: C:\Users\lin\.cache\huggingface\hub
- HF_ASSETS_CACHE: C:\Users\lin\.cache\huggingface\assets
- HF_TOKEN_PATH: C:\Users\lin\.cache\huggingface\token
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False
- HF_HUB_ETAG_TIMEOUT: 10
- HF_HUB_DOWNLOAD_TIMEOUT: 10

Wauplin commented 2 weeks ago

This is indeed a bug, thanks for identifying and reporting it! Would you mind opening a PR for it? I would simply add a _proxies argument to repo_info/model_info/dataset_info/space_info without making it a public parameter and without adding it to HfApi either. It makes sense to support it for download methods (e.g. the main ones), but to support proxies in all methods, it's best to configure a custom HTTP backend:

import requests
from huggingface_hub import configure_http_backend, get_session

# Create a factory function that returns a Session with configured proxies
def backend_factory() -> requests.Session:
    session = requests.Session()
    session.proxies = {"http": "http://10.10.1.10:3128", "https": "https://10.10.1.11:1080"}
    return session

# Set it as the default session factory
configure_http_backend(backend_factory=backend_factory)
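
For the narrower fix proposed at the start of this comment, threading a private `_proxies` argument through `repo_info` might look roughly like the sketch below. Everything here is a hypothetical stand-in rather than the library's code: `ApiSketch` mimics a tiny part of `HfApi`, `_get` replaces the real HTTP call, and `snapshot_download_sketch` stands in for the download helper. The `/api/datasets/` and `/api/spaces/` paths are assumed by analogy with the `/api/models/` path quoted in the report.

```python
from typing import Dict, List, Optional, Tuple

Proxies = Optional[Dict[str, str]]

# Hypothetical log of (url, proxies) pairs, used instead of real HTTP requests.
SENT: List[Tuple[str, Proxies]] = []


def _get(url: str, proxies: Proxies) -> None:
    """Stand-in for get_session().get(url, proxies=proxies)."""
    SENT.append((url, proxies))


class ApiSketch:
    """Hypothetical, heavily simplified stand-in for HfApi."""

    endpoint = "https://huggingface.co"

    # `_proxies` is keyword-only and underscore-prefixed: the download helpers
    # can use it, but it is not a public parameter and is not stored on the class.
    def model_info(self, repo_id: str, *, _proxies: Proxies = None) -> None:
        _get(f"{self.endpoint}/api/models/{repo_id}", _proxies)

    def dataset_info(self, repo_id: str, *, _proxies: Proxies = None) -> None:
        _get(f"{self.endpoint}/api/datasets/{repo_id}", _proxies)

    def space_info(self, repo_id: str, *, _proxies: Proxies = None) -> None:
        _get(f"{self.endpoint}/api/spaces/{repo_id}", _proxies)

    def repo_info(
        self,
        repo_id: str,
        *,
        repo_type: Optional[str] = None,
        _proxies: Proxies = None,
    ) -> None:
        # In this sketch, repo_type=None is treated as "model".
        if repo_type is None or repo_type == "model":
            method = self.model_info
        elif repo_type == "dataset":
            method = self.dataset_info
        elif repo_type == "space":
            method = self.space_info
        else:
            raise ValueError(f"Unsupported repo type: {repo_type!r}")
        method(repo_id, _proxies=_proxies)


def snapshot_download_sketch(
    repo_id: str, *, repo_type: Optional[str] = None, proxies: Proxies = None
) -> None:
    """Hypothetical stand-in: the public `proxies` argument is forwarded privately."""
    ApiSketch().repo_info(repo_id, repo_type=repo_type, _proxies=proxies)


snapshot_download_sketch(
    "meta-llama/Meta-Llama-3-8B",
    proxies={"https": "http://127.0.0.1:7890"},
)
print(SENT[-1])
```

The per-call argument only covers the methods that forward it, which is why the session factory shown earlier remains the better option when every request has to go through the proxy.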

chinchilla-forest commented 1 week ago

I am very happy to submit this PR. Can I submit it this weekend? This way, I will have enough time to test the code.