zylon-ai / private-gpt

Interact with your documents using the power of GPT, 100% privately, no data leaks
https://privategpt.dev
Apache License 2.0

proxy issues with hugging face #1854

Open marouahamdi opened 4 months ago

marouahamdi commented 4 months ago

"I have proxy restrictions in the company. Therefore, when I try to work with PrivateGPT (even though I have already downloaded the LLM model locally but not the embedder's), I get this error: requests.exceptions.ProxyError: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /mistralai/Mistral-7B-Instruct-v0.2/resolve/main/tokenizer_config.json (Caused by ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 403 Forbidden')))"), '(Request ID: daac27c7-b24f-4c19-a707-895d5061595f)')"

tim-roethig-db commented 4 months ago

If it is just about the tokenizer, one fix is to download it somewhere you do have access to Hugging Face and place it into the /models folder. It is only about 2 MB in size, I think.
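A minimal sketch of that download step, assuming `transformers` is installed on a machine that can reach huggingface.co (the target folder name is just an example, not a PrivateGPT convention):

```python
# Run this on a machine with access to huggingface.co, then copy the
# resulting folder to the offline machine's models directory.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
tokenizer.save_pretrained("models/tokenizer/Mistral-7B-Instruct-v0.2")
```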

If your proxy works in general, I found that you have to:

1. Pass the `proxies` argument to the HF function that downloads the tokenizer, like this:

```python
AutoTokenizer.from_pretrained(
    pretrained_model_name_or_path=settings().llm.tokenizer,
    cache_dir=models_cache_path,
    proxies={"http": <your proxy>, "https": <your proxy>},
)
```
2. Overwrite the session creation in `site-packages/huggingface_hub/utils/_http.py`, like this (an environment-variable alternative is sketched after this list):

```python
def _default_backend_factory() -> requests.Session:
    session = requests.Session()

    session.proxies = {"http": <your proxy>, "https": <your proxy>}
    session.verify = False  # disables TLS verification; only do this if your proxy re-signs traffic

    # session.mount("http://", UniqueRequestIdAdapter())
    # session.mount("https://", UniqueRequestIdAdapter())
    return session
```
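As a possibly simpler alternative to patching library code (a sketch, not something proposed in this thread): `requests`, which `huggingface_hub` uses under the hood, honors the standard proxy environment variables, so setting them before anything talks to Hugging Face may be enough. `<your proxy>` is a placeholder:

```python
# Set these before any Hugging Face code runs (or export them in the
# shell that launches PrivateGPT).
import os

os.environ["HTTP_PROXY"] = "<your proxy>"
os.environ["HTTPS_PROXY"] = "<your proxy>"
```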
marouahamdi commented 4 months ago

Hello, thanks for the answer, but the thing is that I don't have access to the enterprise proxy settings (there are restrictions on the use of HF). I have downloaded the LLM locally, and the BGE model should already be in the cache folder; for the tokenizer, I don't know. I need to cut the connection to HF entirely.
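For reference (a hedged sketch, not from this thread): `huggingface_hub` and `transformers` both expose an offline mode through environment variables, which forces them to serve everything from the local cache and never open a connection to huggingface.co, provided all the needed files are already cached:

```python
# Must be set before huggingface_hub / transformers are imported.
import os

os.environ["HF_HUB_OFFLINE"] = "1"        # huggingface_hub: local cache only
os.environ["TRANSFORMERS_OFFLINE"] = "1"  # transformers: local cache only
```

If a required file is missing from the cache, the libraries raise an error instead of silently going online.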

KansaiTraining commented 3 months ago

> If it is just about the tokenizer, one fix is to download it somewhere you do have access to Hugging Face and place it into the /models folder. [...]

What exactly should I download?