zylon-ai / private-gpt

Interact with your documents using the power of GPT, 100% privately, no data leaks
https://privategpt.dev
Apache License 2.0

proxy issues with hugging face #1854

Open marouahamdi opened 4 months ago

marouahamdi commented 4 months ago

"I have proxy restrictions in the company. Therefore, when I try to work with PrivateGPT (even though I have already downloaded the LLM model locally but not the embedder's), I get this error: requests.exceptions.ProxyError: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /mistralai/Mistral-7B-Instruct-v0.2/resolve/main/tokenizer_config.json (Caused by ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 403 Forbidden')))"), '(Request ID: daac27c7-b24f-4c19-a707-895d5061595f)')"

tim-roethig-db commented 4 months ago

If it is just about the tokenizer, one fix is to download it somewhere you do have access to Hugging Face and place it into the /models folder. It is only about 2 MB in size, I think.
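A minimal sketch of that download step, assuming `transformers` is installed on a machine that can reach huggingface.co (the target folder name is just an example, not a PrivateGPT convention):

```python
# Run this on a machine with access to huggingface.co, then copy the
# resulting folder to the offline machine's models directory.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
tokenizer.save_pretrained("models/tokenizer/Mistral-7B-Instruct-v0.2")
```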

If your proxy works in general, I found that you have to:

1. Pass the `proxies` argument to the HF function that downloads the tokenizer, like this:

```python
AutoTokenizer.from_pretrained(
    pretrained_model_name_or_path=settings().llm.tokenizer,
    cache_dir=models_cache_path,
    proxies={"http": <your proxy>, "https": <your proxy>},
)
```
2. Overwrite the session creation in `site-packages/huggingface_hub/utils/_http.py`, like this (an environment-variable alternative is sketched after this list):

```python
def _default_backend_factory() -> requests.Session:
    session = requests.Session()

    session.proxies = {"http": <your proxy>, "https": <your proxy>}
    session.verify = False  # disables TLS verification; only do this if your proxy re-signs traffic

    # session.mount("http://", UniqueRequestIdAdapter())
    # session.mount("https://", UniqueRequestIdAdapter())
    return session
```
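As a possibly simpler alternative to patching library code (a sketch, not something proposed in this thread): `requests`, which `huggingface_hub` uses under the hood, honors the standard proxy environment variables, so setting them before anything talks to Hugging Face may be enough. `<your proxy>` is a placeholder:

```python
# Set these before any Hugging Face code runs (or export them in the
# shell that launches PrivateGPT).
import os

os.environ["HTTP_PROXY"] = "<your proxy>"
os.environ["HTTPS_PROXY"] = "<your proxy>"
```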
marouahamdi commented 4 months ago

Hello, thanks for the answer, but the thing is that I don't have access to the enterprise proxy settings (there are restrictions on the use of HF). I have downloaded the LLM locally, and the BGE model should already be in the cache folder; for the tokenizer, I don't know. I need to cut the connection to HF entirely.
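For reference (a hedged sketch, not from this thread): `huggingface_hub` and `transformers` both expose an offline mode through environment variables, which forces them to serve everything from the local cache and never open a connection to huggingface.co, provided all the needed files are already cached:

```python
# Must be set before huggingface_hub / transformers are imported.
import os

os.environ["HF_HUB_OFFLINE"] = "1"        # huggingface_hub: local cache only
os.environ["TRANSFORMERS_OFFLINE"] = "1"  # transformers: local cache only
```

If a required file is missing from the cache, the libraries raise an error instead of silently going online.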

KansaiTraining commented 3 months ago

> If it is just about the tokenizer, one fix is to download it somewhere you do have access to Hugging Face and place it into the /models folder. [...]

What exactly should I download?