Open leemgs opened 2 years ago
Hi! It looks like an issue with your Python environment. Can you make sure you're able to run GET requests to https://huggingface.co using `requests` in Python?
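For example, a minimal connectivity check might look like this (a quick sketch; any Hub URL works for this purpose):

```python
import requests

# Simple reachability test for the Hugging Face Hub from this Python environment.
response = requests.get("https://huggingface.co", timeout=10)
print(response.status_code)  # 200 means the Hub is reachable
```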
Thanks for your reply. Does this mean that I have to use the `load_dataset` function and the `requests` library to download the dataset from behind the company's proxy environment?
Reference:

### How to load this dataset directly with the [datasets](https://github.com/huggingface/datasets) library

```python
from datasets import load_dataset

dataset = load_dataset("moyix/debian_csrc")
```

```bash
git lfs install
git clone https://huggingface.co/datasets/moyix/debian_csrc
# to clone without downloading the large files (just their pointers), prepend the clone with:
GIT_LFS_SKIP_SMUDGE=1
```
You can use `requests` to see if downloading a file from the Hugging Face Hub works. If so, then `datasets` should work as well. If not, then you'll have to find another way, using an internet connection that works.
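For instance, a sketch of such a check routed through a corporate proxy (the proxy URL is a placeholder, and the file path assumes the repository exposes a README.md):

```python
import requests

# Placeholder proxy settings -- replace with your company's proxy if you need one.
proxies = {
    "http": "http://proxy.example.com:8080",
    "https": "http://proxy.example.com:8080",
}

# Try to fetch a small file from the Hub through the proxy.
url = "https://huggingface.co/datasets/moyix/debian_csrc/resolve/main/README.md"
response = requests.get(url, proxies=proxies, timeout=30)
print(response.status_code, len(response.content))
```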
I resolved this issue by requesting that https://huggingface.com be unblocked ("unblock websites") in my corporate network environment, which sits behind a firewall.
> Hi! It looks like an issue with your Python environment. Can you make sure you're able to run GET requests to https://huggingface.co using `requests` in Python?

Yes, but it still doesn't work.
I read https://github.com/huggingface/datasets/blob/main/src/datasets/load.py: it fails when fetching the dataset metadata, so `download_config` never takes effect.
```python
hf_api = HfApi(config.HF_ENDPOINT)
try:
    dataset_info = hf_api.dataset_info(
        repo_id=path,
        revision=revision,
        token=download_config.token,
        timeout=100.0,
    )
except Exception as e:  # noqa  catch any exception of hf_hub and consider that the dataset doesn't exist
    if isinstance(
        e,
        (
            OfflineModeIsEnabled,
            requests.exceptions.ConnectTimeout,
            requests.exceptions.ConnectionError,
        ),
    ):
        raise ConnectionError(f"Couldn't reach '{path}' on the Hub ({type(e).__name__})")
```
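To narrow this down, the metadata call can also be tried directly; a minimal sketch using the public `huggingface_hub` API:

```python
from huggingface_hub import HfApi

# This is essentially the call that fails inside dataset_module_factory:
# if it raises an SSLError here too, the problem is the network/proxy setup, not `datasets`.
api = HfApi()
info = api.dataset_info("moyix/debian_csrc", timeout=100.0)
print(info.id)
```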
I configured the huggingface_hub API with `configure_http_backend`:

```python
import requests
from huggingface_hub import configure_http_backend

# `proxy` holds your corporate proxy settings, e.g.
# proxy = {"http": "http://<proxy-host>:<port>", "https": "http://<proxy-host>:<port>"}

def backend_factory() -> requests.Session:
    session = requests.Session()
    session.proxies = proxy
    session.verify = False  # disable TLS certificate verification
    return session

configure_http_backend(backend_factory=backend_factory)
```
It works.
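In the setup described above, subsequent Hub requests in the same process go through the configured session, so a plain `load_dataset` call afterwards should pick up the proxy (a minimal usage sketch):

```python
from datasets import load_dataset

# The custom HTTP backend configured above is used for the Hub metadata calls.
dataset = load_dataset("moyix/debian_csrc")
```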
Even though it does not look like a certificate error in the error message, I had the same error, and adding the following lines to my code solved my problem:

```python
import os

# Pointing requests at an empty CA bundle effectively disables TLS certificate verification.
os.environ['CURL_CA_BUNDLE'] = ''
```
@kuikuikuizzZ Could you please explain where the configuration code is added?
> Even though it does not look like a certificate error in the error message, I had the same error, and adding the following lines to my code solved my problem.
>
> ```python
> import os
> os.environ['CURL_CA_BUNDLE'] = ''
> ```

Worked for me as well! I faced the issue while submitting jobs through SLURM.
> Even though it does not look like a certificate error in the error message, I had the same error, and adding the following lines to my code solved my problem.
>
> ```python
> import os
> os.environ['CURL_CA_BUNDLE'] = ''
> ```

This doesn't work for me. What does this code mean?
If you're working on a cluster, it may be that remote connections are disabled for security purposes. In that case you will have to download the files on your local machine and then transfer them to your cluster through scp or some other transfer protocol. I know you've probably resolved the issue by now, but this is for anyone in the future who stumbles across this thread and needs help, because I struggled with this even after reading the thread.
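One way to do that with the `datasets` API itself is to serialize the dataset on a connected machine and load it back from disk on the cluster (a sketch; the paths are placeholders and the copy step is whatever transfer mechanism your cluster allows, e.g. scp or rsync):

```python
# On a machine with internet access: download once and serialize to disk.
from datasets import load_dataset, load_from_disk

ds = load_dataset("moyix/debian_csrc")
ds.save_to_disk("./debian_csrc_local")  # copy this directory to the cluster

# On the cluster (no internet access needed): load the transferred copy.
ds = load_from_disk("./debian_csrc_local")
```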
> Even though it does not look like a certificate error in the error message, I had the same error, and adding the following lines to my code solved my problem.
>
> ```python
> import os
> os.environ['CURL_CA_BUNDLE'] = ''
> ```

If this doesn't work, try this:

```bash
export http_proxy="http://127.0.0.1:10810"
export https_proxy="http://127.0.0.1:10810"
git config --global http.proxy http://127.0.0.1:10810
git config --global https.proxy http://127.0.0.1:10810
jupyter notebook
```

Set your proxy environment variables first, then start the notebook in that same session.
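The same thing can also be done from inside Python or the notebook itself, assuming the same local proxy as above (the port is just the example value from this comment):

```python
import os

# Set the proxy for this process before any Hub requests are made;
# requests (and therefore datasets/huggingface_hub) honors these variables.
os.environ["http_proxy"] = "http://127.0.0.1:10810"
os.environ["https_proxy"] = "http://127.0.0.1:10810"

from datasets import load_dataset

dataset = load_dataset("moyix/debian_csrc")
```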
> If you're working on a cluster, it may be that remote connections are disabled for security purposes. In that case you will have to download the files on your local machine and then transfer them to your cluster through scp or some other transfer protocol. I know you've probably resolved the issue by now, but this is for anyone in the future who stumbles across this thread and needs help, because I struggled with this even after reading the thread.
Thank you buddy!
### Describe the bug
It's weird. I could not connect to the dataset Hub of Hugging Face due to an SSLError in my office. Even when I try to connect using my company's proxy address (e.g., `http_proxy` and `https_proxy`), I still get the SSLError. What should I do to download the datasets stored on Hugging Face normally? I welcome any comments; I think they will be helpful to me.
```
real    0m7.742s
user    0m4.930s

(deepspeed) geunsik-lim@ai02:~/qtlab$ ./test_debian_csrc_dataset.py
Traceback (most recent call last):
  File "/data/home/geunsik-lim/qtlab/./test_debian_csrc_dataset.py", line 6, in <module>
    dataset = load_dataset("moyix/debian_csrc")
  File "/home/geunsik-lim/anaconda3/envs/deepspeed/lib/python3.10/site-packages/datasets/load.py", line 1719, in load_dataset
    builder_instance = load_dataset_builder(
  File "/home/geunsik-lim/anaconda3/envs/deepspeed/lib/python3.10/site-packages/datasets/load.py", line 1497, in load_dataset_builder
    dataset_module = dataset_module_factory(
  File "/home/geunsik-lim/anaconda3/envs/deepspeed/lib/python3.10/site-packages/datasets/load.py", line 1222, in dataset_module_factory
    raise e1 from None
  File "/home/geunsik-lim/anaconda3/envs/deepspeed/lib/python3.10/site-packages/datasets/load.py", line 1179, in dataset_module_factory
    raise ConnectionError(f"Couldn't reach '{path}' on the Hub ({type(e).__name__})")
ConnectionError: Couldn't reach 'moyix/debian_csrc' on the Hub (SSLError)

(deepspeed) geunsik-lim@ai02:~/qtlab$ cat ./test_debian_csrc_dataset.py
#!/usr/bin/env python

from datasets import load_dataset

dataset = load_dataset("moyix/debian_csrc")
```
```
(deepspeed) geunsik-lim@ai02:~$ conda list -f pytorch
# packages in environment at /home/geunsik-lim/anaconda3/envs/deepspeed:
#
# Name                    Version                   Build  Channel
pytorch                   1.13.0          py3.10_cuda11.7_cudnn8.5.0_0    pytorch

(deepspeed) geunsik-lim@ai02:~$ conda list -f python
# packages in environment at /home/geunsik-lim/anaconda3/envs/deepspeed:
#
# Name                    Version                   Build  Channel
python                    3.10.6               haa1d7c7_1

(deepspeed) geunsik-lim@ai02:~$ conda list -f datasets
# packages in environment at /home/geunsik-lim/anaconda3/envs/deepspeed:
#
# Name                    Version                   Build  Channel
datasets                  2.6.1                      py_0    huggingface

(deepspeed) geunsik-lim@ai02:~$ uname -a
Linux ai02 5.4.0-131-generic #147-Ubuntu SMP Fri Oct 14 17:07:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

(deepspeed) geunsik-lim@ai02:~$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.5 LTS"
```