huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

(Load dataset failure) ConnectionError: Couldn’t reach https://raw.githubusercontent.com/huggingface/datasets/1.1.2/datasets/cnn_dailymail/cnn_dailymail.py #8027

Closed AI678 closed 3 years ago

AI678 commented 3 years ago

❓ Questions & Help

Details

Hey, I want to load the cnn_dailymail dataset for fine-tuning. I wrote the following code:

from datasets import load_dataset

test_dataset = load_dataset("cnn_dailymail", "3.0.0", split="train")

I got the following error:

Traceback (most recent call last):
  File "test.py", line 7, in <module>
    test_dataset = load_dataset("cnn_dailymail", "3.0.0", split="test")
  File "C:\Users\666666\AppData\Local\Programs\Python\Python38\lib\site-packages\datasets\load.py", line 589, in load_dataset
    module_path, hash = prepare_module(
  File "C:\Users\666666\AppData\Local\Programs\Python\Python38\lib\site-packages\datasets\load.py", line 268, in prepare_module
    local_path = cached_path(file_path, download_config=download_config)
  File "C:\Users\666666\AppData\Local\Programs\Python\Python38\lib\site-packages\datasets\utils\file_utils.py", line 300, in cached_path
    output_path = get_from_cache(
  File "C:\Users\666666\AppData\Local\Programs\Python\Python38\lib\site-packages\datasets\utils\file_utils.py", line 475, in get_from_cache
    raise ConnectionError("Couldn't reach {}".format(url))
ConnectionError: Couldn't reach https://raw.githubusercontent.com/huggingface/datasets/1.1.2/datasets/cnn_dailymail/cnn_dailymail.py

How can I fix this?

A link to original question on the forum/Stack Overflow:

jemmryx commented 3 years ago

I have the same problem. I installed transformers from source but still get this error. @sgugger

zhaofuchen commented 3 years ago

I had the same problem, and this is how I solved it:

1. Open this address and download the file: https://raw.githubusercontent.com/huggingface/datasets/1.6.0/datasets/bookcorpus/bookcorpus.py
2. Put the file in this location: (image)
3. After that, you can load the corpus normally.

I hope this helps.
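For anyone landing here later, the manual workaround above can be sketched in Python roughly as follows. This is only a sketch under a few assumptions: the helper names `script_url` and `fetch_script` and the `dataset_scripts` folder are made up for illustration, the URL layout is copied from the error message in this thread, and it assumes your installed `datasets` version accepts a path to a local loading script in `load_dataset`. If the machine cannot reach raw.githubusercontent.com at all, the download step will fail in the same way, so fetch the file in a browser or on another machine and copy it over instead.

```python
import urllib.request
from pathlib import Path

# Base of the raw GitHub URLs seen in the error message (layout assumed from this thread).
RAW_BASE = "https://raw.githubusercontent.com/huggingface/datasets"


def script_url(name: str, version: str) -> str:
    """Build the raw URL of a dataset loading script (hypothetical helper)."""
    return f"{RAW_BASE}/{version}/datasets/{name}/{name}.py"


def fetch_script(name: str, version: str, dest_dir: str = "dataset_scripts") -> Path:
    """Download the script to dest_dir/<name>/<name>.py and return the local path."""
    target = Path(dest_dir) / name / f"{name}.py"
    target.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(script_url(name, version), timeout=30) as resp:
        target.write_bytes(resp.read())
    return target


# Example usage (not run here; needs network access and the datasets package):
#   from datasets import load_dataset
#   path = fetch_script("cnn_dailymail", "1.1.2")
#   test_dataset = load_dataset(str(path), "3.0.0", split="train")
```

Pass the version that appears in your own error message (1.1.2 in the original post), since the script should match the installed `datasets` release. Loading the script this way only gets past the step that failed here; the dataset files themselves still have to be downloaded afterwards.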