horseee / LLM-Pruner

[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baichuan, TinyLlama, etc.
https://arxiv.org/abs/2305.11627
Apache License 2.0
880 stars 106 forks

ConnectionError: Couldn't reach https://raw.githubusercontent.com/wojzaremba/lstm/master/data/ptb.train.txt (ReadTimeout(ReadTimeoutError("HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Read timed out. (read timeout=100)"))) #48

Closed qxpBlog closed 10 months ago

qxpBlog commented 10 months ago

100%|██████████| 667/667 [00:53<00:00, 12.35it/s]
{'wikitext2': 20.046345644076645}
/home/iotsc01/anaconda3/envs/xinpengq_env/lib/python3.10/site-packages/datasets/load.py:1429: FutureWarning: The repository for ptb_text_only contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/ptb_text_only
You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.
  warnings.warn(
HF google storage unreachable. Downloading and preparing it from source
2024-01-14 05:26:19 - WARNING : HF google storage unreachable. Downloading and preparing it from source
Traceback (most recent call last):
  File "/home/iotsc01/xinpengq/LLM-Pruner-main/hf_prune.py", line 314, in <module>
    main(args)
  File "/home/iotsc01/xinpengq/LLM-Pruner-main/hf_prune.py", line 267, in main
    ppl = PPLMetric(model, tokenizer, ['wikitext2', 'ptb'], args.max_seq_len, device=args.eval_device)
  File "/home/iotsc01/xinpengq/LLM-Pruner-main/LLMPruner/evaluator/ppl.py", line 10, in PPLMetric
    _, test_loader = get_loaders(dataset, tokenizer, seq_len=seq_len, batch_size = batch_size)
  File "/home/iotsc01/xinpengq/LLM-Pruner-main/LLMPruner/datasets/ppl_dataset.py", line 50, in get_loaders
    train_data, test_data = get_ptb(seq_len, tokenizer)
  File "/home/iotsc01/xinpengq/LLM-Pruner-main/LLMPruner/datasets/ppl_dataset.py", line 19, in get_ptb
    traindata = load_dataset('ptb_text_only', 'penn_treebank', split='train')
  File "/home/iotsc01/anaconda3/envs/xinpengq_env/lib/python3.10/site-packages/datasets/load.py", line 2549, in load_dataset
    builder_instance.download_and_prepare(
  File "/home/iotsc01/anaconda3/envs/xinpengq_env/lib/python3.10/site-packages/datasets/builder.py", line 1005, in download_and_prepare
    self._download_and_prepare(
  File "/home/iotsc01/anaconda3/envs/xinpengq_env/lib/python3.10/site-packages/datasets/builder.py", line 1767, in _download_and_prepare
    super()._download_and_prepare(
  File "/home/iotsc01/anaconda3/envs/xinpengq_env/lib/python3.10/site-packages/datasets/builder.py", line 1078, in _download_and_prepare
    split_generators = self._split_generators(dl_manager, **split_generators_kwargs)
  File "/home/iotsc01/.cache/huggingface/modules/datasets_modules/datasets/ptb_text_only/8d1b97746fb9765d140e569ec5ddd35e20af4d37761f5e1bf357ea0b081f2c1f/ptb_text_only.py", line 131, in _split_generators
    data_dir = dl_manager.download_and_extract(my_urls)
  File "/home/iotsc01/anaconda3/envs/xinpengq_env/lib/python3.10/site-packages/datasets/download/download_manager.py", line 562, in download_and_extract
    return self.extract(self.download(url_or_urls))
  File "/home/iotsc01/anaconda3/envs/xinpengq_env/lib/python3.10/site-packages/datasets/download/download_manager.py", line 426, in download
    downloaded_path_or_paths = map_nested(
  File "/home/iotsc01/anaconda3/envs/xinpengq_env/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 466, in map_nested
    mapped = [
  File "/home/iotsc01/anaconda3/envs/xinpengq_env/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 467, in <listcomp>
    _single_map_nested((function, obj, types, None, True, None))
  File "/home/iotsc01/anaconda3/envs/xinpengq_env/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 370, in _single_map_nested
    return function(data_struct)
  File "/home/iotsc01/anaconda3/envs/xinpengq_env/lib/python3.10/site-packages/datasets/download/download_manager.py", line 451, in _download
    out = cached_path(url_or_filename, download_config=download_config)
  File "/home/iotsc01/anaconda3/envs/xinpengq_env/lib/python3.10/site-packages/datasets/utils/file_utils.py", line 188, in cached_path
    output_path = get_from_cache(
  File "/home/iotsc01/anaconda3/envs/xinpengq_env/lib/python3.10/site-packages/datasets/utils/file_utils.py", line 573, in get_from_cache
    raise ConnectionError(f"Couldn't reach {url} ({repr(head_error)})")
ConnectionError: Couldn't reach https://raw.githubusercontent.com/wojzaremba/lstm/master/data/ptb.train.txt (ReadTimeout(ReadTimeoutError("HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Read timed out. (read timeout=100)")))
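The failure above is a read timeout while `datasets` fetches the raw PTB files from raw.githubusercontent.com (the library's 100-second read timeout expires). One possible workaround is to download the files yourself with a longer timeout and a few retries, then point the loader at the local copies. A minimal sketch, assuming direct access to the GitHub raw URLs; `fetch_with_retries` and its parameters are illustrative, not part of LLM-Pruner:

```python
import time
import urllib.request


def fetch_with_retries(url, dest, retries=5, timeout=300, backoff=5):
    """Download `url` to `dest`, retrying on timeouts.

    Illustrative helper: the 100 s read timeout seen in the log above
    is too short on a slow link, so this uses a longer per-request
    timeout plus simple linear backoff between attempts.
    """
    last_err = None
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                data = resp.read()
            with open(dest, "wb") as f:
                f.write(data)
            return dest
        except OSError as err:  # covers socket timeouts and HTTP errors
            last_err = err
            time.sleep(backoff * (attempt + 1))
    raise ConnectionError(f"Couldn't reach {url}") from last_err


# Usage sketch: grab the three PTB splits once, then reuse the local files.
# base = "https://raw.githubusercontent.com/wojzaremba/lstm/master/data"
# for split in ("train", "valid", "test"):
#     fetch_with_retries(f"{base}/ptb.{split}.txt", f"./ptb/ptb.{split}.txt")
```

With the files on disk, the `ptb_text_only` download step can be bypassed entirely, which also sidesteps the `trust_remote_code` warning.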

qxpBlog commented 10 months ago

@VainF @eltociear @horseee Hello, could you tell me why my wikitext2 dataset downloads successfully, but the ptb dataset fails to download?

xwang365 commented 3 months ago

@VainF @eltociear @horseee Hello, could you tell me why my wikitext2 dataset downloads successfully, but the ptb dataset fails to download?

How was this issue eventually resolved? I'm running into the same problem.
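For anyone hitting the same timeout: since only the PTB fetch fails (wikitext2 is hosted elsewhere), one workaround is to download ptb.train.txt / ptb.test.txt manually (e.g. from another machine or a mirror) and patch `get_ptb` in LLMPruner/datasets/ppl_dataset.py to read the local files instead of calling `load_dataset('ptb_text_only', ...)`. A rough sketch, assuming the files sit in `data_dir`; `get_ptb_local` is a hypothetical replacement, and note the repo's real `get_ptb` returns tokenized Hugging Face datasets rather than plain lists, so the surrounding loader code would need a matching adjustment:

```python
import os


def get_ptb_local(data_dir):
    """Load locally downloaded PTB splits, bypassing the flaky
    raw.githubusercontent.com fetch performed by `ptb_text_only`.

    Hypothetical drop-in: returns lists of {'sentence': ...} dicts,
    mirroring the column name the ptb_text_only dataset exposes.
    """
    def read_split(name):
        path = os.path.join(data_dir, f"ptb.{name}.txt")
        with open(path, encoding="utf-8") as f:
            # One sentence per line; skip blank lines.
            return [{"sentence": line.strip()} for line in f if line.strip()]

    return read_split("train"), read_split("test")
```

Caching the files once this way makes the perplexity evaluation reproducible offline, independent of GitHub's availability.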