luyug / COIL

NAACL2021 - COIL Contextualized Lexical Retriever
Apache License 2.0

Dataset error when encoding document #5

Open ylwangy opened 3 years ago

ylwangy commented 3 years ago

I find that some of the encoding output directories are empty because the error below occurred, while the others are normal and contain the cls and token files.

Traceback (most recent call last):
  File "run_marco.py", line 303, in <module>
    main()
  File "run_marco.py", line 217, in main
    data_args.encode_in_path, tokenizer, p_max_len=data_args.p_max_len
  File "/nfs/users/wangyile/coil/marco_datasets.py", line 126, in __init__
    data_files=path_to_json,
  File "/nfs/users/wangyile/anaconda3/envs/coil/lib/python3.7/site-packages/datasets/load.py", line 589, in load_dataset
    path, script_version=script_version, download_config=download_config, download_mode=download_mode, dataset=True
  File "/nfs/users/wangyile/anaconda3/envs/coil/lib/python3.7/site-packages/datasets/load.py", line 267, in prepare_module
    local_path = cached_path(file_path, download_config=download_config)
  File "/nfs/users/wangyile/anaconda3/envs/coil/lib/python3.7/site-packages/datasets/utils/file_utils.py", line 308, in cached_path
    use_etag=download_config.use_etag,
  File "/nfs/users/wangyile/anaconda3/envs/coil/lib/python3.7/site-packages/datasets/utils/file_utils.py", line 487, in get_from_cache
    raise ConnectionError("Couldn't reach {}".format(url))
ConnectionError: Couldn't reach https://raw.githubusercontent.com/huggingface/datasets/1.1.3/datasets/json/json.py

luyug commented 3 years ago

The error message suggests that the datasets package failed to fetch its JSON processing script (json.py) from raw.githubusercontent.com. There is not enough information to tell why this happens. I suspect that something is wrong with your environment, possibly network access from the machine running the encoding job.
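
Since only some output directories came back empty, the failure may be intermittent. If so, one possible workaround is to retry the dataset load a few times before giving up. The following is a minimal sketch, not part of COIL: the `retry` helper and its parameters are hypothetical, and it assumes the exception raised is Python's built-in ConnectionError, as the traceback suggests.

```python
import time


def retry(fn, attempts=3, base_delay=1.0,
          exceptions=(ConnectionError,), sleep=time.sleep):
    """Call fn() and return its result.

    If fn raises one of `exceptions`, wait and call it again, doubling the
    wait each time. After `attempts` failed calls the last error is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for i in range(attempts):
        try:
            return fn()
        except exceptions:
            if i == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** i)  # 1x, 2x, 4x, ... base_delay


# Hypothetical usage around the call shown in the traceback
# (path_to_json is the variable name from marco_datasets.py):
#
#   from datasets import load_dataset
#   dataset = retry(lambda: load_dataset("json", data_files=path_to_json))
```

This only helps if the connection problem is transient. If the machine has no outbound access to raw.githubusercontent.com at all, every attempt will fail the same way and the environment itself needs to be fixed.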