xplip / pixel

Research code for pixel-based encoders of language (PIXEL)
https://arxiv.org/abs/2207.06991
Apache License 2.0

Couldn't find plip/wiki_dev as the validation dataset #16

Open yiwang454 opened 10 months ago

yiwang454 commented 10 months ago

Hi @xplip, thanks for sharing your code. I'm currently running the pre-training scripts and have run into a problem finding the validation dataset. I get errors like "Repository Not Found for url: https://huggingface.co/api/datasets/plip/wiki_dev", which probably means that the evaluation dataset specified by "validation_dataset_name": "plip/wiki_dev" in the config file is not available on the Hugging Face Hub.
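The lookup that fails is driven by the `validation_dataset_name` entry in the pre-training config. The following is a minimal sketch of overriding that entry before launching training. It assumes the config is a flat JSON file with a top-level `validation_dataset_name` key, as quoted above. The helper names and the local path `data/wiki_dev_local` are hypothetical. Whether `run_pretraining.py` accepts a local directory in place of a Hub ID depends on how it calls `load_dataset` (line 325 in the traceback), and the issue does not show that call.

```python
import json
from pathlib import Path


def override_validation_dataset(config: dict, new_name: str) -> dict:
    """Hypothetical helper: return a copy of `config` whose
    "validation_dataset_name" entry points at `new_name`.

    Assumes a flat JSON-style config, as quoted in this issue.
    """
    if "validation_dataset_name" not in config:
        raise KeyError("config has no 'validation_dataset_name' entry")
    patched = dict(config)  # shallow copy, so the caller's dict is left untouched
    patched["validation_dataset_name"] = new_name
    return patched


def patch_config_file(src: str, dst: str, new_name: str) -> None:
    """Read a JSON config from `src`, swap the dataset name, write it to `dst`."""
    config = json.loads(Path(src).read_text(encoding="utf-8"))
    patched = override_validation_dataset(config, new_name)
    Path(dst).write_text(json.dumps(patched, indent=2) + "\n", encoding="utf-8")


if __name__ == "__main__":
    original = {"validation_dataset_name": "plip/wiki_dev"}
    # "data/wiki_dev_local" is a hypothetical local path, not part of the repo
    patched = override_validation_dataset(original, "data/wiki_dev_local")
    print(patched["validation_dataset_name"])  # data/wiki_dev_local
    print(original["validation_dataset_name"])  # plip/wiki_dev
```

Writing a patched copy with `patch_config_file` leaves the original config unchanged, so you can still compare your run against the authors' settings.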

The full error message is below:

Traceback (most recent call last):
  File "/exports/eddie/scratch/s2522559/pixel_project/pixel/modify_running_script.py", line 145, in <module>
    main()
  File "/exports/eddie/scratch/s2522559/pixel_project/pixel/modify_running_script.py", line 142, in main
    trainer()
  File "/exports/eddie/scratch/s2522559/pixel_project/pixel/modify_running_script.py", line 106, in __call__
    trainer.main(self.config_dict)
  File "/exports/eddie/scratch/s2522559/pixel_project/pixel/scripts/training/run_pretraining.py", line 325, in main
    validation_dataset = load_dataset(
  File "/exports/eddie/scratch/s2522559/conda/envs/pixel/lib/python3.9/site-packages/datasets/load.py", line 1676, in load_dataset
    builder_instance = load_dataset_builder(
  File "/exports/eddie/scratch/s2522559/conda/envs/pixel/lib/python3.9/site-packages/datasets/load.py", line 1502, in load_dataset_builder
    dataset_module = dataset_module_factory(
  File "/exports/eddie/scratch/s2522559/conda/envs/pixel/lib/python3.9/site-packages/datasets/load.py", line 1254, in dataset_module_factory
    raise e1 from None
  File "/exports/eddie/scratch/s2522559/conda/envs/pixel/lib/python3.9/site-packages/datasets/load.py", line 1225, in dataset_module_factory
    raise e
  File "/exports/eddie/scratch/s2522559/conda/envs/pixel/lib/python3.9/site-packages/datasets/load.py", line 1205, in dataset_module_factory
    dataset_info = hf_api.dataset_info(
  File "/exports/eddie/scratch/s2522559/conda/envs/pixel/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/exports/eddie/scratch/s2522559/conda/envs/pixel/lib/python3.9/site-packages/huggingface_hub/hf_api.py", line 1761, in dataset_info
    hf_raise_for_status(r)
  File "/exports/eddie/scratch/s2522559/conda/envs/pixel/lib/python3.9/site-packages/huggingface_hub/utils/_errors.py", line 293, in hf_raise_for_status
    raise RepositoryNotFoundError(message, response) from e
huggingface_hub.utils._errors.RepositoryNotFoundError: 401 Client Error.
(Request ID: Root=1-654baa8e-2186f9fe22cc891357596294;926d6ac1-fd23-457c-b19d-8707baa23362)

Repository Not Found for url: https://huggingface.co/api/datasets/plip/wiki_dev.
Please make sure you specified the correct repo_id and repo_type.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.

Do you have any idea how to get access to the development set?
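If the development set remains unavailable, one stopgap is to validate on a small held-out text sample of your own. The following is a minimal sketch of preparing such a sample as JSON Lines. It assumes that the validation data only needs a single `text` column, which is an assumption, so check which columns `run_pretraining.py` actually reads. The function names and the `validation.jsonl` file layout are hypothetical. Validation numbers from such a set will not be comparable to runs that use `plip/wiki_dev`.

```python
import json
import random
from pathlib import Path


def sample_texts(lines, n_samples, seed=0):
    """Keep non-empty, stripped lines and draw a reproducible random sample."""
    texts = [line.strip() for line in lines if line.strip()]
    rng = random.Random(seed)  # fixed seed so the same sample is drawn every run
    return rng.sample(texts, min(n_samples, len(texts)))


def to_jsonl_line(text):
    """One JSON Lines record with a single "text" field (assumed column name)."""
    # json.dumps escapes embedded newlines, so each record stays on one line
    return json.dumps({"text": text}, ensure_ascii=False)


def write_validation_jsonl(texts, out_dir, filename="validation.jsonl"):
    """Write the records to out_dir/filename (hypothetical layout) and return the path."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    target = out_path / filename
    with target.open("w", encoding="utf-8") as f:
        for text in texts:
            f.write(to_jsonl_line(text) + "\n")
    return target
```

Given a list of held-out lines `lines`, calling `write_validation_jsonl(sample_texts(lines, 1000), "data/wiki_dev_local")` writes the sample to disk. You can then run `datasets.load_dataset("json", data_files="data/wiki_dev_local/validation.jsonl")` to confirm that the file loads before pointing the training config at it.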

Stardust-y commented 6 months ago

Same error here. Have you solved it yet?