Open bstee615 opened 1 year ago
Hello, I get this error when I try to load your POJ-104 dataset from huggingface.
NonMatchingSplitsSizesError: [{'expected': SplitInfo(name='train', num_bytes=18878686, num_examples=32000, dataset_name='code_x_glue_cc_clone_detection_poj104'), 'recorded': SplitInfo(name='train', num_bytes=20179075, num_examples=32500, dataset_name='code_x_glue_cc_clone_detection_poj104')}, {'expected': SplitInfo(name='validation', num_bytes=5765303, num_examples=8000, dataset_name='code_x_glue_cc_clone_detection_poj104'), 'recorded': SplitInfo(name='validation', num_bytes=6382433, num_examples=8500, dataset_name='code_x_glue_cc_clone_detection_poj104')}]
As far as I can tell, the dataset expects to load 500 fewer examples than the downloaded files contain. I attached a notebook which reproduces the issue:
Could you fix the issue so that we can load the dataset without ignore_verifications=True?
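To make the mismatch concrete, here is a minimal sketch that recomputes the gap from the numbers in the error message above. The helper name `split_size_gaps` is hypothetical and only for illustration; it is not part of the `datasets` library, and the counts are copied by hand from the traceback rather than read from the dataset.

```python
# Example counts copied from the NonMatchingSplitsSizesError message above.
expected = {"train": 32000, "validation": 8000}  # what the dataset metadata declares
recorded = {"train": 32500, "validation": 8500}  # what the downloaded files contain


def split_size_gaps(expected, recorded):
    """Return recorded minus expected example counts for each split.

    Hypothetical helper for illustration; not a `datasets` API.
    """
    return {name: recorded[name] - expected[name] for name in expected}


gaps = split_size_gaps(expected, recorded)
for name, gap in gaps.items():
    print(f"{name}: {gap} more examples in the files than declared")
```

Both splits come out 500 examples larger than declared, which is consistent with the verification failing on `train` and `validation` alike.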
Hi, the copies of our datasets on Hugging Face are not maintained by us. We recommend following our instructions for each task instead.
Ok, thanks for the info. Can you refer me to who maintains the huggingface datasets?