huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
https://huggingface.co/docs/datasets
Apache License 2.0

Dataset Viewer issue for fgrezes/WIESP2022-NER #4477

Closed. AshTayade closed this issue 2 years ago.

AshTayade commented 2 years ago

Link

No response

Description

No response

Owner

No response

severo commented 2 years ago

https://huggingface.co/datasets/fgrezes/WIESP2022-NER

The error:

Message:       Couldn't find a dataset script at /src/services/worker/fgrezes/WIESP2022-NER/WIESP2022-NER.py or any data file in the same directory. Couldn't find 'fgrezes/WIESP2022-NER' on the Hugging Face Hub either: FileNotFoundError: Unable to resolve any data file that matches ['**test*', '**eval*'] in dataset repository fgrezes/WIESP2022-NER with any supported extension ['csv', 'tsv', 'json', 'jsonl', 'parquet', 'txt', 'blp', 'bmp', 'dib', 'bufr', 'cur', 'pcx', 'dcx', 'dds', 'ps', 'eps', 'fit', 'fits', 'fli', 'flc', 'ftc', 'ftu', 'gbr', 'gif', 'grib', 'h5', 'hdf', 'png', 'apng', 'jp2', 'j2k', 'jpc', 'jpf', 'jpx', 'j2c', 'icns', 'ico', 'im', 'iim', 'tif', 'tiff', 'jfif', 'jpe', 'jpg', 'jpeg', 'mpg', 'mpeg', 'msp', 'pcd', 'pxr', 'pbm', 'pgm', 'ppm', 'pnm', 'psd', 'bw', 'rgb', 'rgba', 'sgi', 'ras', 'tga', 'icb', 'vda', 'vst', 'webp', 'wmf', 'emf', 'xbm', 'xpm', 'zip']

I understand the issue is not related to the dataset viewer itself, but to the autodetection of data files (for datasets without a loading script) in the datasets library. cc @lhoestq @albertvillanova @mariosasko
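To make the failure concrete, here is a minimal Python sketch of the two steps this thread suggests. First, a test split is inferred because some file name matches '**test*' or '**eval*'. Then, resolving data files for that split, which also requires a supported extension, finds nothing. The sketch uses fnmatch.fnmatchcase as a stand-in for the library's glob matching, a shortened extension list, and a partly hypothetical file listing (only scoring-scripts/compute_seqeval.py comes from this thread). The datasets library's actual code may differ.

```python
from fnmatch import fnmatchcase

# Split patterns copied from the error message.
TEST_PATTERNS = ["**test*", "**eval*"]
# Shortened, illustrative subset of the supported extensions in the error message.
SUPPORTED_EXTENSIONS = {"csv", "tsv", "json", "jsonl", "parquet", "txt"}

# Hypothetical repository listing: only the scoring script is named in this thread.
REPO_FILES = ["README.md", "train.jsonl", "scoring-scripts/compute_seqeval.py"]


def extension(path: str) -> str:
    """Return the lowercase extension of the last path component, or ''."""
    name = path.rsplit("/", 1)[-1]
    return name.rsplit(".", 1)[-1].lower() if "." in name else ""


def matches_any(path: str, patterns: list[str]) -> bool:
    # fnmatch's '*' also matches '/', so '**eval*' matches 'eval' anywhere in the path.
    return any(fnmatchcase(path, p) for p in patterns)


def resolve_test_files(files: list[str]) -> list[str]:
    # Step 1 (assumed): the split is detected from file names alone, ignoring extensions.
    if not any(matches_any(f, TEST_PATTERNS) for f in files):
        return []  # no test split inferred
    # Step 2 (assumed): resolving data files for the split also requires a supported extension.
    data_files = [
        f for f in files
        if matches_any(f, TEST_PATTERNS) and extension(f) in SUPPORTED_EXTENSIONS
    ]
    if not data_files:
        raise FileNotFoundError(
            f"Unable to resolve any data file that matches {TEST_PATTERNS}"
        )
    return data_files
```

Calling resolve_test_files(REPO_FILES) raises FileNotFoundError: compute_seqeval.py passes step 1 because "seqeval" contains "eval", but step 2 rejects it because .py is not a supported extension.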

lhoestq commented 2 years ago

Apparently it finds scoring-scripts/compute_seqeval.py, which matches **eval*, one of the patterns used to detect a test split. We should probably improve the pattern, because it's not supposed to catch this kind of file. Split detection should also only consider files with supported extensions: txt, csv, png, etc.
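One possible way to tighten the detection is sketched below under the same assumptions. It filters by supported extension first, and it accepts a split keyword only when it appears as a whole token delimited by the start or end of the path or by a separator (-, /, _, ., or a space). The keyword list, the separator set, and the function name find_test_files are illustrative choices, not the fix that was merged into datasets.

```python
import re

# Illustrative subset of supported extensions (an assumption, not the library's full list).
SUPPORTED_EXTENSIONS = {"csv", "tsv", "json", "jsonl", "parquet", "txt"}

# A keyword counts only if it is delimited by the start/end of the path
# or by one of the separators '-', '/', '_', '.', or a space.
TEST_KEYWORD = re.compile(
    r"(?:^|[-/_. ])(?:test|testing|eval|evaluation)(?=$|[-/_. ])",
    re.IGNORECASE,
)


def extension(path: str) -> str:
    """Return the lowercase extension of the last path component, or ''."""
    name = path.rsplit("/", 1)[-1]
    return name.rsplit(".", 1)[-1].lower() if "." in name else ""


def find_test_files(files: list[str]) -> list[str]:
    # Filter on supported extensions first, then look for a delimited keyword.
    return [
        f for f in files
        if extension(f) in SUPPORTED_EXTENSIONS and TEST_KEYWORD.search(f)
    ]
```

For example, find_test_files(["scoring-scripts/compute_seqeval.py", "latest.csv", "data/test-00000.parquet"]) returns ["data/test-00000.parquet"]: the scoring script is dropped by the extension filter (and "seqeval" would not match the delimited keyword anyway), and "latest.csv" no longer counts as a test file just because it contains "test".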