Open MikeDoes opened 8 months ago
Hi! You can avoid the error by requesting only the `jsonl` files: `dataset = load_dataset("ai4privacy/pii-masking-200k", data_files=["*.jsonl"])`.
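To illustrate what a `*.jsonl` pattern selects, here is a minimal sketch using Python's standard `fnmatch` module. The file names are hypothetical placeholders, not the actual repository listing, and `fnmatch` only approximates the glob resolution that `datasets` performs internally; `select_files` is a helper invented for this example.

```python
from fnmatch import fnmatch

# Hypothetical file names, for illustration only (not the real repository contents).
repo_files = ["README.md", "part_a.jsonl", "part_b.jsonl", "summary.json"]


def select_files(files, patterns):
    """Return the files that match at least one glob-style pattern."""
    return [f for f in files if any(fnmatch(f, p) for p in patterns)]


# "*.jsonl" does not match "summary.json", so the incompatible file is skipped.
selected = select_files(repo_files, ["*.jsonl"])
print(selected)
```

Under these assumptions only the two `.jsonl` files are kept, which is why restricting `data_files` sidesteps the mixed-format problem.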
Our data file inference does not filter out (incompatible) `json` files because `json` and `jsonl` use the same builder. Still, I think the inference should differentiate these extensions because it's safe to assume that loading them together will lead to an error. WDYT @lhoestq?
Raising an error if there is a mix of json and jsonl in the builder makes sense yea
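The check proposed above could look roughly like the following. This is a sketch only: `check_json_extensions` is a hypothetical helper, not a function in the `datasets` library, and the real data file inference may be structured quite differently.

```python
from pathlib import PurePosixPath


def check_json_extensions(data_files):
    """Raise ValueError if the file list mixes .json and .jsonl files.

    Hypothetical helper for illustration; not part of the datasets API.
    """
    suffixes = {PurePosixPath(name).suffix.lower() for name in data_files}
    if {".json", ".jsonl"} <= suffixes:
        raise ValueError(
            "Found both .json and .jsonl files; "
            "pass data_files to select a single format."
        )
    return data_files
```

A list containing only `.jsonl` files passes through unchanged, while a list such as `["train.jsonl", "meta.json"]` raises a `ValueError` that points the user at `data_files`.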
Describe the bug
Dear Datasets team,
We have just published a dataset on Hugging Face: https://huggingface.co/ai4privacy
However, when we try to read it with the Datasets library, we get an error. As I understand it, jsonl files are compatible, so could you please clarify how we can solve the issue? We would be more than happy to adapt the structure of the repository or its metadata so that it loads more easily.
Thank you and have a great day ahead
Steps to reproduce the bug
Open a Google Colab notebook.
Run command: !pip3 install datasets
Run code:
from datasets import load_dataset
dataset = load_dataset("ai4privacy/pii-masking-200k")
Expected behavior
Download the dataset successfully from HuggingFace to the notebook so that we can start working with it
Environment info
datasets version: 2.14.6