albertvillanova closed this issue 2 months ago.
Hello,
I still have the same issue when loading the dataset with the new version: https://huggingface.co/datasets/wikimedia/structured-wikipedia/discussions/5
I downloaded and unzipped the wikimedia/structured-wikipedia dataset locally, but loading it still fails with the same error:
import datasets
dataset = datasets.load_dataset("/gpfsdsdir/dataset/HuggingFace/wikimedia/structured-wikipedia/20240916.fr")
TypeError: Couldn't cast array of type
struct<content_url: string, width: int64, height: int64, alternative_text: string>
to
{'content_url': Value(dtype='string', id=None), 'width': Value(dtype='int64', id=None), 'height': Value(dtype='int64', id=None)}
The above exception was the direct cause of the following exception:
My datasets version is 3.0.1.
Loading JSON Lines files in which some records are missing struct fields raises TypeError: Couldn't cast array of type.
See example: https://huggingface.co/datasets/wikimedia/structured-wikipedia/discussions/5
One would expect the missing struct fields to be filled with null values.