huggingface / dataset-viewer

Backend that powers the dataset viewer on Hugging Face dataset pages through a public API.
https://huggingface.co/docs/dataset-viewer
Apache License 2.0

Remove extra `label` column #3014

Open severo opened 3 months ago

severo commented 3 months ago

In the example dataset https://huggingface.co/datasets/datasets-examples/doc-audio-4, there is an unexpected `label` column that contains only null values.

Screenshot 2024-08-02 at 12 33 10

I think it's caused by a "collision" between the heuristics that infer splits and/or classes from the directory names. The `datasets` library has a `drop_labels=True` option, if that helps.
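For reference, a minimal sketch of how `drop_labels=True` can be passed to the folder-based builders in `datasets`. The helper names `folder_load_kwargs` and `load_without_labels` are hypothetical, and the local `data_dir` layout is an assumption; this is not how the viewer currently calls the library.

```python
def folder_load_kwargs(data_dir: str, builder: str = "audiofolder") -> dict:
    """Build keyword arguments for datasets.load_dataset with label inference disabled.

    `builder` is "audiofolder" or "imagefolder"; `data_dir` is an assumed local folder.
    """
    return {"path": builder, "data_dir": data_dir, "drop_labels": True}


def load_without_labels(data_dir: str, builder: str = "audiofolder"):
    """Load a folder-based dataset without the directory-derived `label` column."""
    # Imported lazily so the helper above works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(**folder_load_kwargs(data_dir, builder))
```

Calling `load_without_labels("path/to/folder")` should then return the dataset with no `label` column; whether the same option fixes the viewer output for this dataset is untested here.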

Ideally, in this case, we should have two splits (train and test), and no additional label column.
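To make the expected behaviour concrete, here is a rough sketch of a heuristic that treats a top-level `train`/`test` directory as a split and only emits a `label` column when there is a class directory below it. The function `infer_rows`, the `SPLIT_NAMES` set, and the file names are all hypothetical; this is not the actual logic used by `datasets` or the viewer.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Assumed set of directory names that are interpreted as splits.
SPLIT_NAMES = {"train", "test", "validation"}


def infer_rows(paths):
    """Group files by split; keep `label` only if some file sits in a class directory."""
    splits = defaultdict(list)
    for p in paths:
        parts = PurePosixPath(p).parts
        if len(parts) >= 2 and parts[0] in SPLIT_NAMES:
            split, rest = parts[0], parts[1:]
        else:
            # No split directory: fall back to a single default split.
            split, rest = "train", parts
        # A directory between the split and the file is read as the class name.
        label = rest[-2] if len(rest) >= 2 else None
        splits[split].append({"file": p, "label": label})

    # Drop the column entirely when it would contain only nulls.
    has_labels = any(r["label"] is not None for rows in splits.values() for r in rows)
    if not has_labels:
        for rows in splits.values():
            for r in rows:
                del r["label"]
    return dict(splits)


# Split directories only: two splits, no label column.
print(infer_rows(["train/a.wav", "test/b.wav"]))
# Class directories below the split: label column is kept.
print(infer_rows(["train/cat/a.wav", "train/dog/b.wav"]))
```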

I think the issue also exists with image datasets.