huggingface / dataset-viewer

Backend that powers the dataset viewer on Hugging Face dataset pages through a public API.
https://huggingface.co/docs/dataset-viewer
Apache License 2.0

Modalities not detected for some datasets using the Webdatasets format #2996

Open ProGamerGov opened 2 months ago

ProGamerGov commented 2 months ago

I have found two examples where the modality detection code fails to recognize the modalities of text and image datasets that use the WebDataset format:

I'm not sure where in the modality detection code things are failing: https://github.com/huggingface/dataset-viewer/blob/main/services/worker/src/worker/job_runners/dataset/modalities.py

severo commented 2 months ago

Thanks for opening. Note that you can force the modality: https://huggingface.co/docs/hub/datasets-cards#force-set-a-dataset-modality
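
For reference, a minimal sketch of what forcing the modality might look like, assuming the tag-based mechanism described at the link above: the modality names are added as `tags` in the YAML metadata block at the top of the dataset's README.md. The `image` and `text` values are chosen here only because the report concerns text and image datasets; check the linked page for the accepted tag names.

```yaml
---
# Dataset card metadata (top of README.md).
# Assumption: modalities are forced by listing them as tags.
tags:
- image
- text
---
```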

github-actions[bot] commented 1 month ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.