huggingface / dataset-viewer

Backend that powers the dataset viewer on Hugging Face dataset pages through a public API.
https://huggingface.co/docs/dataset-viewer
Apache License 2.0

Show the proportion of image/audio formats in stats? #2806

Open severo opened 4 months ago

severo commented 4 months ago

Proposal here

I would love to see these features on the dataset viewer:

  1. Image datasets: Show the extensions and the number for each extension.

mohammad-albarham commented 4 months ago

Thanks for the support @severo.

So, my suggestion is as follows (for images, audio, or anything with file extensions):

That way, I can see the total number of images and the count for each extension in a dataset.

severo commented 3 months ago

We now have the count for every extension in dataset-filetypes. It's not published in the API though.

mohammad-albarham commented 3 months ago

Great @severo

Sorry, I am not familiar with the term dataset-filetypes. What is dataset-filetypes? Where can I see the feature now, please?

So, once it is published in the API, will it be shown on Hugging Face Datasets?

severo commented 3 months ago

dataset-filetypes is a new "step," i.e., a pre-processed computation. It computes the number of files for each file extension in the main branch. However, not all "steps" are published in the HTTP API. I haven't created an API endpoint to consume the result yet.
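
For readers unfamiliar with the idea, here is a minimal sketch of what counting files per extension could look like, and how proportions could be derived from those counts. This is not the actual dataset-filetypes implementation: the function names `count_extensions` and `proportions`, the lowercasing of extensions, and the sample file list are all assumptions made for illustration.

```python
from collections import Counter
from pathlib import PurePosixPath


def count_extensions(filenames):
    """Count files per extension (hypothetical helper, not the real step)."""
    counts = Counter()
    for name in filenames:
        # Lowercase so ".JPG" and ".jpg" are counted together (an assumption).
        ext = PurePosixPath(name).suffix.lower()
        counts[ext or "(none)"] += 1
    return counts


def proportions(counts):
    """Turn per-extension counts into fractions of the total file count."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ext: n / total for ext, n in counts.items()}


# Made-up file listing standing in for the files of a dataset repository.
files = ["train/0001.jpg", "train/0002.JPG", "train/0003.png", "README.md"]
counts = count_extensions(files)
shares = proportions(counts)
```

With the sample list above, `counts` groups the two JPEG files under `.jpg`, and `shares` gives each extension's fraction of the four files, which is the kind of figure the issue title asks to display.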