[ ] incorporate content into the datasets docs: https://huggingface.co/docs/datasets/index. It might go in several places: in the intro, as a separate section (port/API), and as a side note in every relevant API reference function (e.g. get_split_names -> the information can also be obtained from a REST API...)
[ ] prepare a list of announcements:
public API
why we built it and why we think it's useful: access metadata and data from the Hub datasets through a web API (i.e. without Python), help the community build exploration tools easily, and increase the range of tools that connect to the Hub. Currently, it allows getting the splits, the first rows, the columns, and their types (features)
feedback: what would you want to access through the API? Through GitHub: huggingface/datasets, label "API"? Other channels?
roadmap: the future features. Possibly:
random access to the rows,
statistics on the columns,
other preprocessed information (analysis, metrics) about the datasets, e.g. detection of biases, distribution of the classes, size of the images, presence or absence of faces in image datasets, distribution of images in a class, number of speakers in audio datasets, presence of noise (i.e. real-world, not-clean audio files) in audio datasets,
recommendation system, to get a list of other datasets with the same format (same columns with the same type).
more on this here: https://docs.google.com/document/d/1XgJ8BuPZ2mM_VJ7K3DtwlU_AQSeSgSi8HD40XjE3b34/edit
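For the announcement, a short snippet could show how the currently available information (splits, first rows) is reached over plain HTTP. The following is a minimal sketch, not the official client: the base URL, the endpoint names (`splits`, `first-rows`), the query parameter names, and the example dataset/config/split values are assumptions to check against the API documentation before publishing.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: base URL and endpoint names are illustrative and must be
# checked against the published API documentation.
BASE_URL = "https://datasets-server.huggingface.co"


def build_url(endpoint: str, **params: str) -> str:
    """Build the GET URL for an endpoint, with URL-encoded query parameters."""
    return f"{BASE_URL}/{endpoint}?{urlencode(params)}"


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """Send the GET request and decode the JSON body (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Hypothetical example values: dataset "glue", config "cola", split "train".
splits_url = build_url("splits", dataset="glue")
first_rows_url = build_url(
    "first-rows", dataset="glue", config="cola", split="train"
)

print(splits_url)
print(first_rows_url)
```

Calling `fetch_json(splits_url)` would then perform the actual request; it is left out of the snippet so that the example runs without network access.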
Work with @huggingface/datasets and @osanseviero + avocado team to push the publication of the API
Tasks:
Other ideas
one specific message per channel (computer vision, audio, etc.) (<- too much spam). Better use "announcements"
discuss forum -> not for announcements. Might help for feedback
blog
Publish the API on specialized sites:
ideas by @julien-c @lhoestq @albertvillanova @thomwolf @NimaBoscarino @merveenoyan