huggingface / hub-docs

Docs of the Hugging Face Hub
http://hf.co/docs/hub
Apache License 2.0

Expand "Large scale datasets" + add "Frequently updated datasets" #1349

Open severo opened 1 month ago

severo commented 1 month ago

The section https://huggingface.co/docs/hub/datasets-adding#large-scale-datasets is rather short.

I think we could expand it with content copied from https://huggingface.co/docs/hub/repositories-recommendation, https://huggingface.co/docs/huggingface_hub/guides/upload#tips-and-tricks-for-large-uploads, and https://github.com/huggingface/datasets/pull/6269.

We should also add a section for "Frequently updated datasets".

severo commented 1 month ago

Also, for large-scale image datasets, we should make sure to promote the WebDataset format. cc https://huggingface.slack.com/archives/C02V51Q3800/p1721744641131019?thread_ts=1718951495.849879&cid=C02V51Q3800
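
For context, here is a minimal sketch of what a WebDataset-style layout looks like: plain tar shards in which files sharing a basename form one sample. It uses only the Python standard library. The helper name `write_webdataset_shards`, the `train-{:04d}.tar` shard naming, and the `jpg` + `json` pairing are assumptions for illustration, not something taken from the Hub docs.

```python
import io
import json
import tarfile


def write_webdataset_shards(samples, shard_pattern, samples_per_shard):
    """Write (key, image_bytes, metadata) samples into WebDataset-style tar shards.

    Hypothetical helper: files that share a basename (the key) form one
    sample, e.g. 000000.jpg + 000000.json inside the same tar archive.
    """
    shard_paths = []
    tar = None
    for index, (key, image_bytes, metadata) in enumerate(samples):
        # Start a new shard every `samples_per_shard` samples.
        if index % samples_per_shard == 0:
            if tar is not None:
                tar.close()
            path = shard_pattern.format(len(shard_paths))
            shard_paths.append(path)
            tar = tarfile.open(path, "w")
        members = (
            ("jpg", image_bytes),
            ("json", json.dumps(metadata).encode("utf-8")),
        )
        # Members of one sample are written next to each other in the archive.
        for suffix, payload in members:
            info = tarfile.TarInfo(name=f"{key}.{suffix}")
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    if tar is not None:
        tar.close()
    return shard_paths
```

Calling it with `shard_pattern="train-{:04d}.tar"` would produce `train-0000.tar`, `train-0001.tar`, and so on, which could then be uploaded to a dataset repository. Keeping each shard to a bounded number of samples is the point of the format: readers can stream one tar at a time instead of opening millions of small files.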