huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
https://huggingface.co/docs/datasets
Apache License 2.0

Add MedImg for streaming #6912

Open lhallee opened 1 month ago

lhallee commented 1 month ago

Feature request

Host the MedImg dataset (similar to ImageNet, but for biomedical images).

Motivation

There is a clear need for biomedical image foundation models and for large-scale biomedical datasets that are easy to stream. Hosting MedImg would be an excellent resource for the biomedical community.

Your contribution

MedImg can be found here.

lhallee commented 1 month ago

@mariosasko, @lhoestq, @albertvillanova Hello! Can anyone help with this, or suggest someone who can?

lhoestq commented 1 month ago

Hi! Feel free to download the dataset and create a Dataset object with it.

Then you'll be able to use push_to_hub() to upload the dataset to HF in Parquet format and make it streamable :)
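For reference, the suggestion above could look roughly like the sketch below. It is only an illustration under assumptions that are not from this thread: the images are assumed to sit in a local folder laid out as `root/<label>/<file>`, and the helper names (`collect_examples`, `build_and_push`) and the repo id are hypothetical. `Dataset.from_dict`, `cast_column`, `Image` and `push_to_hub` are the `datasets` calls being used.

```python
from pathlib import Path


def collect_examples(root):
    """Pair every image under root/<label>/ with its label (the folder name)."""
    exts = {".png", ".jpg", ".jpeg"}  # assumed image extensions
    paths, labels = [], []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in exts:
            paths.append(str(path))
            labels.append(path.parent.name)
    return {"image": paths, "label": labels}


def build_and_push(root, repo_id):
    """Build a Dataset from a local folder and upload it to the Hub."""
    # Imported lazily so collect_examples() works without `datasets` installed.
    from datasets import Dataset, Image

    ds = Dataset.from_dict(collect_examples(root))
    ds = ds.cast_column("image", Image())  # treat the path column as images
    ds.push_to_hub(repo_id)  # uploads as Parquet; requires being logged in
    return ds


# Hypothetical usage (folder and repo id are placeholders):
# build_and_push("path/to/medimg", "your-username/medimg")
#
# Once uploaded, the dataset could be streamed without a full download:
# from datasets import load_dataset
# ds = load_dataset("your-username/medimg", split="train", streaming=True)
```

Note that this sketch still needs the whole dataset on local disk before uploading.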

lhallee commented 4 weeks ago

> Hi! Feel free to download the dataset and create a Dataset object with it.
>
> Then you'll be able to use push_to_hub() to upload the dataset to HF in Parquet format and make it streamable :)

The dataset is several TB in total, which I do not have the resources to handle.