huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
https://huggingface.co/docs/datasets
Apache License 2.0

Add The Medical Segmentation Decathlon Dataset #3583

Open omarespejel opened 2 years ago

omarespejel commented 2 years ago

Adding a Dataset

(cc @osanseviero @abidlabs )

Instructions to add a new dataset can be found here.

pri1311 commented 2 years ago

Hello! I have recently been involved with a medical image segmentation project myself and was going through the Medical Segmentation Decathlon dataset as well. I haven't had experience adding datasets to this repository yet, but I would love to get started. Should I take this issue? If yes, I've got two questions:

  1. There are 10 different datasets available, so are all datasets to be added in a single PR, or one at a time?
  2. Since it's a competition, masks for the test set are not available. How should that be handled? Sorry if it's a silly question; I have only recently started exploring datasets.

mariosasko commented 2 years ago

Hi! Sure, feel free to take this issue. You can self-assign the issue by commenting #self-assign.

To answer your questions:

  1. It makes the most sense to add each one as a separate config, so one dataset script with 10 configs in a single PR.
  2. Just set masks in the test set to None.
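A minimal sketch of the two points above, written in plain Python so it runs without the `datasets` library. It lists the ten Decathlon tasks (one config per task) and yields examples whose `mask` is `None` for the test split. The function name `generate_examples`, the field names `image` and `mask`, and the `imagesTr`/`labelsTr`/`imagesTs` folder layout are assumptions for illustration. In an actual loading script this logic would sit in the `_generate_examples` method of a `datasets.GeneratorBasedBuilder` subclass, with one `datasets.BuilderConfig` per task in `BUILDER_CONFIGS`.

```python
from pathlib import Path

# One config per Decathlon task, ten in total.
TASKS = [
    "Task01_BrainTumour", "Task02_Heart", "Task03_Liver", "Task04_Hippocampus",
    "Task05_Prostate", "Task06_Lung", "Task07_Pancreas", "Task08_HepaticVessel",
    "Task09_Spleen", "Task10_Colon",
]


def generate_examples(task_dir, split):
    """Yield (key, example) pairs for one task and one split.

    Assumed layout: imagesTr/ and labelsTr/ for training, imagesTs/ for test.
    """
    task_dir = Path(task_dir)
    image_dir = task_dir / ("imagesTr" if split == "train" else "imagesTs")
    # pathlib's glob also matches dotfiles, so hidden files are filtered out here.
    paths = sorted(
        p for p in image_dir.glob("*.nii.gz") if not p.name.startswith(".")
    )
    for key, image_path in enumerate(paths):
        if split == "train":
            # Assumes each training mask shares its image's file name.
            mask = str(task_dir / "labelsTr" / image_path.name)
        else:
            # Test masks are not released for the competition.
            mask = None
        yield key, {"image": str(image_path), "mask": mask}
```

Keeping the `mask` key present with a `None` value means the train and test splits share the same set of fields.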

Note that the images/masks in this dataset are in NIfTI format, which our Image feature currently doesn't support, so I think it's best to yield the paths to the images/masks in the script and add a preprocessing section to the card where we explain how to load/process the images/masks with nibabel (I can help with that).
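As a rough idea of what such a preprocessing section could contain, here is a sketch that turns the yielded paths into arrays with nibabel. The helper names `load_volume` and `load_example`, the `reader` parameter, and the `image`/`mask` field names are hypothetical; only `nib.load`, `get_fdata()` and `affine` come from nibabel itself. The import is done inside the function because nibabel is an optional dependency here.

```python
def load_volume(path):
    """Read one NIfTI file and return (voxel array, 4x4 affine matrix)."""
    import nibabel as nib  # optional dependency: pip install nibabel

    img = nib.load(path)
    # get_fdata() returns floating-point voxels; cast masks to an integer
    # dtype afterwards if integer class labels are needed.
    return img.get_fdata(), img.affine


def load_example(example, reader=load_volume):
    """Replace the paths in one example with loaded volumes.

    `reader` is a hypothetical hook so the function can be tried with a
    stand-in loader; by default it reads NIfTI files with nibabel.
    """
    image, affine = reader(example["image"])
    mask = None
    if example["mask"] is not None:  # test-set examples carry no mask
        mask, _ = reader(example["mask"])
    return {"image": image, "mask": mask, "affine": affine}
```

With a loaded dataset, something like `load_example(dataset[0])` would then return the volume, its mask (or `None`) and the affine for one example.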

pri1311 commented 2 years ago

> Note that the images/masks in this dataset are in NIfTI format, which our Image feature currently doesn't support, so I think it's best to yield the paths to the images/masks in the script and add a preprocessing section to the card where we explain how to load/process the images/masks with nibabel (I can help with that).

Gotcha, thanks. Will start working on the issue and let you know in case of any doubt.

pri1311 commented 2 years ago

#self-assign

osanseviero commented 2 years ago

This is great! There is a first model on the Hub that uses this dataset! https://huggingface.co/MONAI/example_spleen_segmentation