uni-medical / STU-Net

The largest pre-trained medical image segmentation model (1.4B parameters), trained on the largest public dataset (>100k annotations), as of April 2023.
Apache License 2.0

Dataset conversion instructions #6

Closed zhi-xuan-chen closed 1 year ago

zhi-xuan-chen commented 1 year ago

I want to ask whether the fine-tuning datasets you used in your work simply follow the preprocessing method provided by nnUNet_v1? Thank you very much if you can help me!

Ziyan-Huang commented 1 year ago

@zhi-xuan-chen Thank you for your interest in our work! Yes, for the fine-tuning dataset, we followed the preprocessing method provided by nnUNet_v1. If you have any further questions, please feel free to ask.

zhi-xuan-chen commented 1 year ago

OK, thank you very much!

zhi-xuan-chen commented 1 year ago

I am trying to fine-tune on the AMOS2022 dataset with the pretrained model you provide, to reproduce your results. I have now finished the "dataset conversion" step according to nnUNet-v1, and next I need to run "nnUNet_plan_and_preprocess". Which script should I run?

Ziyan-Huang commented 1 year ago

@zhi-xuan-chen After completing the dataset conversion according to nnUNet-v1, you indeed need to run the "nnUNet_plan_and_preprocess" step. You can execute the following script:

nnUNet_plan_and_preprocess -t <TASK_ID>

Replace <TASK_ID> with the task ID of your AMOS2022 dataset.

After running the "nnUNet_plan_and_preprocess" script, follow the fine-tuning instructions in our README file.
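As a concrete illustration (the paths and the task ID 500 below are hypothetical, adjust them to your own setup), a full nnUNet v1 preprocessing call might look like this; the three environment variables are required by nnUNet v1, and --verify_dataset_integrity is optional but useful after a manual conversion:

```shell
# Hypothetical folders -- point these at your own data locations.
export nnUNet_raw_data_base=/data/nnUNet_raw
export nnUNet_preprocessed=/data/nnUNet_preprocessed
export RESULTS_FOLDER=/data/nnUNet_results

# Assuming the AMOS task was registered as Task500_AMOS.
nnUNet_plan_and_preprocess -t 500 --verify_dataset_integrity
```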

zhi-xuan-chen commented 1 year ago

Thank you for your answer! I just found a problem. The dataset folder structure required by nnUNet-v1 only contains "imagesTr" and "labelsTr", but the current AMOS dataset (https://zenodo.org/record/7155725#.Y0OOCOxBztM) contains extra "imagesVa" and "labelsVa" folders. Its validation set is separated from the training set, unlike the cross-validation splits of nnUNet-v1. So, how can I ensure that the validation data are exactly the data in the "imagesVa" folder? Should the dataset conversion process be done differently?

Ziyan-Huang commented 1 year ago

@zhi-xuan-chen To address the AMOS dataset folder structure issue, here are two approaches you can consider:

  1. Combine the "imagesVa" and "labelsVa" folders with the "imagesTr" and "labelsTr" folders. Then modify the "splits_final.pkl" file in the preprocessed task folder to include the correct validation cases.

  2. Maintain the original "imagesTr" and "labelsTr" folders. Set the training fold to "all" to train the model using all training data. After training, perform inference on the validation data using the "nnUNet_predict" script and calculate the performance metrics based on the validation ground truth.

Choose the method you find more convenient, and please let us know if you need any further help.
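For the metric-calculation step in option 2, a minimal per-label Dice computation could look like the sketch below (my own illustration, not a script from this repo; in real use you would load the exported predictions and the AMOS validation labels as arrays, e.g. with nibabel):

```python
import numpy as np

def dice_per_label(pred, gt, labels):
    """Per-label Dice: 2|A∩B| / (|A| + |B|); defined as 1.0 when the
    label is absent from both prediction and ground truth."""
    scores = {}
    for lab in labels:
        p = (pred == lab)
        g = (gt == lab)
        denom = p.sum() + g.sum()
        scores[lab] = 1.0 if denom == 0 else 2.0 * np.logical_and(p, g).sum() / denom
    return scores

# Tiny 2D example standing in for a segmentation volume.
pred = np.array([[0, 1, 1], [0, 2, 2]])
gt   = np.array([[0, 1, 0], [0, 2, 2]])
print(dice_per_label(pred, gt, labels=[1, 2]))  # label 1: 2/3, label 2: 1.0
```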

zhi-xuan-chen commented 1 year ago

Thank you for your detailed answer. It is really helpful! I think the second option is suitable for me. I also want to know how I can convert AMOS2022 to the format required by nnUNet-v1. It seems the nnUNet_convert_decathlon_task command does not work for non-MSD datasets: I got the error "AssertionError: Input folder start with TaskXX with XX being a 3-digit id: 00, 01, 02 etc", but the required task ID, which exceeds 500, is always 3 digits.

Ziyan-Huang commented 1 year ago

You are correct that the nnUNet_convert_decathlon_task command is designed for the Medical Segmentation Decathlon (MSD) dataset and may not work directly with non-MSD datasets like AMOS 2022.

For converting the AMOS 2022 dataset, I recommend referring to the official nnUNet documentation and tutorials for guidance on how to preprocess and convert non-MSD datasets.
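As a rough sketch of what such a manual conversion involves (the paths, the task ID 500, and the helper names below are my own assumptions, not part of nnUNet): copy the images into the nnUNet v1 layout with the 4-digit modality suffix appended, keep label names unchanged, then write a dataset.json.

```python
import shutil
from pathlib import Path

# Hypothetical locations -- adjust to your download and nnUNet_raw_data_base.
SRC = Path("AMOS22")
DST = Path("nnUNet_raw_data/Task500_AMOS")  # any free 3-digit task ID

def nnunet_image_name(filename):
    """nnUNet v1 expects a 4-digit modality suffix on image files,
    e.g. amos_0001.nii.gz -> amos_0001_0000.nii.gz (labels keep their name)."""
    stem = filename[: -len(".nii.gz")]
    return stem + "_0000.nii.gz"

def copy_split(img_dir, lab_dir):
    """Copy one image/label folder pair from AMOS into the nnUNet v1 layout."""
    (DST / img_dir).mkdir(parents=True, exist_ok=True)
    (DST / lab_dir).mkdir(parents=True, exist_ok=True)
    for img in sorted((SRC / img_dir).glob("*.nii.gz")):
        shutil.copy(img, DST / img_dir / nnunet_image_name(img.name))
    for lab in sorted((SRC / lab_dir).glob("*.nii.gz")):
        shutil.copy(lab, DST / lab_dir / lab.name)

# Usage: copy_split("imagesTr", "labelsTr"), then write dataset.json, e.g.
# with nnunet.dataset_conversion.utils.generate_dataset_json.
print(nnunet_image_name("amos_0001.nii.gz"))  # amos_0001_0000.nii.gz
```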

zhi-xuan-chen commented 1 year ago

Thank you very much! I have converted the dataset successfully.