rcuocolo / PROSTATEx_masks

Lesion and prostate masks for the PROSTATEx training dataset, after a lesion-by-lesion quality check.
https://rcuocolo.github.io/PROSTATEx_masks/
Creative Commons Attribution 4.0 International

Request for radiomic feature extraction #13

Closed DrraBL closed 3 years ago

DrraBL commented 3 years ago

Dear @rcuocolo, first of all, thank you for your wonderful work. At first I did not understand what this work was for, but then I found it to be a treasure. I am new to the medical field and I was confused by the medical imaging file formats and by the ground-truth segmentation masks needed to train a deep learning model. The PROSTATEx dataset in particular was a bit strange to me because I could not understand the files it contains. Now I can say that your work gives me the prostate masks I need for radiomic extraction, so I am contacting you with a few requests:

  1. I want to extract radiomic features from T2 MRI, so I need the original slices and their segmentation labels. Which mask should I use, the prostate or the lesion? Also, is it possible to convert the masks to NRRD files?
  2. Then I want to classify the Gleason grade group. Can you advise me on how to use the masks to do that?
  3. Finally, I am sorry to ask so much, but can you explain more about the PROSTATEx dataset, how I can use your work for radiomic features, and how to then use those features to classify the Gleason grade group?

Looking forward to hearing from you. Kind regards,

rcuocolo commented 3 years ago

Hello, thank you for your interest in the repository. Regarding your questions:

  1. As stated in the readme file, I would currently suggest converting the PROSTATEx DICOM files into NIFTI and using the NIFTI masks. This is the process we actually followed, and I then transposed the NIFTI-coordinate masks to the original DICOM space. As can be seen in the other open issue, this caused some problems with the DWI/ADC masks, as the array appears to be still partially "flipped". With dcm2niix (using the patient name, sequence name and number options), the output files should be identical to those we used and there would be no geometry issues. NIFTI and NRRD should be convertible from one format to the other (for example using 3D Slicer, which reads both), but I do not see a particularly good reason to prefer one over the other.
  2. The class labels are available both as clinically significant (GGG 2 or higher) versus not clinically significant for the whole dataset, and as specific GGG labels for the PROSTATEx2 dataset. You can simply pair the csv files to the extracted radiomic database using patient name and lesion number as the indices. I usually employ the pandas package for similar tasks. Whether you choose to use the entire PROSTATEx dataset or only PROSTATEx2, I would suggest keeping the GGG 2 or higher threshold to dichotomize the classes. There is some debate on classifying GGG 2 lesions as "intermediate" compared to 3 or higher, but I would not use this approach without good domain knowledge in the team (this choice has to be appropriately supported by the study aim and you should be ready to address this point during a hypothetical peer review process).
  3. Among currently available software for radiomic data extraction, I prefer PyRadiomics. It is fairly simple to use. You can find details of the parameters I usually employ in prostate MRI described in the literature (e.g., doi: 10.1007/s00330-021-07856-3, open access, contains the parameters file in the supplementary materials). For the analysis itself, there is a wide range of tools, from R (e.g., caret) and Python (e.g., scikit-learn) packages to GUI-based software (e.g., WEKA, KNIME, RapidMiner). I usually employ scikit-learn or WEKA with some R (mainly for statistics or metrics).
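
The pairing step in point 2 could look roughly like the sketch below. It is only an illustration, not the exact code used for the repository: the column names (`patient`, `lesion`, `ggg`), the single feature column, and all values are made-up placeholders, and the helper `pair_features_with_labels` is hypothetical. In practice the two tables would come from your PyRadiomics output and the PROSTATEx csv files, whose actual column names you should check.

```python
import pandas as pd

# Toy radiomic table, one row per lesion (placeholder names and values).
features = pd.DataFrame({
    "patient": ["ProstateX-0000", "ProstateX-0001", "ProstateX-0001"],
    "lesion": [1, 1, 2],
    "original_firstorder_Mean": [310.2, 275.9, 402.5],
})

# Toy label table with a Gleason grade group column (placeholder values).
labels = pd.DataFrame({
    "patient": ["ProstateX-0000", "ProstateX-0001", "ProstateX-0001"],
    "lesion": [1, 1, 2],
    "ggg": [3, 1, 2],
})


def pair_features_with_labels(features, labels, threshold=2):
    """Join features and labels per lesion and add a binary class column."""
    merged = features.merge(
        labels,
        on=["patient", "lesion"],   # patient name and lesion number as keys
        how="inner",
        validate="one_to_one",      # fail loudly if a lesion is duplicated
    )
    # Dichotomize: GGG >= threshold is treated as clinically significant.
    merged["clinically_significant"] = (merged["ggg"] >= threshold).astype(int)
    return merged.set_index(["patient", "lesion"])


dataset = pair_features_with_labels(features, labels)
print(dataset)
```

The `validate="one_to_one"` argument is a design choice rather than a requirement: it makes the merge raise an error if the same patient and lesion pair appears twice in either table, which is an easy mistake to miss when lesion numbering differs between files.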

rcuocolo commented 3 years ago

I would also suggest reading the pinned Issue #5 as it contains some insights on the dataset that could be useful for you.