oncoray / mirp

Medical Image Radiomics Processor
https://oncoray.github.io/mirp/
European Union Public License 1.2

TCIA data usage policy compliance and suggestions for TCIA integration #83

Open kirbyju opened 2 months ago

kirbyju commented 2 months ago

Hi, apologies for ignoring the issue template but I don't think my inquiry fits the mold. I have 2 things I'd like to raise with you all.

  1. It appears you're utilizing the Soft Tissue Sarcoma dataset from TCIA in your tutorial documentation. In order to comply with TCIA's Data Usage Policy you need to list the data citation in the tutorial as follows to provide attribution to the folks who published this data:

Vallières, Martin, Freeman, Carolyn R., Skamene, Sonia R., & El Naqa, Issam. (2015). A radiomics model from joint FDG-PET and MRI texture features for the prediction of lung metastases in soft-tissue sarcomas of the extremities (Soft-tissue-Sarcoma) [Dataset]. The Cancer Imaging Archive. http://doi.org/10.7937/K9/TCIA.2015.7GO2GSKS

  2. Rather than storing a copy of the data you want to use for demonstration purposes on GitHub, it could be very useful if you were to show people how to grab data directly from TCIA using our APIs. I've created many tutorials for working with our APIs, but the REST API Download notebook is probably most relevant for this. I'd be more than happy to answer any questions or to work with you on building out documentation for mirp to simplify users' ability to apply your tool to our datasets.
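As a rough illustration of the second point, the sketch below builds a query URL for listing the series in the Soft-tissue-Sarcoma collection using only the Python standard library. The base URL and the `getSeries` endpoint with its `Collection` parameter are assumptions about TCIA's public NBIA REST API and should be checked against the current TCIA documentation; `build_nbia_url` and `list_series` are hypothetical helper names, not part of any TCIA package.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of TCIA's public NBIA REST API; check the TCIA documentation.
NBIA_BASE_URL = "https://services.cancerimagingarchive.net/nbia-api/services/v1"


def build_nbia_url(endpoint, **params):
    """Build a query URL for an NBIA endpoint such as getSeries."""
    url = f"{NBIA_BASE_URL}/{endpoint}"
    return f"{url}?{urlencode(params)}" if params else url


def list_series(collection):
    """Fetch series metadata for a collection (needs network access)."""
    with urlopen(build_nbia_url("getSeries", Collection=collection)) as response:
        return json.load(response)


# Only the URL is built here; call list_series() to actually query the service.
series_url = build_nbia_url("getSeries", Collection="Soft-tissue-Sarcoma")
print(series_url)
```

The network call is kept in a separate function so the URL construction can be inspected without contacting the service.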

Just FYI, I stumbled on to this repo because I was interested in running some of our existing TCIA segmentation data through a standardized radiomics pipeline. It seems silly to have all our users repeating this computational process when we could do it once and provide the results such that people can dive right into using the derived features. I am extra excited that this tool adheres to the IBSI guidelines. Please let me know if you'd be interested to discuss potential collaborations.

alexzwanenburg commented 2 months ago

Thanks for opening this issue.

I have created the following tasks:

alexzwanenburg commented 2 months ago

> Just FYI, I stumbled on to this repo because I was interested in running some of our existing TCIA segmentation data through a standardized radiomics pipeline.

There is an open issue with processing some DICOM SEG files, which I will be looking into: #81 .

kirbyju commented 2 months ago

Thanks for the quick reply! I just remembered that https://github.com/kirbyju/TCIA_Notebooks/blob/main/TCIA_Segmentations.ipynb would also be very useful for you to review since it's wholly focused on getting segmentation data from TCIA. It also includes info about how you can (usually) find the related image series that was used to create a given segmentation (RTSTRUCT or SEG).

kirbyju commented 2 months ago

I took a stab at updating your tutorial to use tcia_utils to grab the data: https://github.com/kirbyju/TCIA_Notebooks/blob/main/TCIA_MIRP.ipynb. Let me know if you have any questions about how it works or suggestions for improvement. I'd love to extend the notebook a bit to cover a few of our other datasets and then advertise this to our user community, but I am clueless about what image pre-processing steps might be required for different datasets. I'm especially interested in showing how to do this with the RTSTRUCT segmentation data that's available for our CPTAC datasets that are discussed in https://github.com/kirbyju/TCIA_Notebooks/blob/main/CPTAC/CPTAC.ipynb.

alexzwanenburg commented 2 months ago

Looks good! For the MIRP longform documentation, I am looking to create a new tutorial, since I want to keep the current tutorial relatively simple and straightforward, and focus mainly on MIRP. Can I use some of your code in that tutorial?

Also, I am currently not looking to add too many bells and whistles to the visualisation tool. The tool builds upon the show method of the internal image representation. Internally, an InteractivePlot is created, which interacts with a matplotlib canvas. The hardest part behind show is preparing the image itself by converting DICOM (and other image formats) to the internal representation. Adding additional DICOM modalities is ongoing work.

kirbyju commented 2 months ago

Sure, feel free to use the code wherever you like.

Are there any specific resources (documentation? publications?) you could point me to that would help me determine which MIRP options I may need to utilize with extract_features() based on the results of extract_image_parameters()? I'm trying to decide whether I can realistically tackle that if I write some tutorial notebooks for specific TCIA datasets or if I should just put a big disclaimer saying that users should keep in mind that they may need to apply such parameters in order to get scientifically valid results.

Also, is there any possibility you'd consider updating MIRP to a license that does not require derivative works to use the same license? E.g. Apache or BSD? I'm just wondering if the current license might prevent MIRP from being offered as an extension in popular tools such as 3D Slicer.

alexzwanenburg commented 2 months ago

> Are there any specific resources (documentation? publications?) you could point me to that would help me determine which MIRP options I may need to utilize with extract_features() based on the results of extract_image_parameters()? I'm trying to decide whether I can realistically tackle that if I write some tutorial notebooks for specific TCIA datasets or if I should just put a big disclaimer saying that users should keep in mind that they may need to apply such parameters in order to get scientifically valid results.

I am afraid there is no general guidance on how image processing should be configured. It mostly involves some domain knowledge. As a rough guide, you can ask yourself the following questions:

> Also, is there any possibility you'd consider updating MIRP to a license that does not require derivative works to use the same license? E.g. Apache or BSD? I'm just wondering if the current license might prevent MIRP from being offered as an extension in popular tools such as 3D Slicer.

I am not at liberty to change the license. However, it shouldn't be problematic as long as these tools simply provide an interface with MIRP, including as an extension. The EUPL does indeed carry over to derivative works (though these works can be relicensed under selected compatible licenses). However, unlike strong copyleft licences such as GPLv3, the EUPL does not prevent other products under different licenses from linking to the software. The following guidance is provided:

> The EUPL refers to the laws of EU countries and is therefore interoperable. This means that all the interfaces of the covered software (the APIs, formats, data structures) can be freely copied and reproduced in other independent works in order to build interoperability, e.g. combining software distributed under the EUPL with any other software licensed differently, even under a proprietary licence. In such a combination or statically linked aggregation, every linked component will keep its primary licence, without any ‘viral effect’.

kirbyju commented 2 months ago

Thanks for the additional info. I don't think I saw this in the documentation, but is there a way to tell MIRP you only want to run a particular class of features (e.g. only do morphological) or only specific individual features?

alexzwanenburg commented 2 months ago

You can select specific classes of features using base_feature_families, or in case of filtered images (response maps) using response_map_feature_families.
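A minimal sketch of how such a selection might be passed along. The parameter names `base_feature_families` and `response_map_feature_families` come from the comment above; the value `"morphological"` and the commented-out `extract_features` call (including its `image` and `mask` arguments and the placeholder paths) are assumptions to be checked against the MIRP documentation, and `feature_family_settings` is a hypothetical helper.

```python
def feature_family_settings(base=None, response_map=None):
    """Collect feature-family keyword arguments, skipping any that are unset."""
    settings = {}
    if base is not None:
        settings["base_feature_families"] = base
    if response_map is not None:
        settings["response_map_feature_families"] = response_map
    return settings


# "morphological" is an assumed family name; confirm it in the MIRP documentation.
settings = feature_family_settings(base="morphological")
print(settings)

# Hypothetical call; requires MIRP to be installed and real image and mask paths:
# from mirp import extract_features
# features = extract_features(image="path/to/image", mask="path/to/mask", **settings)
```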

Computing individual features is currently not supported, and would require a rewrite of the feature computation part of the code.

kirbyju commented 1 month ago

> > Just FYI, I stumbled on to this repo because I was interested in running some of our existing TCIA segmentation data through a standardized radiomics pipeline.
>
> There is an open issue with processing some DICOM SEG files, which I will be looking into: #81 .

This is a bit of a tangent, but the report of this user having some potential trouble with TCIA SEG data got me wondering how you handle RTSTRUCT "keyhole" issues similar to what was reported in https://github.com/SlicerRt/SlicerRT/issues/171 and https://github.com/pyplati/platipy/issues/244.

alexzwanenburg commented 1 month ago

We encountered similar issues, but were able to resolve them. Actual conversion of the contour data to segmentation masks takes place in the convert_contour_to_mask method.

MIRP does the following:

I hope this may provide some insight into how this works.
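For readers unfamiliar with the problem, the sketch below shows one generic way to turn planar contours into a mask so that holes stay empty: each contour is rasterised with an even-odd point-in-polygon test, and contours on the same slice are combined by XOR, so an inner contour nested in an outer one carves out a hole. This is an illustration under those assumptions only and is not a description of MIRP's convert_contour_to_mask; the function names and the toy contours are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray casting test for a point against a closed polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y0 > y) != (y1 > y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            # Count the crossing if it lies to the right of the point.
            if x < x_cross:
                inside = not inside
    return inside


def rasterise(contours, width, height):
    """Rasterise contours on one slice, combining overlapping contours by XOR."""
    mask = [[False] * width for _ in range(height)]
    for polygon in contours:
        for row in range(height):
            for col in range(width):
                # Sample each pixel at its centre.
                if point_in_polygon(col + 0.5, row + 0.5, polygon):
                    mask[row][col] = not mask[row][col]
    return mask


# Toy example: a 6 x 6 outer square with a 2 x 2 inner contour marking a hole.
outer = [(0, 0), (6, 0), (6, 6), (0, 6)]
inner = [(2, 2), (4, 2), (4, 4), (2, 4)]
mask = rasterise([outer, inner], width=6, height=6)
for row in mask:
    print("".join("#" if value else "." for value in row))
```

The per-pixel loop is quadratic and only meant to keep the example dependency-free; a real implementation would use a vectorised polygon fill.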

> having some potential trouble with TCIA SEG data

I wasn't able to reproduce the SEG issues. In fact, SEG and RTSTRUCT produced the exact same segmentation masks.