kirbyju / tcia_utils

A package to simplify common tasks one might perform when interacting with The Cancer Imaging Archive (TCIA) via Jupyter/Python.
Apache License 2.0

Implement getSimpleSearchWithModalityAndBodyPartPaged #4

Closed zjp closed 1 year ago

zjp commented 1 year ago

Closes #3

I'm open to feedback!

kirbyju commented 1 year ago

Thanks for submitting this! A couple of minor details.

1) Can you provide a sample query in the comments to demonstrate usage? I think that might help folks, since this is a little more complicated than most of the other functions. I am probably doing something silly, but I can't figure out how to avoid errors when I select more than one value for a given criterion. E.g., if I wanted to query for patients with CR or DX modalities in a single function call, what does that look like?

2) I've been trying to organize the code into sections based on the API they use or the functionality they provide. Can you put this down with the rest of the Advanced API functions? They're near the end.

zjp commented 1 year ago

Good question! I couldn't see from the documentation page how to get more than one value in. I went to the SimpleSearch webpage, opened the network inspector, and clicked on more than one modality to see what showed up. I discovered those modalities are sent as different parameters, e.g.:


```
criteriaType0: ImageModalityCriteria
value0: MG
criteriaType1: ImageModalityCriteria
value1: CR
criteriaType2: ImageModalityCriteria
value2: CT
```

So this requires a small refactor. I'll push a new patch soon. 
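
A minimal sketch of how that indexed encoding could be generated from a list of values. The helper name `build_criteria_params` is hypothetical and is not taken from the patch; only the `criteriaTypeN`/`valueN` naming and `ImageModalityCriteria` come from the network inspector output above.

```python
def build_criteria_params(criteria):
    """Turn (criteria_type, value) pairs into indexed form fields.

    Each pair gets its own criteriaTypeN / valueN entry, mirroring what
    the network inspector showed when several modalities were selected.
    """
    params = {}
    for index, (criteria_type, value) in enumerate(criteria):
        params[f"criteriaType{index}"] = criteria_type
        params[f"value{index}"] = value
    return params


modalities = ["MG", "CR", "CT"]
params = build_criteria_params(
    [("ImageModalityCriteria", modality) for modality in modalities]
)
print(params)
```

Because the index is shared across all pairs, other criteria types could be appended to the same list and would continue the numbering.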

kirbyju commented 1 year ago

Cool, thanks. One other detail is that the dataframe and csv format options are smashing the majority of the output into the second column. It might be nice to break that out into more columns so it's easier to read.

zjp commented 1 year ago

OK, working on the DF and CSV options, but the refactor to support multiple arguments is done I think. I also added a docstring and put the function down with the rest of the Advanced API functions. Also added format checking for fromDate and toDate, and sensible defaults for them.
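
For illustration, date checking with defaults could look roughly like the sketch below. The `YYYY/MM/DD` format, the default values, and the helper name `normalize_date` are all assumptions made for this example; the thread does not state what the patch actually uses.

```python
from datetime import date, datetime

DATE_FORMAT = "%Y/%m/%d"          # assumed format, not confirmed by the thread
DEFAULT_FROM_DATE = "1900/01/01"  # hypothetical default
DEFAULT_TO_DATE = date.today().strftime(DATE_FORMAT)  # hypothetical default


def normalize_date(value, default):
    """Return `value` in DATE_FORMAT, or `default` when no value is given."""
    if value is None:
        return default
    try:
        parsed = datetime.strptime(value, DATE_FORMAT)
    except ValueError:
        raise ValueError(
            f"expected a date like 2023/01/31, got {value!r}"
        ) from None
    return parsed.strftime(DATE_FORMAT)


from_date = normalize_date(None, DEFAULT_FROM_DATE)
to_date = normalize_date("2023/01/31", DEFAULT_TO_DATE)
print(from_date, to_date)
```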

zjp commented 1 year ago

I'm actually not sure how to parse this cleanly. You probably have more experience with DataFrames/CSV than I do. Does anything stick out at you for how to massage this data? Sample return pretty-printed below.

```
[
   {
      "criteria":"Collections",
      "values":[
         {
            "criteria":"4D-Lung",
            "count":"20"
         }
      ]
   },
   {
      "criteria":"Species",
      "values":[
         {
            "criteria":"337915000",
            "count":"20"
         }
      ]
   },
   {
      "criteria":"Image Modality",
      "values":[
         {
            "criteria":"CT",
            "count":"20"
         },
         {
            "criteria":"RTSTRUCT",
            "count":"20"
         }
      ]
   },
   {
      "criteria":"Anatomical Site",
      "values":[
         {
            "criteria":"LUNG",
            "count":"20"
         }
      ]
   },
   {
      "criteria":"Manufacturer",
      "values":[
         {
            "criteria":"ADAC",
            "count":"20"
         },
         {
            "criteria":"Varian Imaging Laboratories, Switzerland",
            "count":"20"
         }
      ]
   }
]
```

kirbyju commented 1 year ago

Yeah, it's a bit tricky. I'm wondering if DF/CSV even make any sense as outputs on this particular function. Maybe we should just remove "format" as a parameter and ditch the related lines to convert the JSON? Did you have a specific use case for this function? What do you plan to do with the return values from this function after you run it?

zjp commented 1 year ago

That may be for the best. For my purposes, I'm only interested in gathering what's listed in the "criteria" fields for further processing later on.
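
As a rough sketch of that kind of post-processing, the nested response can be walked with plain Python. The `sample` data is a shortened copy of the return value pasted earlier in this thread; the helper names `criteria_by_group` and `flatten` are hypothetical.

```python
# Shortened copy of the sample return value shown earlier in the thread.
sample = [
    {"criteria": "Collections",
     "values": [{"criteria": "4D-Lung", "count": "20"}]},
    {"criteria": "Image Modality",
     "values": [{"criteria": "CT", "count": "20"},
                {"criteria": "RTSTRUCT", "count": "20"}]},
]


def criteria_by_group(response):
    """Map each top-level criteria name to the list of its value names."""
    return {
        group["criteria"]: [value["criteria"] for value in group["values"]]
        for group in response
    }


def flatten(response):
    """Produce one (group, value, count) row per nested value."""
    return [
        (group["criteria"], value["criteria"], int(value["count"]))
        for group in response
        for value in group["values"]
    ]


print(criteria_by_group(sample))
print(flatten(sample))
```

If tabular output were still wanted, the rows from `flatten` have a fixed width of three columns, which is one possible way to avoid packing everything into a single column.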

zjp commented 1 year ago

That last push did not remove formatting, but fixed an oversight: species arguments are NPEX Concept IDs, not strings. The function needed to translate between user facing arguments ('human', 'mouse', 'dog') and the IDs.
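
A sketch of what such a translation layer might look like. Only the ID `337915000` appears in this thread, and mapping it to `'human'` is an assumption based on the sample output for 4D-Lung; the IDs for `'mouse'` and `'dog'` are not shown here, so they are left out rather than guessed. The names `SPECIES_IDS`, `species_to_id`, and `id_to_species` are hypothetical.

```python
# Assumed mapping: only the ID seen in the sample output is included.
# Entries for 'mouse' and 'dog' would need the IDs the API actually expects.
SPECIES_IDS = {
    "human": "337915000",
}


def species_to_id(name):
    """Translate a user-facing species name into its concept ID."""
    try:
        return SPECIES_IDS[name.lower()]
    except KeyError:
        known = ", ".join(sorted(SPECIES_IDS))
        raise ValueError(
            f"unknown species {name!r}; expected one of: {known}"
        ) from None


def id_to_species(concept_id):
    """Translate a concept ID back to a name, or return the ID unchanged."""
    for name, known_id in SPECIES_IDS.items():
        if known_id == concept_id:
            return name
    return concept_id


print(species_to_id("Human"))
print(id_to_species("337915000"))
```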

zjp commented 1 year ago

I've updated the title of this PR and will work to switch the code over to that endpoint too.

zjp commented 1 year ago

OK, the latest push implements getSimpleSearchWithModalityAndBodyPartPaged

kirbyju commented 1 year ago

Something's still not right here. If I use your example query in the GUI I am getting different numbers than this API call is returning:

```
nbia.getSimpleSearchWithModalityAndBodyPartPaged(collections=["TCGA-UCEC", "4D-Lung"], modalities=["CT"])

'totalPatients': 61,
'bodyParts': [{'value': 'LUNG', 'count': 20},
  {'value': 'ABDOMEN', 'count': 1},
  {'value': 'UTERUS', 'count': 38},
  {'value': 'NOT SPECIFIED', 'count': 2}],
'modalities': [{'value': 'CT', 'count': 61}],
'collections': [{'value': '4D-Lung', 'count': 20},
  {'value': 'TCGA-UCEC', 'count': 41}],
'species': [{'value': '337915000', 'count': 61}],
```


zjp commented 1 year ago

I think I've narrowed it down. Can you re-run your query with minStudies=0 and see if that gives you the correct result? It appears to give me correct results. If so I will push a change to make the default 0 instead of 1.

kirbyju commented 1 year ago

Works on my end too!

zjp commented 1 year ago

Great to hear! I just pushed the updated code that sets minStudies to 0 by default.