monarch-initiative / oncoexporter

Cancer data to GA4GH phenopacket
https://monarch-initiative.github.io/oncoexporter
MIT License

specimen: derived_from_subject #46

Closed by pnrobinson 6 months ago

pnrobinson commented 8 months ago

The CDA specimen "derived_from_subject" field corresponds to the GA4GH Biosample "individual_id" field.

This mapping should be added to the CdaBiosampleFactory.

pnrobinson commented 8 months ago

Also, do we need to use the CDA field researchsubject_id or subject id?

msierk commented 8 months ago

> Also, do we need to use the CDA field researchsubject_id or subject id?

We need the subject id for sure. The researchsubject_id refers to a study, so we might not need it for the pilot, but we will need it eventually.

pnrobinson commented 7 months ago

Sometimes, but not always, the specimen_id field is not the same as the derived_from_specimen field. Is this a secondary specimen (e.g., DNA specimen derived from a tissue specimen)?

pnrobinson commented 7 months ago

This is what we have now:

# derived_from_subject -> individual_id
derived_from_subj = row['derived_from_subject']
if derived_from_subj is not None:
    biosample.individual_id = derived_from_subj

# derived_from_specimen -> derived_from_id
derived_from = row['derived_from_specimen']
if derived_from is not None:
    if derived_from == 'initial specimen':
        biosample.derived_from_id = derived_from_subj
    else:
        biosample.derived_from_id = derived_from

@msierk -- does this look correct?

msierk commented 7 months ago

Yes, it designates a secondary specimen. The specimen mapping says (for GDC) that 'specimen_type' is "'sample' or 'portion' or 'slide' or 'analyte' or 'aliquot'" and that 'derived_from_specimen' is "'initial specimen' if specimen_type is 'sample'; otherwise Specimen.id for parent Specimen record". The biosample.derived_from_id can be empty.

So here's what I think it should look like:

# derived_from_specimen -> derived_from_id
derived_from = row['derived_from_specimen']
if derived_from is not None:
    if derived_from != 'initial specimen':
        biosample.derived_from_id = derived_from

I don't know whether we should also check that the CDA specimen_type is 'sample' when derived_from is 'initial specimen'. @pnrobinson
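For illustration, here is a minimal, self-contained sketch of the mapping with that optional check included. It is not the actual CdaBiosampleFactory code: `BiosampleStub`, `map_specimen_row`, and the example identifiers are hypothetical, the real Biosample is a phenopackets message, and the row is assumed to be a dict-like object that uses None for missing values. Whether a mismatch should warn, raise, or be ignored is an open choice; a warning is used here only as one possibility.

```python
import warnings
from dataclasses import dataclass
from typing import Mapping, Optional

INITIAL_SPECIMEN = 'initial specimen'


@dataclass
class BiosampleStub:
    """Hypothetical stand-in for the two GA4GH Biosample fields discussed here."""
    individual_id: str = ''
    derived_from_id: str = ''


def map_specimen_row(row: Mapping[str, Optional[str]]) -> BiosampleStub:
    """Map a CDA specimen row (assumed dict-like) to the stub biosample."""
    biosample = BiosampleStub()

    # derived_from_subject -> individual_id
    derived_from_subj = row.get('derived_from_subject')
    if derived_from_subj is not None:
        biosample.individual_id = derived_from_subj

    # derived_from_specimen -> derived_from_id
    derived_from = row.get('derived_from_specimen')
    if derived_from is not None:
        if derived_from == INITIAL_SPECIMEN:
            # Initial specimen: derived_from_id stays empty.
            # Optional consistency check: specimen_type should be 'sample'.
            specimen_type = row.get('specimen_type')
            if specimen_type != 'sample':
                warnings.warn(
                    f"'{INITIAL_SPECIMEN}' row has specimen_type {specimen_type!r}, "
                    "expected 'sample'"
                )
        else:
            # Secondary specimen: point at the parent specimen id.
            biosample.derived_from_id = derived_from

    return biosample


# Made-up identifiers, for illustration only.
primary = map_specimen_row({
    'derived_from_subject': 'subject-1',
    'derived_from_specimen': 'initial specimen',
    'specimen_type': 'sample',
})
secondary = map_specimen_row({
    'derived_from_subject': 'subject-1',
    'derived_from_specimen': 'specimen-1',
    'specimen_type': 'aliquot',
})
```

In this sketch, `primary` keeps an empty `derived_from_id`, while `secondary` gets the parent specimen id; both carry the subject id in `individual_id`.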

ielis commented 6 months ago

Hi @msierk @pnrobinson, can you please check whether this is still being worked on?

The status indicates Done but the text suggests an ongoing debate.

I would like to close issues that have been completed to clean up the project.

Thank you! 🙃

msierk commented 6 months ago

@ielis I just pushed this change, so it's fine to close it. I added a comment about possibly adding a check for specimen_type = sample in the future.

ielis commented 6 months ago

Super, thank you!