Closed pnrobinson closed 6 months ago
Also, do we need to use the CDA field researchsubject_id or subject id?
We need the subject id for sure. The researchsubject_id refers to a study, so we might not need it for the pilot, but we will need it eventually.
Sometimes, but not always, the specimen_id field is not the same as the derived_from_specimen field. Is this a secondary specimen (e.g., DNA specimen derived from a tissue specimen)?
This is what we have now
derived_from_subj = row['derived_from_subject']
if derived_from_subj is not None:
    biosample.individual_id = derived_from_subj
# derived_from_specimen -> derived_from_id
derived_from = row['derived_from_specimen']
if derived_from is not None:
    if derived_from == 'initial specimen':
        biosample.derived_from_id = derived_from_subj
    else:
        biosample.derived_from_id = derived_from
@msierk -- does this look correct?
Yes, it designates a secondary specimen. Under the specimen mapping it says (for GDC) that 'specimen_type' is "'sample' or 'portion' or 'slide' or 'analyte' or 'aliquot'" and that 'derived_from_specimen' is "'initial specimen' if specimen_type is 'sample'; otherwise Specimen.id for parent Specimen record". The biosample.derived_from_id can be empty.
So here's what I think it should look like:
# derived_from_specimen -> derived_from_id
derived_from = row['derived_from_specimen']
if derived_from is not None:
    if derived_from != 'initial specimen':
        biosample.derived_from_id = derived_from
I don't know if we should include a check that CDA specimen_type is 'sample' if derived_from is 'initial specimen'? @pnrobinson
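For reference, here is a minimal, self-contained sketch of the mapping proposed above, with the optional specimen_type check added as a warning rather than a hard failure. It is only an illustration: `BiosampleSketch` is a hypothetical stand-in for the GA4GH Biosample message, `map_specimen_row` is a made-up helper name, and the row is assumed to be a plain dict keyed by the CDA column names mentioned in this thread. It is not the actual CdaBiosampleFactory code.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger(__name__)

# Marker value that CDA uses for a primary specimen (per the mapping quoted above).
INITIAL_SPECIMEN = "initial specimen"


@dataclass
class BiosampleSketch:
    """Hypothetical stand-in for the GA4GH Biosample message (assumption)."""
    id: str
    individual_id: Optional[str] = None
    derived_from_id: Optional[str] = None


def map_specimen_row(row: dict) -> BiosampleSketch:
    """Map one CDA specimen row (assumed to be a dict) to a biosample sketch."""
    biosample = BiosampleSketch(id=row["specimen_id"])

    # derived_from_subject -> individual_id
    derived_from_subj = row.get("derived_from_subject")
    if derived_from_subj is not None:
        biosample.individual_id = derived_from_subj

    # derived_from_specimen -> derived_from_id (left empty for primary specimens)
    derived_from = row.get("derived_from_specimen")
    if derived_from is not None:
        if derived_from == INITIAL_SPECIMEN:
            # Optional consistency check discussed above: a primary specimen
            # is expected to have specimen_type == 'sample'.
            if row.get("specimen_type") != "sample":
                logger.warning(
                    "Specimen %s is an initial specimen but has specimen_type %r",
                    biosample.id, row.get("specimen_type"),
                )
        else:
            biosample.derived_from_id = derived_from

    return biosample
```

With this sketch, a row whose derived_from_specimen is 'initial specimen' yields a biosample with an empty derived_from_id, while a row for a secondary specimen (e.g., an aliquot) carries its parent specimen id. Whether a mismatch should only be logged or should raise an error is an open design choice.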
Hi @msierk @pnrobinson can you please check if this is still being worked on?
The status indicates Done but the text suggests an ongoing debate.
I would like to close issues that have been completed to clean up the project.
Thank you! 🙃
@ielis I just pushed this change, it's fine to close it. I put in a comment about possibly adding a check for specimen_type = sample in the future.
Super, thank you!
The CDA specimen "derived_from_subject" corresponds to the GA4GH Biosample "individual_id" field.
This should be added to the CdaBiosampleFactory