ncihtan / data-models

Schema.org Data Models for HTAN
MIT License

GeoMx fixes based on internal testing. #281

PozhidayevaDarya closed 1 year ago

PozhidayevaDarya commented 1 year ago

Information

Changes based on suggestions from Vesteinn & Clarisse following testing:

  1. GeoMx DSP ROI Segment Annotation Metadata does not come up in staging. Need to fix DependsOn Component.
  2. Change "GeoMx DSP ROI Segment Annotation Metadata" to "NanoString GeoMx DSP ROI Segment Annotation Metadata"
  3. Remove "GeoMx DSP Assay Type" from GeoMx Level 1
  4. L3 GeoMx DSP ROI Segment Annotation attribute: typo in DependsOn.

Additional modifications: Split "NanoString GeoMx DSP ROI Segment Annotation Metadata" into DCC/RCC-specific templates

PR: https://github.com/ncihtan/data-models/pull/283

clarisse-lau commented 1 year ago

Sorry, just noticed something that I didn't catch in my first round of testing...

The GeoMx DSP ROI Segment Annotation Metadata template is essentially meant to be the machine-output Segment Summary file (example from OHSU attached), with minimal adjustments, right? We informed centers that it is just the file as is, but with HTAN Parent Biospecimen ID added. However, currently many Segment Summary file column names would need to be modified to match the template values and pass validation.
e.g. ROI X Coordinate --> GeoMx DSP ROI X Coordinate

@PozhidayevaDarya if this is expected behavior, please feel free to ignore- happy to leave as is :)

Otherwise, maybe a possible suggestion could be to only include fields in our validation template that match exactly to the Segment Summary file fields, and allow all others to be 'custom' additions?
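To picture that suggestion, here is a minimal sketch of splitting a submitted file's columns into those that exactly match a template attribute (and so would be validated) and everything else (treated as custom). The function name `partition_columns` is hypothetical, and the column lists are illustrative values taken from this thread. This is not how schematic or DCA actually performs validation.

```python
def partition_columns(file_columns, template_attributes):
    """Split file columns into exact template matches and custom extras."""
    template = set(template_attributes)
    validated = [c for c in file_columns if c in template]
    custom = [c for c in file_columns if c not in template]
    return validated, custom


# Illustrative values only: not the real template or a real Segment Summary file.
template_attributes = ["HTAN Parent Biospecimen ID", "Scan Height", "Scan Width"]
file_columns = ["HTAN Parent Biospecimen ID", "Scan Height", "ROI X Coordinate"]

validated, custom = partition_columns(file_columns, template_attributes)
print("validated:", validated)
print("custom:", custom)
```

The match is exact and case-sensitive here, so a column that differs from the template by a single capital letter would end up in `custom`.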

Segment Summary(8)(78).xlsx

PozhidayevaDarya commented 1 year ago

@clarisse-lau Thanks for highlighting this! Yeah, this was expected on my end. I wanted to prioritize things being clear in the data model, and I feel like it wouldn't be a big deal for them to just rename the columns before submitting. I'm happy to change it though if folks disagree!

clarisse-lau commented 1 year ago

Ok! Happy to leave it however you think best. My only concern is that it is not very clear how the template should be used and which column names should be changed (and to what), e.g. is the template attribute Segment Tag equivalent to Segment Name or Tags in the summary file?

There are also a number of non-required fields in the template that don't exactly align (name or case-wise) with the summary file. If they're not required, then the center isn't obligated to change names to align with the template. So the source fields may ultimately just be entered 'as is' as custom fields, bypassing validation and use of these attributes in the data model.

| HTAN Template | Segment Summary file |
| --- | --- |
| ROI QC Passed | QC Status |
| GeoMx DSP Slide Scan Height | Scan Height |
| GeoMx DSP Slide Scan Width | Scan Width |
| GeoMx DSP RCC or DCC Filename | RCC File Name |
| Positive Norm Factor | Positive norm factor |
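As a rough illustration of the mismatches listed above, the sketch below stores the five template/summary pairs in a dictionary, flags pairs that differ only by case, and renames summary columns to their template names. The helper names `case_only_mismatches` and `rename_to_template` are hypothetical, and the mapping covers only the pairs named in this comment, not the full template.

```python
# Template attribute -> Segment Summary column, using only the pairs listed above.
TEMPLATE_TO_SUMMARY = {
    "ROI QC Passed": "QC Status",
    "GeoMx DSP Slide Scan Height": "Scan Height",
    "GeoMx DSP Slide Scan Width": "Scan Width",
    "GeoMx DSP RCC or DCC Filename": "RCC File Name",
    "Positive Norm Factor": "Positive norm factor",
}


def case_only_mismatches(mapping):
    """Return pairs whose names differ only in letter case."""
    return [
        (template, summary)
        for template, summary in mapping.items()
        if template != summary and template.lower() == summary.lower()
    ]


def rename_to_template(columns, mapping):
    """Rename known summary columns to template names; leave others unchanged."""
    summary_to_template = {summary: template for template, summary in mapping.items()}
    return [summary_to_template.get(column, column) for column in columns]


print(case_only_mismatches(TEMPLATE_TO_SUMMARY))
print(rename_to_template(["QC Status", "Scan Height", "ROI X Coordinate"], TEMPLATE_TO_SUMMARY))
```

Columns without an entry in the mapping, such as `ROI X Coordinate` here, pass through unchanged, which mirrors the concern that unaligned fields end up as custom columns.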

We should also clarify the instructions that were provided to OHSU (submission of this template will in fact require more than just addition of the biospecimen column) - though maybe this can wait until the PR has been merged and the template is actually in DCA :)

PozhidayevaDarya commented 1 year ago

That all makes sense and honestly I will adjust it to match exactly to prevent issues. Thanks @clarisse-lau :)