Open Bankso opened 9 months ago
In 24-3, at least the first two bullets are readily doable. The third bullet might be more difficult, so we need to scope it further and see how far we can get.
Currently reviewing components, attributes, and valid values
For hierarchy/structure, I did some preliminary analysis with GPT and ontology scoping, documented here: https://docs.google.com/document/d/1Vs-X4laTfih2YpoouF0njCCSQmcC4AIpsmgmCdl5b9c/edit?usp=sharing
Summary: it seems doable, but it will be a lot of work. To help minimize effort required, I'll source from existing ontologies for structure and devise mappings when needed.
In terms of implementation, I think defining pair-wise relationships will be sufficient, since the information will be carried forward in each mapping. A generic example would be:
Take five terms: RNA-seq, scRNA-seq, ATAC-seq, scATAC-seq, WGS
Highest level group: Genomic technique
Possible second level groups: bulk, single-cell, transcriptomics, epigenomics, RNA, DNA (lots of options, is the point)
Organizing terms would occur in a CSV, using the column names: Technique (should replace assay), Parent, [all other info captured]
Then relationships are easy to define and structure is easily inferred, using Genomic --> bulk, single-cell --> RNA-seq, scRNA-seq, ATAC-seq, scATAC-seq, WGS
Technique, Parent
Genomic, None
Bulk, Genomic
Single-cell, Genomic
RNA-seq, Bulk
ATAC-seq, Bulk
WGS, Bulk
scRNA-seq, Single-cell
scATAC-seq, Single-cell
. . .
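To illustrate how the structure can be inferred from pair-wise rows alone, here is a minimal Python sketch. It assumes the two-column Technique/Parent layout from the example above, with "None" marking the top-level group. The function names (`load_parents`, `lineage`, `children_index`) are hypothetical and not part of the data model tooling.

```python
import csv
import io
from collections import defaultdict

# Example rows taken from the comment above; "None" marks the top-level group.
PAIRS_CSV = """Technique,Parent
Genomic,None
Bulk,Genomic
Single-cell,Genomic
RNA-seq,Bulk
ATAC-seq,Bulk
WGS,Bulk
scRNA-seq,Single-cell
scATAC-seq,Single-cell
"""


def load_parents(text):
    """Return {technique: parent or None} from Technique,Parent rows."""
    parents = {}
    for row in csv.DictReader(io.StringIO(text), skipinitialspace=True):
        parent = row["Parent"].strip()
        parents[row["Technique"].strip()] = None if parent in ("", "None") else parent
    return parents


def lineage(term, parents):
    """Follow parent links upward, then return the path from root to term."""
    path, seen = [], set()
    while term is not None:
        if term in seen:
            raise ValueError(f"cycle detected at {term!r}")
        seen.add(term)
        path.append(term)
        # A KeyError here means a parent was referenced but never defined.
        term = parents[term]
    return path[::-1]


def children_index(parents):
    """Invert the mapping to {parent: [children]} for top-down traversal."""
    index = defaultdict(list)
    for child, parent in parents.items():
        if parent is not None:
            index[parent].append(child)
    return dict(index)


parents = load_parents(PAIRS_CSV)
children = children_index(parents)
print(" --> ".join(lineage("scRNA-seq", parents)))
```

Each row only needs to name its direct parent; the full path for any term is recovered by following the links, which is why the pair-wise definition is sufficient.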
Suggest to chat with ANV to see how this was designed and implemented in NF
24-6: No updates this sprint. Carry into next sprint
I will continue to collate valid value definitions for assays, tissues, and tumor types here: https://docs.google.com/spreadsheets/d/1YL8kDB_tdvGDYqDy4x8zlBauLPDxc24W4tLEArJh0kQ/edit?usp=sharing
In addition, there are many valid value sets that are missing descriptions/definitions, like file formats, licenses, input/output formats, etc. The next step is to identify all value types that would benefit from this exercise and note them here.
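One way to find value sets that lack definitions is to scan the data model CSV for valid values with no described row of their own. The sketch below is only an illustration: the column names (Attribute, Description, Valid Values), the sample rows, and the function name `missing_descriptions` are all assumptions, not the actual MC2 model contents.

```python
import csv
import io

# Hypothetical excerpt of a data model CSV. The column names and every row
# here are assumptions for illustration, not the real MC2 data model.
MODEL_CSV = """Attribute,Description,Valid Values
File Format,Format of the file,"CSV, TSV, BAM"
CSV,Comma-separated values,
TSV,,
"""


def missing_descriptions(text):
    """Return {attribute: [valid values without a described row]}."""
    rows = list(csv.DictReader(io.StringIO(text)))
    # Terms that have their own row with a non-empty description.
    described = {
        r["Attribute"].strip() for r in rows if (r["Description"] or "").strip()
    }
    missing = {}
    for row in rows:
        values = [
            v.strip() for v in (row["Valid Values"] or "").split(",") if v.strip()
        ]
        undocumented = [v for v in values if v not in described]
        if undocumented:
            missing[row["Attribute"].strip()] = undocumented
    return missing


report = missing_descriptions(MODEL_CSV)
for attribute, values in report.items():
    print(f"{attribute}: {', '.join(values)}")
```

Running a check like this whenever the model changes would flag new value types that still need definitions.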
24-7/8 close out: have new models added (per #115)
24-9: Secondary to site visit priorities.
Will require some work to add new components. There might be some room for automation to help pull this information more easily as the data model updates.
Combine this work with: https://github.com/mc2-center/data-models/issues/142
24-11/12 Blocked until tech writer contract in place.
Related to https://github.com/mc2-center/data-models/issues/49 and https://github.com/mc2-center/data-models/pull/66
Draft of the MC2 data model dictionary, using GitHub pages deployment, is here: https://mc2-center.github.io/data-models/
Potential actions that could improve documentation quality (should determine necessity/priority for the following):
Component and Attribute entries: add descriptions where missing/incomplete
Valid Values: add descriptions, ontology references
Valid Values