mc2-center / data-models

Versioned history of the MC2 Center data model
https://mc2-center.github.io/data-models/
Creative Commons Zero v1.0 Universal

Data model content updates to support GH docs #67

Open Bankso opened 4 months ago

Bankso commented 4 months ago

Relative to https://github.com/mc2-center/data-models/issues/49 and https://github.com/mc2-center/data-models/pull/66

Draft of the MC2 data model dictionary, using GitHub pages deployment, is here: https://mc2-center.github.io/data-models/

Potential actions to improve documentation quality (we should determine the necessity/priority of each):

aclayton555 commented 3 months ago

In 24-3, at least the first two bullets are readily doable. The third bullet might be more difficult, so we need to scope it further and see how far we can get.

Bankso commented 2 months ago

Currently reviewing components, attributes, and valid values

For hierarchy/structure, I did some preliminary analysis with GPT and ontology scoping, documented here: https://docs.google.com/document/d/1Vs-X4laTfih2YpoouF0njCCSQmcC4AIpsmgmCdl5b9c/edit?usp=sharing

Summary: it seems doable, but it will be a lot of work. To help minimize effort required, I'll source from existing ontologies for structure and devise mappings when needed.

In terms of implementation, I think defining pairwise relationships will be sufficient, since each mapping carries the higher-level information forward. A generic example would be:
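To illustrate how pairwise relationships "carry information forward", here is a minimal sketch, assuming each term stores only its immediate parent in a plain dictionary. The function name `lineage` and the dictionary layout are hypothetical, not part of the MC2 data model tooling; the full path to the root is recovered by walking parent links upward.

```python
from typing import Dict, List, Optional


def lineage(term: str, parent_of: Dict[str, Optional[str]]) -> List[str]:
    """Return [term, parent, grandparent, ...] up to the root.

    Hypothetical helper: a parent of None marks a root, and a term missing
    from the mapping is also treated as a root.
    """
    path = [term]
    seen = {term}
    current = parent_of.get(term)
    while current is not None:
        if current in seen:  # guard against accidental cycles in the pairs
            raise ValueError(f"cycle detected at {current!r}")
        path.append(current)
        seen.add(current)
        current = parent_of.get(current)
    return path


# Example child -> parent pairs, using terms from this thread
parent_of = {
    "Genomic": None,
    "Single-cell": "Genomic",
    "scRNA-seq": "Single-cell",
}

print(lineage("scRNA-seq", parent_of))  # ['scRNA-seq', 'Single-cell', 'Genomic']
```

Only one parent is stored per term, yet every ancestor is still reachable, which is why pairwise definitions may be enough.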

Take five terms: RNA-seq, scRNA-seq, ATAC-seq, scATAC-seq, WGS
Highest-level group: Genomic technique
Possible second-level groups: bulk, single-cell, transcriptomics, epigenomics, RNA, DNA (the point is that there are lots of options)

Terms would be organized in a CSV with the columns: Technique (should replace Assay), Parent, [all other info captured]
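A minimal sketch of loading such a CSV and checking it, assuming the literal string `None` marks a top-level term (as in the example table in this thread) and using a hypothetical `Description` column to stand in for "[all other info captured]". The helper names are illustrative only. The check flags any Parent value that is never defined as a Technique.

```python
import csv
import io
from typing import Dict, List, Optional

# Hypothetical CSV; "Description" stands in for the other captured columns.
CSV_TEXT = """Technique,Parent,Description
Genomic,None,Highest-level group
Bulk,Genomic,
Single-cell,Genomic,
RNA-seq,Bulk,
scRNA-seq,Single-cell,
"""


def load_parents(text: str) -> Dict[str, Optional[str]]:
    """Map each Technique to its Parent; the literal 'None' marks a root."""
    parent_of: Dict[str, Optional[str]] = {}
    for row in csv.DictReader(io.StringIO(text)):
        technique = row["Technique"].strip()
        parent = row["Parent"].strip()
        if technique in parent_of:
            raise ValueError(f"duplicate Technique: {technique!r}")
        parent_of[technique] = None if parent == "None" else parent
    return parent_of


def undefined_parents(parent_of: Dict[str, Optional[str]]) -> List[str]:
    """Parents referenced in the Parent column but never defined as a Technique."""
    return sorted({p for p in parent_of.values() if p is not None and p not in parent_of})


parent_of = load_parents(CSV_TEXT)
print(undefined_parents(parent_of))  # [] when every Parent is itself a Technique
```

Extra columns are simply carried along by `csv.DictReader` and ignored here, so adding more metadata columns would not affect the hierarchy check.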

Relationships are then easy to define, and the structure can be inferred from them: Genomic --> Bulk, Single-cell; Bulk --> RNA-seq, ATAC-seq, WGS; Single-cell --> scRNA-seq, scATAC-seq

Technique, Parent
Genomic, None
Bulk, Genomic
Single-cell, Genomic
RNA-seq, Bulk
ATAC-seq, Bulk
WGS, Bulk
scRNA-seq, Single-cell
scATAC-seq, Single-cell
. . .
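To show that the hierarchy really can be inferred from the Technique/Parent pairs alone, here is a minimal sketch that inverts the pairs into a parent-to-children map and prints an indented tree. The pairs are transcribed from the example table in this comment; the function names are hypothetical and not part of any existing MC2 tooling.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Technique -> Parent pairs from the example table (None marks a root)
PAIRS: List[Tuple[str, Optional[str]]] = [
    ("Genomic", None),
    ("Bulk", "Genomic"),
    ("Single-cell", "Genomic"),
    ("RNA-seq", "Bulk"),
    ("ATAC-seq", "Bulk"),
    ("WGS", "Bulk"),
    ("scRNA-seq", "Single-cell"),
    ("scATAC-seq", "Single-cell"),
]


def build_children(pairs: List[Tuple[str, Optional[str]]]) -> Dict[Optional[str], List[str]]:
    """Invert child -> parent pairs into parent -> [children], keeping row order."""
    children: Dict[Optional[str], List[str]] = defaultdict(list)
    for term, parent in pairs:
        children[parent].append(term)
    return children


def render(children: Dict[Optional[str], List[str]], node: Optional[str] = None, depth: int = 0) -> List[str]:
    """Depth-first walk from the roots (children of None), two spaces per level."""
    lines: List[str] = []
    for child in children.get(node, []):
        lines.append("  " * depth + child)
        lines.extend(render(children, child, depth + 1))
    return lines


print("\n".join(render(build_children(PAIRS))))
```

This prints Genomic at the top, Bulk and Single-cell indented beneath it, and the individual techniques beneath those. Because each row names only its direct parent, adding a new level (e.g., inserting "RNA" between Bulk and RNA-seq) means editing just the affected rows.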

aclayton555 commented 2 months ago

Suggest chatting with ANV to see how this was designed and implemented in NF.

aclayton555 commented 1 month ago