Strategy for defining connections between different types of metadata

Bankso commented 10 months ago

As the MC2 data model becomes more complex and gains more components, it will likely be important to define connections between different types of metadata/manifests. Previously, the solution has been to use Syn IDs, which indicate which manifest files are connected.

Since MC2 uses schematic and has primary keys defined for upsert, all metadata for a given resource, assay type and level, etc. will be stored in a single manifest that gets updated over time. Using Syn IDs as references to different metadata manifests is not compatible with a single manifest that gets updated.
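To illustrate the point about a single manifest that gets updated, here is a minimal sketch of upsert keyed on a primary key. It is plain Python rather than schematic's actual implementation, and the column names (`Dataset_id`, `Assay`, `Level`) and values are hypothetical. Rows with a matching key are updated in place and new keys are appended, so the row identifier stays stable across submissions.

```python
def upsert(table, new_rows, key):
    """Merge new_rows into table, matching rows on the primary-key column."""
    # Copy existing rows into a dict keyed by primary key (insertion order is kept).
    merged = {row[key]: dict(row) for row in table}
    for row in new_rows:
        # Update the matching row, or create a new one if the key is unseen.
        merged.setdefault(row[key], {}).update(row)
    return list(merged.values())


# Hypothetical manifest contents, for illustration only.
existing = [{"Dataset_id": "ds-001", "Assay": "RNA-seq", "Level": "1"}]
submission = [
    {"Dataset_id": "ds-001", "Level": "2"},                      # updates an existing row
    {"Dataset_id": "ds-002", "Assay": "imaging", "Level": "1"},  # adds a new row
]

result = upsert(existing, submission, "Dataset_id")
```

After the merge, the manifest still has one row per `Dataset_id`, which is why a row-level key works as a reference where a file-level identifier does not.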

One potential solution is to use the '_id' primary key (integrated into all data models) to reference specific rows in separate manifests. This is something that we do in some contexts already, but I think it would be good to formalize the strategy and ensure we think about generalized implementations. This could be 1) integrated into resource-focused metadata templates or 2) handled by defining study or experimental models that describe all the connections. A high-level "study config" could also help centralize information about experiments that applies to multiple processing levels (similar, in principle, to the DatasetView manifest).
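A rough sketch of what row-level references via '_id' keys could look like, including a check for references that do not resolve. This is only an illustration under assumptions: the manifests are modeled as lists of dicts, and the column names (`Study_id`, `Dataset_id`, `StudyName`, `Assay`) and helper functions are hypothetical, not part of the current data model or of schematic.

```python
# Hypothetical manifests; column names and values are illustrative only.
study_manifest = [
    {"Study_id": "study-001", "StudyName": "Example study"},
]
dataset_manifest = [
    {"Dataset_id": "ds-001", "Study_id": "study-001", "Assay": "RNA-seq"},
    {"Dataset_id": "ds-002", "Study_id": "study-999", "Assay": "imaging"},
]


def index_by_key(rows, key):
    """Map each primary-key value to its row, rejecting duplicate keys."""
    index = {}
    for row in rows:
        value = row[key]
        if value in index:
            raise ValueError(f"duplicate {key}: {value}")
        index[value] = row
    return index


def find_dangling_references(child_rows, ref_column, parent_index):
    """Return child rows whose reference matches no key in the parent manifest."""
    return [row for row in child_rows if row[ref_column] not in parent_index]


studies = index_by_key(study_manifest, "Study_id")
dangling = find_dangling_references(dataset_manifest, "Study_id", studies)

for row in dangling:
    print(f"{row['Dataset_id']} references unknown study {row['Study_id']}")
```

A check like this could run at validation time for either option above, whether the reference columns live in resource-focused templates or in a separate study/experiment model.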

Bankso commented 9 months ago

Relevant to #71

Bankso commented 5 months ago

Addressed in Lucid charts as part of #54 and integrated into PR #107

aclayton555 commented 4 months ago

Expect this work to be within the scope of work for #115

mc2-center / data-models

Strategy for defining connections between different types of metadata #56