As the MC2 data model grows more complex and gains more components, it will likely become important to define connections between different types of metadata/manifests. Previously, the solution to this has been to use Syn IDs to indicate which manifest files are connected.
Since MC2 uses schematic and has primary keys defined for upsert, all metadata for a given resource, assay type, level, etc. will be stored in a single manifest that is updated over time. Referencing separate metadata manifests by Syn ID is not compatible with a single manifest that keeps being updated.
One potential solution is to use the '_id' primary key (integrated into all data models) to reference specific rows in separate manifests. We already do this in some contexts, but I think it would be good to formalize the strategy and make sure we think about generalized implementations. This could be done in two ways: 1) integrate the references into resource-focused metadata templates, or 2) define study or experimental models that describe all the connections. A high-level "study config" could also help centralize information about experiments that applies to multiple processing levels (similar, in principle, to the DatasetView manifest).
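
To make the row-level reference idea concrete, here is a minimal sketch in plain Python. It is only an illustration under assumptions, not how schematic or the MC2 models actually implement this: the manifests are stand-in lists of dicts, the column names (`Study_id`, `Dataset_id`, `StudyName`, `AssayType`, `Level`) and all values are hypothetical, and I am assuming each manifest's primary key column carries an `_id` suffix. The helpers check that every reference in one manifest points at an existing row in another, then join the rows.

```python
# Hypothetical manifests: each row is keyed by an '_id' primary key column.
# All column names and values here are invented for illustration.
studies = [
    {"Study_id": "study_001", "StudyName": "Example study A"},
    {"Study_id": "study_002", "StudyName": "Example study B"},
]

datasets = [
    {"Dataset_id": "dataset_001", "Study_id": "study_001", "AssayType": "RNA-seq", "Level": 1},
    {"Dataset_id": "dataset_002", "Study_id": "study_001", "AssayType": "RNA-seq", "Level": 2},
    {"Dataset_id": "dataset_003", "Study_id": "study_999", "AssayType": "imaging", "Level": 1},
]


def index_by_key(rows, key):
    """Build a lookup from primary key value to row, rejecting duplicate keys."""
    index = {}
    for row in rows:
        value = row[key]
        if value in index:
            raise ValueError(f"duplicate primary key {value!r} in column {key!r}")
        index[value] = row
    return index


def find_dangling_references(rows, ref_column, target_index):
    """Return the sorted reference values that match no row in the target manifest."""
    return sorted({row[ref_column] for row in rows if row[ref_column] not in target_index})


def resolve_references(rows, ref_column, target_index):
    """Join each row with the row it references; rows with no match are skipped."""
    joined = []
    for row in rows:
        target = target_index.get(row[ref_column])
        if target is not None:
            # Fields from the referencing row win if a column name collides.
            joined.append({**target, **row})
    return joined


study_index = index_by_key(studies, "Study_id")
dangling = find_dangling_references(datasets, "Study_id", study_index)
resolved = resolve_references(datasets, "Study_id", study_index)

print("Dangling references:", dangling)
for row in resolved:
    print(row["Dataset_id"], "->", row["StudyName"])
```

Because the reference is a key value rather than a pointer to a file, it should stay valid as the target manifest is updated, provided the referenced row keeps its key. A check like `find_dangling_references` could run at validation time to catch references to rows that do not exist.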