Open aclayton555 opened 5 months ago
Keep this as a stretch goal for the 24-2 sprint. It will be helpful to bring to upcoming team discussions on RFC/data model strategy.
The current plan is to update the Lucidchart diagrams to include (1) the current implementation of the MC2 data model and (2) the next planned features/modifications to the data model (once decided).
Concerning the data decision tree, it will likely be a while before we can fully fill this out, since I think it makes sense to pilot data flows before releasing a decision tree for others to reference. My current plan is to store the information for the flow chart in a CSV and build the diagrams using Graphviz and pydot.
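A minimal sketch of the CSV-to-diagram idea, assuming a hypothetical CSV layout with one row per node (`id`, `label`, `parent`, `answer`) and placeholder questions that are not the actual decision tree. To stay dependency-free it writes Graphviz DOT text directly; the same rows could instead be turned into `pydot.Node` and `pydot.Edge` objects added to a `pydot.Dot` graph.

```python
import csv
import io

# Hypothetical CSV layout: one row per node in the decision tree.
# "parent" is empty for the root; "answer" labels the edge from the parent.
# Assumed convention: ids starting with "q" are questions, others are outcomes.
# The questions below are placeholders, not the real routing logic.
EXAMPLE_CSV = """id,label,parent,answer
q1,Is the data controlled-access?,,
q2,Is there a suitable domain repository?,q1,No
o1,Use a controlled-access repository,q1,Yes
o2,Use the domain repository,q2,Yes
o3,Use a generalist repository,q2,No
"""


def quote(text: str) -> str:
    """Wrap text in double quotes, escaping embedded quotes, so it is a valid DOT ID."""
    return '"' + text.replace('"', '\\"') + '"'


def csv_to_dot(csv_text: str) -> str:
    """Build Graphviz DOT source for a decision tree stored as CSV rows."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    lines = ["digraph decision_tree {", "  rankdir=TB;"]
    # Declare every node first: questions as diamonds, outcomes as boxes.
    for row in rows:
        shape = "diamond" if row["id"].startswith("q") else "box"
        lines.append(f"  {quote(row['id'])} [label={quote(row['label'])}, shape={shape}];")
    # Then add one labeled edge from each non-root node's parent.
    for row in rows:
        if row["parent"]:
            lines.append(f"  {quote(row['parent'])} -> {quote(row['id'])} [label={quote(row['answer'])}];")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(csv_to_dot(EXAMPLE_CSV))
```

Saving the output to a file (hypothetically `decision_tree.dot`) and running `dot -Tpng decision_tree.dot -o decision_tree.png` would render it; `pydot.graph_from_dot_data()` can also load the text, returning a list of graphs. Keeping the tree in CSV means a change in routing logic is a row edit rather than a manual redraw.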
I've started collecting information and ideas about our data model in this deck. Since I'm using the slides to think and brainstorm, they are not in usable form yet, but I wanted to make sure they were linked, since I expect them to be helpful down the road.
Continue work on this in 24-3. There is an opportunity to pare this down in scope to maximize utility. Tidy up the view of normalized vs. denormalized tables, since guidance from FAIR on this has evolved.
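To illustrate the normalized vs. denormalized distinction, here is a small sketch with hypothetical grant and dataset fields (not the MC2 schema). In the normalized form, grant details are stored once and referenced by key; the hypothetical `denormalize` helper copies them into every dataset row.

```python
# Hypothetical normalized layout: grant details live in one table,
# and each dataset row references a grant only by its key.
GRANTS = {
    "G1": {"grant_name": "Example Grant One"},
    "G2": {"grant_name": "Example Grant Two"},
}
DATASETS = [
    {"dataset_id": "D1", "grant_id": "G1"},
    {"dataset_id": "D2", "grant_id": "G1"},
    {"dataset_id": "D3", "grant_id": "G2"},
]


def denormalize(datasets: list[dict], grants: dict[str, dict]) -> list[dict]:
    """Flatten datasets into one table by copying grant fields into each row."""
    flat = []
    for row in datasets:
        merged = dict(row)  # copy so the normalized input is left unchanged
        merged.update(grants[row["grant_id"]])  # grant name repeated per dataset
        flat.append(merged)
    return flat


if __name__ == "__main__":
    for record in denormalize(DATASETS, GRANTS):
        print(record)
```

The trade-off: in the normalized form a grant name is corrected in one place, while the denormalized form is easier to read or submit as a single flat table but repeats values that can drift out of sync.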
In the 24-3 sprint, prioritize the draft decision tree component of this. Also consider CFDE in the data routing ecosystem. Aim to have internal review of and feedback on the decision tree within this sprint.
Summary of progress
Mostly done:
Current metadata flow from curation to CCKP (top left of this page): https://lucid.app/lucidchart/5c8160b1-f087-4fbd-a679-1e6d175e4a69/edit?invitationId=inv_ad91d3a1-968e-4414-ba33-f537e1b2b179&page=uL47LgRFF_7p#
Current generalized data storage, packaging, and release flow (bottom of this page): https://lucid.app/lucidchart/5c8160b1-f087-4fbd-a679-1e6d175e4a69/edit?invitationId=inv_ad91d3a1-968e-4414-ba33-f537e1b2b179&page=uL47LgRFF_7p#
Contributor-focused version of the generalized data flow (middle right of this page): https://lucid.app/lucidchart/5c8160b1-f087-4fbd-a679-1e6d175e4a69/edit?invitationId=inv_ad91d3a1-968e-4414-ba33-f537e1b2b179&page=uL47LgRFF_7p#
In progress:
Data repository/storage/access decision trees (the general decision tree will likely cover most situations, but it needs to be updated to indicate expanded routing support and also needs to be prettified): https://lucid.app/lucidchart/5c8160b1-f087-4fbd-a679-1e6d175e4a69/edit?invitationId=inv_ad91d3a1-968e-4414-ba33-f537e1b2b179&page=tpgY4wj5MoBQ#
Data models (still needs the most work; will include the data sharing plan, linkage updates, assay-specific models, etc.): https://lucid.app/lucidchart/5c8160b1-f087-4fbd-a679-1e6d175e4a69/edit?invitationId=inv_ad91d3a1-968e-4414-ba33-f537e1b2b179&page=5UHjkC3iyF0H#
Continue work through the 24-5 sprint, incorporating feedback from the team. Consider different audiences for aspects of this (e.g., the decision tree). Maybe bring this for discussion at an internal team meeting in late May.
Bring this for discussion at the June 12 MC2 Center Team meeting to get feedback from the group.
Presented at the June 12 MC2 Center Team meeting. Specific asks for the team:
I will include a reminder to the team to take a look at these this week!
Back in the day, when our data model was much "simpler," we maintained visualizations of the data model in Lucidchart:
https://lucid.app/lucidchart/5c8160b1-f087-4fbd-a679-1e6d175e4a69/edit?invitationId=inv_ad91d3a1-968e-4414-ba33-f537e1b2b179&page=lpgYV5TpYyu8#
These diagrams were critical in helping us design and transition our data model to a schematic-based data model (this occurred at the end of the CSBC/PS-ON grant in 2022).
With updates expected to occur on a monthly cadence in MC2, do we still want to maintain these? Are there alternative, more automated solutions we could adopt (e.g., diagrams that update with each data model release)? If we do want to maintain these, we need to establish an owner and a process for keeping them up to date.
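One possible direction for automation, sketched under assumptions: if the data model is kept as a schematic-style CSV with an `Attribute` column and a comma-separated `DependsOn` column, a script run with each data model release could regenerate a diagram instead of someone updating Lucidchart by hand. The model excerpt and the `model_to_dot` function below are hypothetical.

```python
import csv
import io

# Hypothetical excerpt of a schematic-style CSV data model. Only the
# "Attribute" and "DependsOn" columns are used here; real models have more.
MODEL_CSV = '''Attribute,Description,DependsOn
ExampleComponent,A hypothetical component,"Title, Grant Number"
Title,Title of the item,
Grant Number,Grant that funded the item,
'''


def q(text: str) -> str:
    """Quote a name as a DOT ID, escaping embedded double quotes."""
    return '"' + text.replace('"', '\\"') + '"'


def model_to_dot(csv_text: str) -> str:
    """Emit a DOT digraph with an edge from each attribute to each entry in its DependsOn list."""
    lines = ["digraph data_model {", "  rankdir=LR;"]
    for row in csv.DictReader(io.StringIO(csv_text)):
        parent = row["Attribute"].strip()
        lines.append(f"  {q(parent)};")
        # DependsOn holds a comma-separated list; empty cells yield no edges.
        for child in (row.get("DependsOn") or "").split(","):
            child = child.strip()
            if child:
                lines.append(f"  {q(parent)} -> {q(child)};")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(model_to_dot(MODEL_CSV))
```

The output could be rendered with Graphviz (e.g., `dot -Tsvg data_model.dot -o data_model.svg`) as a release step, so the visualization would not lag behind the model.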
FAIR Data has been doing some data model visualization work, but this hasn't been consistently supported (it was beta-tested in HTAN, but updates were not automated).
Consider this in the context of who might want to browse a visual representation of the data model. This might complement efforts in https://github.com/mc2-center/data-models/issues/49