Open aclayton555 opened 2 months ago
24-9: @aditya-nath-sage will pick this up and chat with @jaybee84 about aligning approaches with NF.
Target output for this sprint: design doc for how to implement datasets across MC2 and NF (much of this captured in linked ticket above), including tentative annotation process. (will this leverage schematic or the Synapse API?)
Additional info on Synapse Datasets: https://help.synapse.org/docs/Datasets.2611281979.html
Goal is to create a design document for how MC2 and NF will want to handle this issue. Example design doc: https://docs.google.com/document/d/1dF1-FjGSdO3nkKArEsrnjnWFLeOV78MlvGZvM8smJVk/edit?pli=1#heading=h.47emx3tcx2wj
24-9 Close-Out: Currently working with the DM group (at Sage) to create an org-wide Dataset schema. This may take some time to reach consensus, but we can prioritize incorporating a placeholder model now and add the finalized schema later. Check on this mid-sprint in 24-10.
@aditya-nath-sage let's touch base on this during our check-in tomorrow!
Aditya and Orion to meet to align on this. In the meantime, @aditya-nath-sage to review ongoing design doc
Establish end of year goal for this effort
24-10: Orion has a rough script on how to bind entities in Synapse. Need to understand how the schematic outputs will work here and what the schema looks like.
24-11/12 Scope: Start working on this. This is about how we surface datasets and collections that are on Synapse, and how these connect to publications via queryable metadata. Good to take stock of how many Datasets exist currently. Goal for end of sprint is a prelim design doc.
Another thought: for the record-based datasets we have, how can we generate and surface a collection of related datasets? Some limitations here, as Synapse Collections currently only consolidate Dataset entities. One possibility is to generate entities from records, create a Dataset from these, then create a Collection.
Rough draft of a schema bind script: https://github.com/mc2-center/mc2-center-dcc/blob/add-utils-11-24/utils/synapse_json_schema_bind.py
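For anyone following along, here is a minimal sketch of what a schema bind can look like. It is not the contents of the linked script. It assumes the Synapse REST endpoint `PUT /entity/{id}/schema/binding` and an already logged-in `synapseclient.Synapse` instance passed in as `syn`. The function names and the example schema `$id` are hypothetical.

```python
import json


def build_binding_request(entity_id, schema_uri, derive_annotations=False):
    """Build the request body for binding a registered JSON schema to an entity.

    Field names follow the Synapse REST bind-schema request as I understand it
    (assumption); schema_uri is the $id of a schema already registered in Synapse.
    """
    return {
        "entityId": entity_id,
        "schema$id": schema_uri,
        "enableDerivedAnnotations": derive_annotations,
    }


def bind_schema(syn, entity_id, schema_uri, derive_annotations=False):
    """Send the bind request using a logged-in client (anything with restPUT)."""
    body = build_binding_request(entity_id, schema_uri, derive_annotations)
    return syn.restPUT(f"/entity/{entity_id}/schema/binding", body=json.dumps(body))
```

Usage would be roughly `bind_schema(syn, "syn123", "org.example-Dataset-1.0.0")`, where both the Synapse ID and the schema `$id` are placeholders. Keeping the request builder separate from the network call makes it easy to check the payload before touching any entities.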
Rough draft of a script to convert Synapse table info to annotations: https://github.com/mc2-center/mc2-center-dcc/blob/add-utils-11-24/utils/table_to_annotations.py
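To illustrate the general idea (again, a sketch rather than the linked script), converting one table row into an annotation dictionary could look like the following. The column names, the `entityId` key column, and the comma-separated convention for multi-value cells are all assumptions made for the example.

```python
def row_to_annotations(row, id_column="entityId", list_separator=","):
    """Turn one table row (a dict) into (entity_id, annotations).

    Assumptions: the row carries the target entity's Synapse ID in id_column,
    empty cells should be skipped, and multi-value cells are separator-delimited.
    """
    annotations = {}
    for column, value in row.items():
        if column == id_column or value is None:
            continue
        if isinstance(value, str):
            # Split delimited strings and drop blank pieces.
            parts = [p.strip() for p in value.split(list_separator) if p.strip()]
            if not parts:
                continue
            value = parts if len(parts) > 1 else parts[0]
        annotations[column] = value
    return row[id_column], annotations
```

The resulting dictionary could then be applied to the entity with the client, for example via `syn.get_annotations` and `syn.set_annotations` in synapseclient, once we settle on which columns map to which schema attributes.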
Script for creating a Synapse Dataset and adding entities from a folder: https://github.com/mc2-center/mc2-center-dcc/blob/add-utils-11-24/utils/build_datasets.py
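As a rough sketch of the dataset-building step (not the linked script itself), the snippet below prepares the item list and stores a Dataset. It assumes the legacy `synapseclient.Dataset` class, which accepts `dataset_items` as a list of `{"entityId", "versionNumber"}` dicts, and a logged-in client passed in as `syn`. The helper names are hypothetical.

```python
def build_dataset_items(files):
    """Build Dataset items from (entity_id, version_number) pairs.

    Keeps only the first version seen for each entity, on the assumption
    that a Dataset should reference a single version per file.
    """
    seen = set()
    items = []
    for entity_id, version in files:
        if entity_id in seen:
            continue
        seen.add(entity_id)
        items.append({"entityId": entity_id, "versionNumber": version})
    return items


def store_dataset(syn, name, parent_id, items):
    """Create and store the Dataset; needs synapseclient installed and a logged-in syn."""
    from synapseclient import Dataset  # imported lazily so the helper above runs without it

    dataset = Dataset(name=name, parent=parent_id, dataset_items=items)
    return syn.store(dataset)
```

For the folder case, the pairs could come from listing the folder's files with `syn.getChildren`, or the `Dataset` object's `add_folder` method could be used directly instead of building the list by hand.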
Emerges from exploratory and feasibility analysis in: https://github.com/mc2-center/mc2-center-dcc/issues/71
This ticket should track efforts to develop and implement a schema for annotating Synapse Datasets curated as part of the proposed MC2 Center workflow (note that this is different from the existing 'Datasets' component in the MC2 Center data model). This is the first of several steps, which may be tracked in separate tickets as this work progresses:
1) Define the schema (there have been ongoing discussions on this among the data managers group at Sage, but no resolution)
2) Implement the schema as JSON (this will be applicable and complementary to ongoing efforts in NF)
3) Explore and incorporate automation (again, pull in efforts from NF)
4) Longer term: determine how this will look on the portal and what the expected user experience will be