mc2-center / data-models

Versioned history of the MC2 Center data model
https://mc2-center.github.io/data-models/
Creative Commons Zero v1.0 Universal

[Synapse Datasets + Collections] Define and incorporate schema for Synapse Datasets #136

Open aclayton555 opened 2 months ago

aclayton555 commented 2 months ago

Emerges from exploratory and feasibility analysis in: https://github.com/mc2-center/mc2-center-dcc/issues/71

This ticket tracks efforts to develop and implement a schema for annotating Synapse Datasets curated as part of the proposed MC2 Center workflow. (Note that this is different from the existing 'Datasets' component in the MC2 Center data model.) This is the first of several steps, which may be tracked in separate tickets as this work progresses:

1) Define the schema (there have been ongoing discussions on this among the data managers group at Sage, but no resolution).
2) Implement the schema as JSON (this will be applicable and complementary to ongoing efforts in NF).
3) Explore and incorporate automation (again, pull in efforts from NF).
4) Longer term: decide how this will look on the portal and what the expected user experience will be.
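As a rough illustration of step 2, a placeholder schema could be written as a JSON Schema document and checked against candidate annotations. Everything specific in this sketch is an assumption made for illustration: the `$id`, the field names (`datasetName`, `datasetDescription`, `grantNumber`), and the small `check_annotations` helper, which only handles `required` and top-level `type`. It is not the schema under discussion in the data managers group.

```python
import json

# Hypothetical placeholder schema; the $id and field names are illustrative
# assumptions, not the schema being defined by the data managers group.
PLACEHOLDER_DATASET_SCHEMA = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "$id": "org.example-DatasetPlaceholder",
    "type": "object",
    "properties": {
        "datasetName": {"type": "string"},
        "datasetDescription": {"type": "string"},
        "grantNumber": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["datasetName", "grantNumber"],
}

# Map the JSON Schema type names used above to Python types.
_TYPES = {"string": str, "array": list}


def check_annotations(annotations, schema=PLACEHOLDER_DATASET_SCHEMA):
    """Return a list of problems; checks only 'required' and top-level 'type'."""
    problems = []
    for key in schema.get("required", []):
        if key not in annotations:
            problems.append(f"missing required key: {key}")
    for key, value in annotations.items():
        spec = schema["properties"].get(key)
        if spec is None:
            continue  # keys outside the schema are ignored in this sketch
        if not isinstance(value, _TYPES[spec["type"]]):
            problems.append(f"{key}: expected {spec['type']}")
    return problems


if __name__ == "__main__":
    print(json.dumps(PLACEHOLDER_DATASET_SCHEMA, indent=2))
    print(check_annotations({"datasetName": "Example", "grantNumber": ["CA000000"]}))
```

A placeholder like this could be swapped for the finalized schema later without changing how annotations are checked, since the checker reads the schema rather than hard-coding field names.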

aclayton555 commented 2 months ago

24-9: @aditya-nath-sage will pick this up and chat with @jaybee84 about aligning approaches with NF.

Target output for this sprint: a design doc for how to implement datasets across MC2 and NF (much of this is captured in the linked ticket above), including a tentative annotation process. Open question: will this leverage schematic or the Synapse API?

Additional info on Synapse Datasets: https://help.synapse.org/docs/Datasets.2611281979.html

aditya-nath-sage commented 2 months ago

Goal is to create a design document for how MC2 and NF will want to handle this issue. Example design doc: https://docs.google.com/document/d/1dF1-FjGSdO3nkKArEsrnjnWFLeOV78MlvGZvM8smJVk/edit?pli=1#heading=h.47emx3tcx2wj

aclayton555 commented 1 month ago

24-9 Close-Out: Currently working with the DM group (at Sage) to create an org-wide Dataset schema. This may take some time to reach consensus, but we can prioritize incorporating a placeholder model now and add the finalized schema to it later. Check on this mid-sprint in 24-10.

aclayton555 commented 1 month ago

@aditya-nath-sage let's touch base on this during our check-in tomorrow!

aclayton555 commented 1 month ago

Aditya and Orion to meet to align on this. In the meantime, @aditya-nath-sage to review the ongoing design doc.

Establish an end-of-year goal for this effort.

aclayton555 commented 3 weeks ago

24-10: Orion has a rough script for binding a schema to entities in Synapse. We still need to understand how the schematic outputs will fit in here and what the schema looks like.
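For context on the binding step, here is a minimal sketch of the request that binds a JSON schema to a single entity. It is not Orion's script. It assumes the Synapse REST endpoint `PUT /entity/{id}/schema/binding` with a body carrying `entityId`, `schema$id` and `enableDerivedAnnotations`; the entity ID and schema ID shown are placeholders, and `build_bind_request` is a hypothetical helper that only builds the request and sends nothing.

```python
import json


def build_bind_request(entity_id, schema_id, enable_derived_annotations=False):
    """Build (method, path, body) for binding a JSON schema to one entity.

    Assumes the Synapse REST route PUT /entity/{id}/schema/binding.
    Nothing is sent over the network here.
    """
    if not entity_id.startswith("syn"):
        raise ValueError(f"not a Synapse ID: {entity_id!r}")
    body = {
        "entityId": entity_id,
        "schema$id": schema_id,
        "enableDerivedAnnotations": enable_derived_annotations,
    }
    return "PUT", f"/entity/{entity_id}/schema/binding", json.dumps(body)


if __name__ == "__main__":
    # Placeholder IDs for illustration only.
    method, path, body = build_bind_request("syn00000000", "org.example-DatasetPlaceholder")
    print(method, path)
    print(body)
```

With a logged-in `synapseclient.Synapse` object `syn`, the path and body could presumably be passed to `syn.restPUT(path, body=body)`. Keeping request construction separate from sending makes it possible to review a batch of bindings before applying them.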

aclayton555 commented 3 weeks ago

24-11/12 Scope: Start working on this. Think about how we surface datasets and collections that are on Synapse, and how these connect to publications via queryable metadata. It would be good to take stock of how many Datasets currently exist. Goal for end of sprint is a prelim design doc.

Another thought: for the record-based datasets we have, how can we generate and surface a collection of related datasets? There are some limitations here, as Synapse Collections currently only consolidate Dataset entities. One possibility is to generate entities from records, create a Dataset from these, and then create a Collection.
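The records-to-entities-to-Dataset-to-Collection idea can be sketched as a planning step that runs before anything is created in Synapse. The record keys (`entity_id`, `version`, `dataset`, `collection`) and the function name are assumptions for illustration. The sketch assumes each record has already been turned into an entity with a Synapse ID, and it uses dataset names as stand-ins for the Dataset IDs that a Collection would reference once the Datasets exist.

```python
from collections import defaultdict


def plan_datasets_and_collections(records):
    """Group record-derived entities into planned Datasets and Collections.

    Returns (datasets, collections):
      datasets:    dataset name -> list of {"entityId", "versionNumber"} items
      collections: collection name -> sorted list of dataset names
    """
    datasets = defaultdict(list)
    collections = defaultdict(set)
    for rec in records:
        datasets[rec["dataset"]].append(
            {"entityId": rec["entity_id"], "versionNumber": rec.get("version", 1)}
        )
        collections[rec["collection"]].add(rec["dataset"])
    return (
        dict(datasets),
        {name: sorted(members) for name, members in collections.items()},
    )


if __name__ == "__main__":
    # Placeholder records; IDs and names are made up for illustration.
    example = [
        {"entity_id": "syn1", "dataset": "A", "collection": "C1"},
        {"entity_id": "syn2", "version": 3, "dataset": "A", "collection": "C1"},
        {"entity_id": "syn3", "dataset": "B", "collection": "C1"},
    ]
    datasets, collections = plan_datasets_and_collections(example)
    print(datasets)
    print(collections)
```

Producing the plan as plain data first would let curators review the grouping before any Dataset or Collection is created.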

Bankso commented 2 weeks ago

Rough draft of a schema bind script: https://github.com/mc2-center/mc2-center-dcc/blob/add-utils-11-24/utils/synapse_json_schema_bind.py

Rough draft of a script to convert Synapse table info to annotations: https://github.com/mc2-center/mc2-center-dcc/blob/add-utils-11-24/utils/table_to_annotations.py

Script for creating a Synapse Dataset and adding entities from a folder: https://github.com/mc2-center/mc2-center-dcc/blob/add-utils-11-24/utils/build_datasets.py