mc2-center / data-models

Versioned history of the MC2 Center data model
https://mc2-center.github.io/data-models/
Creative Commons Zero v1.0 Universal

[March] Curation Workflow Tracking #84

Closed aditigopalan closed 1 month ago

aditigopalan commented 3 months ago

This ticket tracks curation workflow progression.

Note: It is possible for work to take place simultaneously in the three sections with overlapping periods, allowing curation workflows for different months to coincide.

1. Curation and Annotation

2. Data model

3. Contributor Engagement

aditigopalan commented 3 months ago

This ticket monitors our curation workflow, and I'm considering creating a distinct ticket for each month's tracking. Feel free to make any adjustments if I missed anything!

2. Data model

@Bankso, please let me know if it's more practical to continue tracking 2 on a separate ticket, as you've been doing, or if it's acceptable to maintain the tracking here.

3. Contributor Engagement

@aclayton555, I'm aware that documentation and community curation are ongoing, so I anticipate this evolving over time.

aclayton555 commented 2 months ago

Uploads will also include ProjectView uploads (in progress). Will also cover other backlog tickets and bring the portal up to date with the March PubMed crawl.

aclayton555 commented 1 month ago

In addition to our standard resource types and the new ProjectView info, can we please also include the Educational Resources in our coming portal sync (https://www.synapse.org/#!Synapse:syn52963530/tables/ -> https://www.synapse.org/#!Synapse:syn51497305/tables/)

aditigopalan commented 1 month ago

@Bankso does the workflow that's currently running also update UNION tables?

Bankso commented 1 month ago

does the workflow that's currently running also update UNION tables?

Nope, union table scopes can be updated with merge_tables.py

I validate the union tables with the union_qc.py script and then do manual inspection/editing to resolve any errors in the merged CSV. The corrected CSV is what gets uploaded to Synapse and pulled into the portal sync workflow.

vpchung commented 1 month ago

5/14/2024 update

"Performing automate portal sync to CCKP" is currently blocked by this issue. It is still under investigation by the Synapse platform team. I will run the sync as soon as the related issue is resolved.

vpchung commented 1 month ago

5/17/2024 update

Issue has been resolved. To summarize for our own records:

Some observations after running a couple of portal sync tests:

  • For best performance, we will now run a bash script locally instead of using GitHub Actions (which had some performance issues).

  • Syncing publications and tools went well, but datasets ran into issues. In particular, DatasetView_id is expected to contain only synIDs, since the datasetId column of the portal table has a column type of "Entity". @Bankso has noted that "we could continue to allow contributors to provide any unique identifier in the DatasetView_id column and then replace it with a synId that we generate during the union table QC process".

After addressing the issues above ^^^, portal tables are now updated.
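For the record, the synID constraint on DatasetView_id could be checked before a sync with something like the sketch below. This is only an illustration, not what union_qc.py or the sync script actually does. It assumes a synID is "syn" followed by digits, and the function names (is_syn_id, find_non_syn_ids) and the sample rows are hypothetical.

```python
import csv
import io
import re

# Assumption: a Synapse ID is "syn" followed by one or more digits.
SYN_ID_PATTERN = re.compile(r"syn\d+")


def is_syn_id(value):
    """Return True if the value looks like a synID (e.g. syn52963530)."""
    return SYN_ID_PATTERN.fullmatch(value.strip()) is not None


def find_non_syn_ids(csv_text, column="DatasetView_id"):
    """Return (row_number, value) pairs for rows whose column is not a synID.

    Row numbers count data rows starting at 1 (the header is not counted).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    problems = []
    for row_number, row in enumerate(reader, start=1):
        value = row.get(column) or ""
        if not is_syn_id(value):
            problems.append((row_number, value))
    return problems


# Hypothetical merged CSV: the second data row uses a non-Synapse identifier.
sample = "DatasetView_id,name\nsyn123,first dataset\nGSE999,second dataset\n"

for row_number, value in find_non_syn_ids(sample):
    print(f"row {row_number}: {value!r} is not a synID")
```

Rows flagged this way would be the ones needing a generated synID (or manual correction) before the dataset table is synced.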

aclayton555 commented 1 month ago

As part of the close out on this, @vpchung will push a sync of the Educational Resources (https://www.synapse.org/#!Synapse:syn52963530/tables/ -> https://www.synapse.org/#!Synapse:syn51497305/tables/)

Thanks Verena!

vpchung commented 1 month ago

Done ✅ With 1 new resource added.