aditigopalan closed this issue 1 month ago.
This ticket monitors our curation workflow, and I'm considering creating a distinct ticket for each month's tracking. Feel free to make any adjustments if I missed anything!
2. Data model
@Bankso, please let me know whether it's more practical to keep tracking section 2 on a separate ticket, as you've been doing, or to track it here.
3. Contributor Engagement
@aclayton555, I know documentation and community curation are ongoing, so I expect this section to evolve over time.
Uploads will also include ProjectView uploads (in progress). This will also cover other backlog tickets and bring the portal up to date with the March PubMed crawl.
In addition to our standard resource types and the new ProjectView info, can we please also include the Educational Resources in our upcoming portal sync (https://www.synapse.org/#!Synapse:syn52963530/tables/ -> https://www.synapse.org/#!Synapse:syn51497305/tables/)?
@Bankso does the workflow that's currently running also update UNION tables?
Nope. UNION table scopes can be updated with merge_tables.py.
I validate the union tables with the union_qc.py script and then do manual inspection/editing to resolve any errors in the merged CSV. The corrected CSV is what gets uploaded to Synapse and pulled into the portal sync workflow.
The automated portal sync to CCKP is currently blocked by this issue, which the Synapse platform team is still investigating. I will run the sync as soon as the issue is resolved.
The issue has been resolved. To summarize for our records:
Some observations after running a couple of portal sync tests:
For best performance, we will now run a bash script locally instead of using GitHub Actions (which had some performance issues).
Syncing publications and tools went well, but we ran into issues with datasets. In particular, DatasetView_id is expected to contain only synIDs, because the datasetId column of the portal table has the column type "Entity". @Bankso noted that "we could continue to allow contributors to provide any unique identifier in the DatasetView_id column and then replace it with a synId that we generate during the union table QC process".
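As a rough illustration of the QC step @Bankso describes, here is a minimal Python sketch that flags DatasetView_id values that are not synIDs and swaps them for synIDs taken from a mapping. The function names and the `id_map` mapping are hypothetical and are not part of union_qc.py; it assumes unversioned synIDs (`syn` followed by digits) and does not cover how the replacement synIDs are actually generated.

```python
import re

# Unversioned Synapse IDs: "syn" followed by digits (assumption: no ".version" suffix).
SYN_ID_PATTERN = re.compile(r"syn\d+")


def is_syn_id(value: str) -> bool:
    """Return True if value looks like an unversioned Synapse ID."""
    return SYN_ID_PATTERN.fullmatch(value.strip()) is not None


def find_non_syn_ids(rows, column="DatasetView_id"):
    """Return (row_index, value) pairs whose ID column is not a synID."""
    return [
        (i, row[column])
        for i, row in enumerate(rows)
        if not is_syn_id(row[column])
    ]


def replace_with_syn_ids(rows, id_map, column="DatasetView_id"):
    """Return copies of rows with contributor IDs swapped for synIDs.

    id_map is a hypothetical {contributor_id: synId} mapping assumed to be
    produced during union table QC. Values that are already synIDs are kept.
    Raises KeyError if a non-synID value has no entry in id_map.
    """
    updated = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left unchanged
        value = new_row[column].strip()
        if not is_syn_id(value):
            new_row[column] = id_map[value]
        updated.append(new_row)
    return updated
```

For example, with rows `[{"DatasetView_id": "syn123"}, {"DatasetView_id": "GSE12345"}]` (placeholder values), `find_non_syn_ids` reports only the second row, and `replace_with_syn_ids(rows, {"GSE12345": "syn456"})` returns copies whose IDs are all synIDs.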
After addressing the issues above, the portal tables are now updated.
As part of closing this out, @vpchung will push a sync of the Educational Resources (https://www.synapse.org/#!Synapse:syn52963530/tables/ -> https://www.synapse.org/#!Synapse:syn51497305/tables/).
Thanks Verena!
Done ✅, with 1 new resource added.
This ticket tracks curation workflow progression.
Note: Work in the three sections can proceed in parallel and over overlapping periods, so curation workflows for different months may run concurrently.
1. Curation and Annotation
[x] Run PubMed crawler to generate PublicationView manifest [205 publications generated, long sprint anticipated]
[x] Send Amber and Jineta a copy of the PublicationView manifest from latest crawl to review for MC2 Center Newsletter publication highlights
[x] Send Amber "News from CCKP" for MC2 Center Newsletter
[x] Annotate publications in PublicationView manifest [In progress]
[x] Generate ToolView and DatasetView manifests based on PublicationView manifest
[x] Run the automated curation workflow to upload publications, datasets, and tools [This includes splitting manifests, processing and validating manifests, generating target Synapse IDs for upload, schema updates, upload to Synapse, and (in progress) a validation check for uploads]
[x] Generate UNION tables
[x] QC of staging tables
[x] Perform automated portal sync to CCKP
[x] Validate data on the CCKP
Status check [Plan to report numbers for each category following the PubMed crawl]:
[x] Publication upload [83]
[x] Tool upload [0]
[x] Dataset upload [20]
2. Data model
3. Contributor Engagement