Issue
To upload metadata manifests to Synapse using schematic, a targetId must be provided, which corresponds to a folder where the manifest CSV is stored.
A database containing SynIDs for Synapse projects and folders for each metadata type was previously generated (here) to support the ingress process.
Moving forward, targetId acquisition should be integrated into the upload step, so that a separate database no longer needs to be maintained and the risk of uploading to an incorrect target folder is reduced.
Proposed solution
Query a Synapse table to get the target Synapse IDs associated with MC2 grant projects and a given type of metadata
Suggested query: SELECT id,name,parentId FROM syn27210848 WHERE name='folder name' AND parentId IN (comma-separated list of single-quoted project Synapse IDs)
Return the query result as a data frame
Associate targetIds with manifest paths (this is the current input format for the manifest upload script)
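The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the `manifest_path` / `project_id` / `target_id` keys, and the choice to fail on a missing folder are hypothetical and not part of the existing upload script. The only Synapse calls used are `tableQuery(...)` and `asDataFrame()` from the `synapseclient` Python package, and they are kept in a separate function so the rest can run without a Synapse login.

```python
import re

SYN_ID = re.compile(r"^syn\d+$")


def build_target_query(folder_name, project_ids, table_id="syn27210848"):
    """Build the suggested query for one folder name and a set of projects."""
    for pid in project_ids:
        # Reject anything that is not a plain Synapse ID before it is
        # interpolated into the query string.
        if not SYN_ID.match(pid):
            raise ValueError(f"not a Synapse ID: {pid!r}")
    if "'" in folder_name:
        raise ValueError("folder name must not contain a single quote")
    id_list = ", ".join(f"'{pid}'" for pid in project_ids)
    return (
        f"SELECT id,name,parentId FROM {table_id} "
        f"WHERE name='{folder_name}' AND parentId IN ({id_list})"
    )


def fetch_target_rows(syn, query):
    """Run the query and return one dict per row.

    Assumes `syn` is a logged-in synapseclient.Synapse object.
    """
    frame = syn.tableQuery(query).asDataFrame()
    return frame.to_dict("records")


def attach_target_ids(manifest_rows, query_rows):
    """Pair each manifest path with the folder found in its project.

    manifest_rows: dicts with hypothetical keys 'manifest_path', 'project_id'.
    query_rows: dicts with 'id', 'name', 'parentId' (the query columns).
    """
    target_by_project = {row["parentId"]: row["id"] for row in query_rows}
    paired = []
    for row in manifest_rows:
        project_id = row["project_id"]
        if project_id not in target_by_project:
            # Fail rather than guess, so nothing is uploaded to a wrong folder.
            raise KeyError(f"no target folder found in project {project_id}")
        paired.append(
            {
                "manifest_path": row["manifest_path"],
                "target_id": target_by_project[project_id],
            }
        )
    return paired
```

The output of `attach_target_ids` has the same shape as the current input (manifest path plus targetId), so the existing upload step would not need to change under this sketch.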
Changes to existing framework
Instead of an input CSV with manifest paths and targetIds as the primary input, I propose that each input CSV row contain the following:
manifest paths
project Synapse IDs
The folder name (e.g., publications, datasets, tools) is provided at the command line.
Optionally, the input sheet can be generated programmatically, using the grant number contained in the filename of split manifests. This could be done as follows:
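A minimal sketch of one way to generate the sheet, under several assumptions that the issue does not confirm: that grant numbers look like `CA` followed by six digits, that they appear somewhere in the manifest filename, and that a grant-to-project mapping is available as a dictionary. The function names, column names, and the example filename are hypothetical.

```python
import csv
import re
from pathlib import Path

# Assumed grant number format; adjust to the real filename convention.
GRANT_PATTERN = re.compile(r"CA\d{6}")


def build_input_rows(manifest_paths, project_by_grant):
    """Return one row per manifest with its path and project Synapse ID."""
    rows = []
    for path in manifest_paths:
        match = GRANT_PATTERN.search(Path(path).name)
        if match is None:
            raise ValueError(f"no grant number found in filename: {path}")
        grant = match.group(0)
        if grant not in project_by_grant:
            raise KeyError(f"no project Synapse ID known for grant {grant}")
        rows.append(
            {"manifest_path": str(path), "project_id": project_by_grant[grant]}
        )
    return rows


def write_input_sheet(rows, out_path):
    """Write the rows to a CSV with a header line."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=["manifest_path", "project_id"]
        )
        writer.writeheader()
        writer.writerows(rows)
```

For example, `build_input_rows(["out/CA209971_publication.csv"], {"CA209971": "syn123"})` would yield a single row pairing that path with `syn123`; both values here are made up for illustration.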