singaltanmay / snakemake-catalog-parser

Allows the Tool Registry Service (TRS) to ingest the Snakemake Workflow Catalog

Synchronize SCP with SWC and TRS data #4

Open singaltanmay opened 1 year ago

singaltanmay commented 1 year ago

Workflows in the TRS-Filer DB are kept in sync with the Snakemake Workflow Catalog: new workflows are added, changed workflows are updated, workflows that are no longer available are removed, and unchanged workflows are left untouched. This needs some thinking as to when an update is required and how that is tracked (basically, an update should only happen if the TRS-Filer data would change, e.g., if there's a new version).
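To make the four cases concrete, here is a rough sketch of how a sync plan could be computed. It assumes both sides can be reduced to a mapping from repo URL to a set of version identifiers; `SyncPlan` and `plan_sync` are hypothetical names, not existing code in this repo, and "changed" is approximated here as "the set of versions differs".

```python
from dataclasses import dataclass, field


@dataclass
class SyncPlan:
    """Repo URLs grouped by the action the sync would take (hypothetical)."""
    to_add: list = field(default_factory=list)
    to_update: list = field(default_factory=list)
    to_remove: list = field(default_factory=list)
    unchanged: list = field(default_factory=list)


def plan_sync(catalog: dict, registry: dict) -> SyncPlan:
    """Classify every repo URL found in either mapping.

    Both arguments map repo URL -> iterable of version identifiers.
    `catalog` is what the crawl found, `registry` is what is stored.
    """
    plan = SyncPlan()
    for url in sorted(catalog.keys() | registry.keys()):
        if url not in registry:
            plan.to_add.append(url)        # new in the catalog
        elif url not in catalog:
            plan.to_remove.append(url)     # no longer in the catalog
        elif set(catalog[url]) != set(registry[url]):
            plan.to_update.append(url)     # stored data would change
        else:
            plan.unchanged.append(url)     # nothing to do
    return plan
```

Only the repos in `to_add`, `to_update`, and `to_remove` would then trigger a write to the registry; everything in `unchanged` is skipped.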

singaltanmay commented 1 year ago

Deleting repos that went 404 should probably be an optional feature. That should also work via repo URLs.
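A minimal sketch of what the opt-in deletion could look like. The flag name `delete_missing` and the function `repos_to_delete` are made up for illustration, and the HTTP check is passed in as a callable (`fetch_status`) so the sketch does not commit to any particular HTTP client.

```python
from typing import Callable, Iterable, List


def repos_to_delete(
    repo_urls: Iterable[str],
    fetch_status: Callable[[str], int],
    delete_missing: bool = False,
) -> List[str]:
    """Return the repo URLs that should be removed from the registry.

    `fetch_status` is a caller-supplied function returning the HTTP
    status code for a repo URL. Nothing is selected unless the
    (hypothetical) `delete_missing` option is switched on.
    """
    if not delete_missing:
        return []
    return [url for url in repo_urls if fetch_status(url) == 404]
```

With the default `delete_missing=False` this returns an empty list, so nothing is deleted unless someone opts in.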

Updates are the trickiest part. I guess you could restrict yourself to checking tags via the GitHub API, but it would still need some thinking. For example, for repos that have no tags, you could take a snapshot of the metadata for the head/latest commit on the default branch and use the date/timestamp of crawling (perhaps with the commit hash appended, for both better sortability and reproducibility) as the TRS version. You could then set an option to either auto-renew or skip these repos whenever you crawl again.
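One possible shape for such a snapshot version, purely as a sketch: a UTC timestamp in a fixed-width format followed by a shortened commit hash. The function name, the timestamp format, and the 7-character hash length are all assumptions, not something the TRS spec or this repo prescribes.

```python
from datetime import datetime, timezone


def snapshot_version(crawled_at: datetime, commit_sha: str, sha_len: int = 7) -> str:
    """Build a version string like '20230501T123000Z-a1b2c3d'.

    The fixed-width UTC timestamp comes first, so plain string sorting
    orders versions by crawl time; the hash suffix records which commit
    was snapshotted. `crawled_at` should be timezone-aware (a naive
    datetime would be interpreted as local time by `astimezone`).
    """
    stamp = crawled_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{stamp}-{commit_sha[:sha_len]}"
```

Putting the timestamp first is what gives the sortability; putting the hash first would group versions by commit instead.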

And for repos that do have tags, you could use the tag names as versions and register one TRS version for every tag. Then, when crawling again, check if there are new tags and add them. You could also (perhaps optionally) add the head commit as a version, just like for the untagged/unversioned repos.
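The re-crawl step for tagged repos could then be a simple difference between the tags currently on the repo and the versions already registered. The sketch below assumes the tag names have already been fetched (e.g. from the GitHub API) and that the optional head-commit version is computed elsewhere; `versions_to_register` is a hypothetical helper.

```python
from typing import Iterable, List, Optional


def versions_to_register(
    remote_tags: Iterable[str],
    registered_versions: Iterable[str],
    head_version: Optional[str] = None,
) -> List[str]:
    """Return the versions that are not yet in the registry.

    Tags keep the order in which they were passed in. If `head_version`
    is given (the optional snapshot of the head commit), it is appended
    last, unless it is already registered.
    """
    known = set(registered_versions)
    new_versions = [tag for tag in remote_tags if tag not in known]
    if head_version is not None and head_version not in known:
        new_versions.append(head_version)
    return new_versions
```

An empty result means the repo is unchanged and can be skipped on this crawl.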