Closed jameshadfield closed 11 months ago
See shared GDoc for additional context and details on scripts.
Checklist of scripts that still need to be added, so I can keep track of progress:
Identical scripts (added in #6)
Diverged scripts with various different versions used across workflows (binned into related groups):
Simple notify scripts (added in #8)
S3 interaction + notify scripts that depend on S3 files (Jover's WIP branch)
Genbank interactions
Nextclade joining
Potential augur curate scripts
@joverlee521 thanks for making the checklist in the comment above! It'll be useful to have it continually updated. To make that easier, I've moved it to the main issue text.
In talking through #20 with @j23414, we realized that join-metadata-and-clades can mostly be replaced with a couple of csvtk commands (csvtk cut | csvtk rename2 | csvtk join).
The version of the script in ncov-ingest adds clock_deviation, but that can be done separately from the joining.
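For anyone skimming, here is roughly what the cut / rename / join steps amount to. This is only a minimal Python sketch of the logic, not the actual join-metadata-and-clades script and not the csvtk commands themselves. The function name, the column names (accession, seqName, clade, qc.overallScore, QC_overall_score) and the sample rows are all hypothetical, chosen just for illustration.

```python
import csv
import io


def join_metadata_and_clades(metadata_tsv, nextclade_tsv, id_field,
                             nextclade_id_field, column_map):
    """Left-join selected, renamed Nextclade columns onto metadata rows.

    column_map maps Nextclade column names to the names wanted in the output.
    All names here are hypothetical; adjust to the real files.
    """
    # "cut" + "rename": keep only the wanted Nextclade columns, under new names.
    clades = {}
    for row in csv.DictReader(io.StringIO(nextclade_tsv), delimiter="\t"):
        clades[row[nextclade_id_field]] = {
            new: row[old] for old, new in column_map.items()
        }

    # "join": add the renamed columns to every metadata row, matching on the ID.
    # Metadata rows without a Nextclade match get empty values (a left join).
    joined = []
    for row in csv.DictReader(io.StringIO(metadata_tsv), delimiter="\t"):
        extra = clades.get(row[id_field], {})
        for new in column_map.values():
            row[new] = extra.get(new, "")
        joined.append(dict(row))
    return joined


# Made-up example data, only to show the shape of the inputs.
metadata = "accession\tcountry\nA1\tUSA\nB2\tKenya\n"
nextclade = "seqName\tclade\tqc.overallScore\nA1\tIIb\t0\n"

rows = join_metadata_and_clades(
    metadata,
    nextclade,
    id_field="accession",
    nextclade_id_field="seqName",
    column_map={"clade": "clade", "qc.overallScore": "QC_overall_score"},
)
for r in rows:
    print(r)
```

Anything extra, like clock_deviation, would be computed in a separate step rather than inside the join.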
Closing issue as we have resolved all of the listed duplicate scripts. Any other additions can be opened as separate issues in the future.
The first step in making this repository useful is to populate it with scripts that are currently manually copied around pathogen repos.
See shared GDoc for additional context and details on scripts.
Progress
This was originally created by @joverlee521 in https://github.com/nextstrain/ingest/issues/1#issuecomment-1636328472.
Identical scripts (added in #6)
Diverged scripts with various different versions used across workflows (binned into related groups):
Simple notify scripts (added in #8)
S3 interaction + notify scripts that depend on S3 files (added in #12)
Genbank interactions
Nextclade joining
Potential augur curate scripts
Summary of differences
This is the original issue text from @jameshadfield.
Here's a quick scan of duplicated ingest scripts, using monkeypox as the "base" and comparing it against 4 other ingest script directories:
Directories of scripts considered: