Open kapsakcj opened 2 months ago
@emily-smith1 had success with the dev branch as it stands today: https://app.terra.bio/#workspaces/cdph-terrabio-taborda-manual/dataAnalysis_SARS-CoV-2_CA-CDC/job_history/6cd834cc-aa53-4d40-8a27-4554edcbae7b
I'm glad we didn't delete this branch! pats self on back 😄
:cool:
:pushpin: Explain the Request
Some GenBank accessions are unable to be downloaded via the command we currently have in the Assembly_fetch workflow:
In code here: https://github.com/theiagen/public_health_bioinformatics/blob/5be343354f716d77e9e4a0fb4a2ec10eb3bc00a5/tasks/utilities/data_import/task_ncbi_datasets.wdl#L27C5-L28C24
For example, with this accession, `OM900516.2`, it fails with this message:

The reason is that these kinds of accessions are only accessible through the NCBI Virus data package, so you have to specify a different sub-command to download the genome (& other associated files).
This command works:
I started a dev branch called `cjk-assembly-fetch` for this a long time ago, but it fell by the wayside as other higher priorities arose.

It would be good to continue making commits to this branch and add support more completely. Things that need to be done:
- `datasets` CLI tool. StaPH-B has one that's a little more up-to-date, though not likely the absolute latest version
- `datasets download virus` subcommand are unimpacted by changes
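One possible way to add virus support without affecting accessions that already work is to try the existing genome sub-command first and only fall back to the virus sub-command when it fails. The sketch below is an assumption, not the code in the `cjk-assembly-fetch` branch: `fetch_assembly` is a hypothetical helper name, and the exact sub-commands and the `--filename` flag should be checked against the `datasets` version in the Docker image. It also assumes `datasets` exits non-zero when an accession is not found in a given data package.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the workflow): try the genome data package
# first, then fall back to the NCBI Virus data package.
# Prints which data package succeeded ("genome" or "virus") on stdout.
fetch_assembly() {
  local accession="$1"
  local outfile="${2:-${accession}.zip}"

  # Send any stdout from datasets to stderr so that stdout carries only
  # the name of the data package that worked.
  if datasets download genome accession "${accession}" --filename "${outfile}" >&2; then
    echo "genome"
  elif datasets download virus genome accession "${accession}" --filename "${outfile}" >&2; then
    echo "virus"
  else
    echo "failed to download ${accession} with either sub-command" >&2
    return 1
  fi
}
```

Calling `fetch_assembly OM900516.2` would attempt the genome sub-command and then the virus one. Because the genome sub-command is tried first, accessions that download today should take the same path as before.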