Closed kapsakcj closed 2 years ago
Planning to restructure so that the call for the VADR task appears in the block where organism is set to sarscov2 or mpxv.
Please do not review for now.
I plan on closing this PR and opening another one from the branch cjk-vadr-consolidation, which will include the changes from this PR (that branch was started from this PR's dev branch, cjk-vadr-mpx). It will additionally consolidate to the same VADR task call block in all workflows, leaving only 4 VADR workflow-level outputs instead of separate ones for SARS-CoV-2 and MPXV.
Setting as a draft for now, until testing in Terra is done and #171 is merged. The changes from PR 171 are included in this PR as well.
The design strategy was to alter the VADR task as little as possible and let the user set VADR input params to switch between SARS-CoV-2 and MPXV usage. We will provide input JSONs to avoid typos.
This PR:
- task_ncbi.wdl changes:
  - added cpu input param, default 2
  - added --split --cpu ~{cpu} to the VADR v-annotate.pl command to speed things up a little bit
- TheiaCov_{ClearLabs, ONT, illumina_pe, illumina_se, fasta}: MPXV
- VADR_Update workflow does not require changes; all that needs to be adjusted is the input parameters
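To illustrate the task_ncbi.wdl change, here is a minimal Python sketch of how the v-annotate.pl invocation might be assembled once --split and --cpu are appended. It is not the WDL task itself: the function name, the file names, and the argument ordering are assumptions made for illustration; only the --split/--cpu flags and the default of 2 come from this PR.

```python
import shlex


def build_vadr_command(fasta, out_dir, vadr_opts, cpu=2):
    """Return the v-annotate.pl argument list with --split and --cpu added.

    Hypothetical helper: fasta, out_dir and the ordering of options are
    assumptions, not taken from task_ncbi.wdl.
    """
    if cpu < 1:
        raise ValueError("cpu must be a positive integer")
    args = ["v-annotate.pl"]
    args += shlex.split(vadr_opts)           # organism-specific options from the user
    args += ["--split", "--cpu", str(cpu)]   # flags added in this PR
    args += [fasta, out_dir]                 # positional arguments (assumed names)
    return args


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    cmd = build_vadr_command("sample.fasta", "sample_vadr", "--mkey mpxv", cpu=2)
    print(shlex.join(cmd))
```

Keeping the organism-specific options as a single user-supplied string mirrors the design goal of changing the task as little as possible.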
Input parameters that must be specified:
- vadr_mpxv.maxlen=210000
- vadr_mpxv.skip_length=30000 ◀️ We can adjust this higher to be stricter about when to run VADR. 10000 is the default for SARS-CoV-2; I came up with 30000 with little reasoning.
- vadr_mpxv="--glsearch -s -r --nomisc --mkey mpxv --r_lowsimok --r_lowsimxd 100 --r_lowsimxl 2000 --alt_pass discontn,dupregin"
  - This input is VERY important and may change in the future if NCBI/Eric decides to alter the suggested MPXV input params

I'm expecting CI to fail, as there are a good number of changes introduced here. Will address later after we're happy with the code.
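Since the MPXV input parameters above are easy to mistype, a small script could generate the input JSON. The sketch below is illustrative only: the workflow prefix `theiacov_fasta` and the option key `vadr_mpxv.vadr_opts` are hypothetical names, and `should_run_vadr` assumes that skip_length is a minimum assembly length below which VADR is skipped, which this description implies but does not state outright.

```python
import json

# Values copied from this PR description. The key for the options string is a
# hypothetical name; the description does not spell it out in full.
OPTS_KEY = "vadr_mpxv.vadr_opts"
MPXV_VADR_INPUTS = {
    "vadr_mpxv.maxlen": 210000,
    "vadr_mpxv.skip_length": 30000,
    OPTS_KEY: (
        "--glsearch -s -r --nomisc --mkey mpxv --r_lowsimok "
        "--r_lowsimxd 100 --r_lowsimxl 2000 --alt_pass discontn,dupregin"
    ),
}


def make_inputs(workflow="theiacov_fasta"):
    """Prefix each key with a workflow name (the default is a hypothetical name)."""
    return {f"{workflow}.{key}": value for key, value in MPXV_VADR_INPUTS.items()}


def should_run_vadr(assembly_length, skip_length):
    """Assumed semantics: VADR runs only when the assembly is at least skip_length long."""
    return assembly_length >= skip_length


if __name__ == "__main__":
    # Print the JSON so it can be saved and uploaded as a workflow inputs file.
    print(json.dumps(make_inputs(), indent=2))
```

Under that assumption, raising skip_length means shorter or more fragmented assemblies skip VADR entirely.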