theiagen / public_health_bioinformatics

Bioinformatics workflows for genomic characterization, submission preparation, and genomic epidemiology of pathogens of public health concern.
GNU General Public License v3.0

Adding flu and RSV to TheiaCoV fastq workflows (VADR) #295

Closed kevinlibuit closed 3 months ago

kevinlibuit commented 6 months ago


:pushpin: Explain the Request

Update the TheiaCoV fastq workflows (Illumina PE/SE, ONT, etc.) to run VADR on the influenza and RSV organism tracks, as TheiaCoV_FASTA already does.

:books: Context

This will help laboratories QC the influenza and RSV assemblies these workflows generate.

:chart_with_upwards_trend: Desired Behavior

VADR outputs are produced for influenza and RSV assemblies as well.

kevinlibuit commented 6 months ago

Note: Flu will need to be added to all TheiaCoV workflows

cimendes commented 4 months ago

Flu and RSV models are available in the latest version of VADR (https://github.com/ncbi/vadr/wiki/Available-VADR-model-files).

Docker container (from StaPH-B): us-docker.pkg.dev/general-theiagen/staphb/vadr:1.6.3

Commands to use:

FLU: -r --atgonly --xnocomp --nomisc --alt_fail extrant5,extrant3 --mkey flu

RSV: -r --xnocomp --mkey rsv
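
For illustration, here is a minimal Python sketch of how those per-organism option strings could be turned into a full `v-annotate.pl` call. The names `VADR_OPTIONS` and `build_vadr_command` are hypothetical and not from the repository; the option strings are the ones listed above, written with double dashes for the long options (`--xnocomp`, `--mkey`). The sketch leaves out any model directory argument, since that depends on how the container is set up.

```python
import shlex

# Per-organism v-annotate.pl options, copied from the comment above.
# The dict name and layout are illustrative assumptions.
VADR_OPTIONS = {
    "flu": "-r --atgonly --xnocomp --nomisc --alt_fail extrant5,extrant3 --mkey flu",
    "rsv": "-r --xnocomp --mkey rsv",
}


def build_vadr_command(organism, fasta, outdir):
    """Return the argument list for: v-annotate.pl [options] <fasta> <outdir>."""
    key = organism.lower()
    if key not in VADR_OPTIONS:
        raise ValueError(f"no VADR options defined for organism: {organism}")
    # shlex.split keeps 'extrant5,extrant3' as a single argument.
    return ["v-annotate.pl", *shlex.split(VADR_OPTIONS[key]), fasta, outdir]


if __name__ == "__main__":
    # Print the command line that would be run for a flu assembly.
    print(shlex.join(build_vadr_command("flu", "assembly.fasta", "vadr_out")))
```

Returning a list rather than a single string means the result can be handed directly to `subprocess.run` without shell quoting concerns.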

⚠️ WARNING: The RSV model uses over 30 GB of memory, according to https://github.com/StaPH-B/docker-builds/blob/master/vadr/1.6.3/Dockerfile#L176-L177

We should be cautious when proceeding with RSV. Do we need to create two separate models for A and B to decrease the resource requirements?

cimendes commented 4 months ago

Solution: Pass the VADR memory requirement as input through the organism params sub-workflow
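
A rough sketch of that idea, written in Python rather than WDL purely for illustration: each organism carries a default VADR memory value, and the caller can override it. `OrganismParams`, `organism_params`, and `vadr_memory_gb` are hypothetical names, not the ones used in the actual sub-workflow. The memory numbers are placeholders; the only figure taken from this thread is that RSV needs over 30 GB.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class OrganismParams:
    vadr_options: str
    vadr_memory_gb: int


# Placeholder defaults (assumptions): 32 reflects "over 30 GB" for RSV,
# 8 for flu is an arbitrary illustrative value.
_DEFAULTS = {
    "flu": OrganismParams(
        "-r --atgonly --xnocomp --nomisc --alt_fail extrant5,extrant3 --mkey flu", 8
    ),
    "rsv": OrganismParams("-r --xnocomp --mkey rsv", 32),
}


def organism_params(organism: str, vadr_memory_gb: Optional[int] = None) -> OrganismParams:
    """Return the defaults for an organism, honoring a user-supplied memory override."""
    defaults = _DEFAULTS[organism.lower()]
    if vadr_memory_gb is None:
        return defaults
    # Only the memory value changes; the VADR options stay as defined above.
    return replace(defaults, vadr_memory_gb=vadr_memory_gb)


if __name__ == "__main__":
    print(organism_params("rsv"))
    print(organism_params("rsv", vadr_memory_gb=64))
```

Keeping the memory value next to the organism's VADR options means the VADR task itself does not need organism-specific logic; it only receives a number.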