Closed friesac closed 2 years ago
One issue I ran into was that the sequence names in the consensus output fasta from mad_river are too long, so we'll have to trim the extraneous information that comes out of iVar.
TIL that ivar consensus accepts the -i flag, which sets the name used on the FASTA header line. (The -i flag isn't documented in the manual entry for ivar consensus, and I only stumbled upon this when I was browsing the source code.) This will also obviate the need for the part of the performance_lineage_excel.py script that matches these unnecessarily long sample names with those from the Illumina sample sheet.
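For consensus files that were already generated without -i, the headers could also be shortened after the fact. Below is a minimal sketch of that idea. It assumes the long names follow a Consensus_<sample>_threshold_<t>_quality_<q> layout, which should be checked against real mad_river output; the function names are hypothetical and not part of mad_river or iVar.

```python
import re

# ASSUMPTION: headers look like Consensus_<sample>_threshold_<t>_quality_<q>.
# Verify this against an actual consensus fasta before relying on it.
_IVAR_HEADER = re.compile(
    r"^Consensus_(?P<sample>.+?)_threshold_[^_]+_quality_[^_]+.*$"
)


def short_name(header: str) -> str:
    """Return the sample name from an iVar-style header.

    Headers that do not match the assumed layout are returned unchanged.
    """
    match = _IVAR_HEADER.match(header)
    return match.group("sample") if match else header


def trim_fasta_headers(lines):
    """Yield FASTA lines with shortened headers; sequence lines pass through."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield ">" + short_name(line[1:].strip())
        else:
            yield line
```

Passing an open file handle to trim_fasta_headers and writing the yielded lines to a new file would give a renamed fasta. Setting the name up front with -i is still the simpler route for new runs.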
TIL that ivar consensus accepts the -i flag
Good find!
I forgot to mention above, but we need to make sure fasta-trim-terminal-ambigs.pl is used first in the container to create a trimmed fasta prior to running v-annotate.pl. We can use the --minlen 50 --maxlen 30000 arguments described in the VADR wiki. I think this is consistent with GenBank.
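A rough sketch of how the trimming step could be wrapped in Python is below. It assumes fasta-trim-terminal-ambigs.pl is on the PATH inside the container and writes the trimmed fasta to stdout; the function names and file paths are hypothetical.

```python
import subprocess
from pathlib import Path


def trim_command(fasta_in, minlen=50, maxlen=30000):
    """Build the argument list for fasta-trim-terminal-ambigs.pl."""
    return [
        "fasta-trim-terminal-ambigs.pl",  # ASSUMPTION: on PATH in the container
        "--minlen", str(minlen),
        "--maxlen", str(maxlen),
        str(fasta_in),
    ]


def trim_fasta(fasta_in, fasta_out):
    """Run the trim script, capturing its stdout into fasta_out.

    ASSUMPTION: the script prints the trimmed sequences to stdout.
    """
    with open(fasta_out, "w") as handle:
        subprocess.run(trim_command(fasta_in), stdout=handle, check=True)
    return Path(fasta_out)
```

Keeping the command builder separate from the runner makes it possible to check the arguments without VADR installed.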
Ideally, we would have a new output directory with the results of VADR, specifically of v-annotate.pl, as described at https://github.com/ncbi/vadr/wiki/Coronavirus-annotation#howto.
I ran the latest staphb container successfully from: https://hub.docker.com/r/staphb/vadr
I ran the commands below, referenced at the #howto, on about 500 sequences on my MacBook Pro with 2 CPUs, and they finished rather quickly.
v-annotate.pl --split --cpu 8 --glsearch -s -r --nomisc --mkey sarscov2 --lowsim5seq 6 --lowsim3seq 6 --alt_fail lowscore,insertnn,deletinn --mdir <sarscov2-models-dir-path> <fasta-file-to-annotate> <output-directory-to-create>
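If this ends up being called from the pipeline, the command above could be assembled in Python roughly as follows. This is only a sketch: the flags are copied from the command above, the function names are hypothetical, and the default CPU count of 2 is an assumption that reflects the laptop mentioned here rather than the 8 in the wiki command.

```python
import subprocess


def vannotate_command(model_dir, fasta_in, out_dir, cpus=2):
    """Build the v-annotate.pl argument list using the flags from the VADR wiki."""
    return [
        "v-annotate.pl",
        "--split", "--cpu", str(cpus),
        "--glsearch", "-s", "-r", "--nomisc",
        "--mkey", "sarscov2",
        "--lowsim5seq", "6", "--lowsim3seq", "6",
        "--alt_fail", "lowscore,insertnn,deletinn",
        "--mdir", str(model_dir),
        str(fasta_in),   # ideally the trimmed fasta
        str(out_dir),    # output directory that v-annotate.pl creates
    ]


def run_vadr(model_dir, fasta_in, out_dir, cpus=2):
    """Run v-annotate.pl and raise if it exits with a non-zero status."""
    subprocess.run(vannotate_command(model_dir, fasta_in, out_dir, cpus), check=True)
```

The model directory, input fasta, and output directory are left as parameters because the thread does not pin down their paths.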