mlhoggard closed this issue 3 years ago.
Mike, I think the reason is that VirSorter2 requires contigs to have at least two genes unless hallmark genes are detected. RNA viruses are typically shorter and encode polyproteins, so they are more likely to fail the two-gene minimum. The "all" option is just a shortcut for all groups. It is not an option to include every input sequence in the output.
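To see in advance which contigs are likely to hit the two-gene minimum, one option is to call genes on the input with Prodigal (for example prodigal -i contigs.fa -a proteins.faa -p meta, where the file names are placeholders) and count the predicted proteins per contig. The Python below is a minimal sketch under two assumptions. First, the protein FASTA uses Prodigal's default ">CONTIG_N # start # end # strand # ..." headers. Second, VirSorter2's own gene calls match a plain Prodigal run, which may not hold exactly. It also ignores the hallmark-gene exception, so it can flag contigs that VirSorter2 would still keep. The function names are hypothetical.

```python
from collections import Counter


def genes_per_contig(protein_fasta_lines):
    """Count predicted genes per contig from a Prodigal protein FASTA.

    Assumes Prodigal's default header format
    '>CONTIG_N # start # end # strand # attributes', so the contig name
    is the gene ID with its trailing '_N' index removed.
    """
    counts = Counter()
    for line in protein_fasta_lines:
        if line.startswith(">"):
            gene_id = line[1:].split()[0]        # e.g. 'contig_7_2'
            contig = gene_id.rsplit("_", 1)[0]   # e.g. 'contig_7'
            counts[contig] += 1
    return counts


def below_gene_minimum(contig_names, counts, min_genes=2):
    """Return contigs with fewer than min_genes predicted genes.

    Contigs with no predicted genes at all count as zero. The
    hallmark-gene exception is NOT modelled here.
    """
    return [name for name in contig_names if counts.get(name, 0) < min_genes]


# Example with inline headers; in practice pass an open file handle.
headers = [
    ">rna_1_1 # 3 # 6800 # 1 # ID=1_1",
    ">dna_1_1 # 1 # 900 # 1 # ID=2_1",
    ">dna_1_2 # 950 # 1800 # -1 # ID=2_2",
]
print(below_gene_minimum(["rna_1", "dna_1", "rna_2"], genes_per_contig(headers)))
# ['rna_1', 'rna_2']
```

Here rna_1 has one gene and rna_2 has none, so both are flagged, while dna_1 has two and passes.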
Hi @jiarong,
Thanks for the quick reply. Ah, OK, that makes sense. In the case of polyproteins, would Prodigal call a polyprotein as a single gene, or miss it entirely? If the former, would there be any value in an option to lower the minimum gene count to one? Or does VirSorter2 functionally require at least two genes, rather than this simply being a preference for virus detection?
Thanks again, Mike.
Prodigal can call polyproteins fairly well in my experience, although I have also seen cases where the predicted genes look off (usually too many short genes), likely due to non-canonical translation mechanisms. VirSorter2 relies on a few key genomic features that require at least two genes. For short contigs, the extra AMG-related information from DRAM-v.py is not meaningful or reliable, so I would just run DRAM.py or other annotation tools.
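One way to spot contigs where the gene calls "look off" in the sense described above is to check how many of their predicted genes are short. The sketch below assumes Prodigal's default protein FASTA headers (">CONTIG_N # start # end # strand # ..."). The 300 nt cutoff and the 50% fraction are illustrative thresholds chosen here, not values used by Prodigal or VirSorter2, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import median


def gene_lengths_by_contig(protein_fasta_lines):
    """Collect predicted gene lengths (nt) per contig from Prodigal headers.

    Assumes headers of the form '>CONTIG_N # start # end # strand # attrs',
    with 1-based inclusive start/end coordinates.
    """
    lengths = defaultdict(list)
    for line in protein_fasta_lines:
        if not line.startswith(">"):
            continue
        fields = line[1:].split(" # ")
        contig = fields[0].rsplit("_", 1)[0]
        start, end = int(fields[1]), int(fields[2])
        lengths[contig].append(end - start + 1)
    return lengths


def flag_fragmented(lengths, short_nt=300, max_short_fraction=0.5):
    """Flag contigs where more than max_short_fraction of genes are short.

    short_nt and max_short_fraction are illustrative assumptions.
    """
    flagged = {}
    for contig, lens in lengths.items():
        short = sum(1 for n in lens if n < short_nt)
        if short / len(lens) > max_short_fraction:
            flagged[contig] = {"genes": len(lens), "median_nt": median(lens)}
    return flagged


headers = [
    ">v_1_1 # 1 # 150 # 1 # ID=1_1",
    ">v_1_2 # 160 # 330 # 1 # ID=1_2",
    ">v_1_3 # 340 # 2000 # 1 # ID=1_3",
    ">w_1_1 # 1 # 3000 # 1 # ID=2_1",
]
print(flag_fragmented(gene_lengths_by_contig(headers)))
# {'v_1': {'genes': 3, 'median_nt': 171}}
```

Contigs flagged this way may be worth checking by hand before deciding which annotation route to take.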
Thanks @jiarong. Much appreciated, both for all the info and for all the work on VirSorter2 in general.
Hi there,
I'm looking to trial annotating putative viral contigs with DRAM-v and was wondering whether, when using --prep-for-dramv, there is a setting that fully disables all filtering in VirSorter2 (i.e. so that every input sequence is present in the output prep-for-dramv.fa and affi-contigs.tab files)?
I'm working with a few sets of putative viral contigs identified using multiple tools (including VirSorter2), and I'd like to feed the full set back through VirSorter2, this time only to generate the files required for annotation with DRAM-v. However, I've noticed that this second VirSorter2 step still filters some contigs out. For a set of DNA viruses this is a small subset, but for a set of putative RNA viruses approximately half were filtered out.
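To find exactly which contigs were dropped, one approach is to compare the sequence IDs in the input FASTA with those in the DRAM-v prep FASTA that VirSorter2 writes. This is a minimal Python sketch. It assumes that VirSorter2 may append a "||"-separated suffix to output names (for example "||full") when --seqname-suffix-off is not set, and it strips anything after "||" before comparing. The function names are hypothetical, and the caller supplies the actual file paths.

```python
def fasta_ids(lines):
    """Return the set of sequence IDs (first word after '>') in FASTA lines."""
    return {line[1:].split()[0] for line in lines if line.startswith(">")}


def missing_from_output(input_lines, output_lines, suffix_sep="||"):
    """List input contigs with no counterpart in the DRAM-v prep FASTA.

    Anything after suffix_sep in an output ID is dropped before comparing,
    in case a suffix such as '||full' was appended; with
    --seqname-suffix-off this step should have no effect.
    """
    kept = {seq_id.split(suffix_sep, 1)[0] for seq_id in fasta_ids(output_lines)}
    return sorted(fasta_ids(input_lines) - kept)
```

For example, missing_from_output([">a", "ACGT", ">b"], [">a||full", "ACGT"]) returns ['b']. With real files, pass two open file handles (one for the combined input contigs, one for the prep FASTA) in place of the lists.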
I've tried several combinations of the other parameters without success, including:
virsorter run --seqname-suffix-off --viral-gene-enrich-off --provirus-off --prep-for-dramv --keep-original-seq --min-score 0 --min-length 0 --include-groups dsDNAphage,NCLDV,RNA,ssDNA,lavidaviridae ...
I'm unsure whether the --include-groups step is causing the remaining filtering. The default is currently only two groups, but I noticed in the work-in-progress updates that you are aiming to implement an "all" option. Could simply listing all the groups still omit any sequences that don't get assigned to a group, whereas I'm assuming the "all" option is intended to keep all sequences regardless of whether they are assigned to a group?
Kind regards, Mike.
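The reply in this thread separates selecting groups from keeping sequences. As a toy illustration only (not VirSorter2's code), the sketch below expands "all" to the five group names used in the command above. That gives exactly the same selection as spelling out every group, so, as the reply explains, neither form keeps contigs that fail the gene-count requirement. The group list constant and function name are assumptions for illustration; check the group names against your installed VirSorter2 version.

```python
# Group names as listed in the command in this thread (assumed complete).
ALL_GROUPS = ["dsDNAphage", "NCLDV", "RNA", "ssDNA", "lavidaviridae"]


def expand_include_groups(arg):
    """Expand an --include-groups value into a list of group names.

    'all' is treated purely as a shortcut for every group, so it yields
    the same list as writing out all five names; it says nothing about
    which contigs are kept.
    """
    if arg.strip() == "all":
        return list(ALL_GROUPS)
    return [g.strip() for g in arg.split(",") if g.strip()]


explicit = expand_include_groups("dsDNAphage,NCLDV,RNA,ssDNA,lavidaviridae")
print(expand_include_groups("all") == explicit)
# True
```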