jiarong / VirSorter2

customizable pipeline to identify viral sequences from (meta)genomic data
GNU General Public License v2.0

error with an RNA group #168

Closed: trilisser closed this issue 1 year ago

trilisser commented 1 year ago

I am analyzing multiple Illumina samples (RNA viruses). Some of them ran without problems, but others failed. The error log is here: extract-feature-from-hmmout-common.log. The command was:

/home/*/virsorter run -w ./RNA_59_rna/vs_out -i ./RNA_59_rna/soft_filtered_transcripts.fasta --min-length 500 --include-groups RNA -j 1 all

jiarong commented 1 year ago

Hi, thanks for reporting the issue. Can you attach ./RNA_59_rna/vs_out/iter-0/all.pdg.Viruses.hmmtbl?

trilisser commented 1 year ago

Hello! Thanks for the reply. Here is the file: all.pdg.Viruses.zip

jiarong commented 1 year ago

Hi, it looks like the file is corrupted, most likely because different jobs were writing to it at the same time. When you run multiple samples, make sure each sample has its own output directory. For this sample, you can delete the output directory and rerun the same command.
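A table damaged by concurrent writes usually contains truncated or interleaved rows, which can be spotted before rerunning. The following is a minimal sketch, not part of VirSorter2. It assumes the `.hmmtbl` file follows the HMMER `--tblout` layout (comment lines starting with `#`, at least 18 whitespace-separated columns per row, E-value and score in columns 5 and 6). The function name `find_malformed_rows` and the `min_fields` default are hypothetical and should be adjusted if the file uses a different layout.

```python
def find_malformed_rows(path, min_fields=18):
    """Return (line_number, reason) for each data row that looks damaged.

    Assumption: rows follow the HMMER --tblout layout, so a healthy row has
    at least `min_fields` columns and numeric values in columns 5 and 6.
    """
    problems = []
    # errors="replace" keeps the scan going even if the file holds stray bytes.
    with open(path, errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            # Skip blank lines and '#' comment/header lines.
            if not line.strip() or line.startswith("#"):
                continue
            fields = line.split()
            if len(fields) < min_fields:
                problems.append((number, f"only {len(fields)} fields"))
                continue
            try:
                float(fields[4])  # full-sequence E-value
                float(fields[5])  # full-sequence score
            except ValueError:
                problems.append((number, "non-numeric E-value or score"))
    return problems
```

An empty list means no row failed these checks; it does not prove the file is intact.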

trilisser commented 1 year ago

The outputs seem to have been written on top of each other during preliminary tests of my bash launcher. I removed the old directories, reran VS2, and now it works! Thank you so much!!!
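The fix described in this thread (one output directory per sample, with stale directories removed before a rerun) could be sketched in a launcher like the one below. This is an illustration under stated assumptions, not the reporter's actual script. It reuses only the options from the command quoted above and assumes `virsorter` is on `PATH` and that each sample directory contains `soft_filtered_transcripts.fasta`. The helper names `build_commands` and `rerun_clean` are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(sample_dirs, virsorter="virsorter"):
    """Return one VirSorter2 argument list per sample, each with its own -w directory."""
    commands = []
    seen = set()
    for sample_dir in map(Path, sample_dirs):
        out_dir = sample_dir / "vs_out"
        # Refuse to let two jobs share an output directory.
        if out_dir in seen:
            raise ValueError(f"output directory used twice: {out_dir}")
        seen.add(out_dir)
        commands.append([
            virsorter, "run",
            "-w", str(out_dir),
            "-i", str(sample_dir / "soft_filtered_transcripts.fasta"),
            "--min-length", "500",
            "--include-groups", "RNA",
            "-j", "1",
            "all",
        ])
    return commands


def rerun_clean(command):
    """Delete the command's -w directory (if present), then run the command."""
    out_dir = Path(command[command.index("-w") + 1])
    shutil.rmtree(out_dir, ignore_errors=True)
    subprocess.run(command, check=True)
```

A launcher would then loop over the result, for example `for cmd in build_commands(["RNA_59_rna"]): rerun_clean(cmd)`. Running the samples one after another also avoids concurrent writes.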

jiarong commented 1 year ago

No problem! BTW, VS2 is designed to differentiate viruses from hosts; it is not optimized to differentiate RNA viruses from other viruses. I also noticed that you used a very low length cutoff (500 bp), which tends to produce false positives. To get RNA viruses only, I would recommend first screening for sequences with the hallmark gene (RdRP), then running another tool (e.g., Kraken2) to verify the rest, which do not have an RdRP gene.

trilisser commented 1 year ago

That's really interesting. Thanks for the advice, I'll discuss it with my scientific director. We have had a bad experience with Kraken2: a lot of false positives, on viruses in particular. But those were ONT results; we have not checked Illumina yet.