simroux / VirSorter

Source code of the VirSorter tool, also available as an App on CyVerse/iVirus (https://de.iplantcollaborative.org/de/)
GNU General Public License v2.0

Only one predicted viral sequence #50

Open thr44pw00d opened 5 years ago

thr44pw00d commented 5 years ago

Dear Simon,

I had previously run VirSorter on the CyVerse platform, and have now installed it on a server (CentOS Linux 7) because I'd like to use it on a larger number of samples. I followed the installation instructions under "Using a conda virtual environment (tested on Ubuntu and CentOS)".

I started a test run with four phage genomes, and unfortunately I'm not getting the same results as I did through CyVerse. In the CyVerse run, two of the genomes were assigned to category 2 and one to category 3. With my local installation, no viral sequence was predicted at all. The command I submitted was: `wrapper_phage_contigs_sorter_iPlant.pl -f B3F.fas --db 2 --wdir /proj/uppstore2017229/projects/matthias/virsorter/B3F_out_viromedb --ncpu 1 --data-dir /proj/uppstore2017229/projects/matthias/virsorter/virsorter-data/`

Then I tried the `--no_c` option: `wrapper_phage_contigs_sorter_iPlant.pl -f B3F.fas --no_c --db 2 --wdir /proj/uppstore2017229/projects/matthias/virsorter/B3F_out_viromedb_no_c --ncpu 1 --data-dir /proj/uppstore2017229/projects/matthias/virsorter/virsorter-data/` This assigned one genome to category 4, but nothing else was predicted.

On the command line I get the following output just before the run ends. I'm not sure whether it is connected to the problem.

Step 5 : /domus/h1/matthih/miniconda3/envs/virsorter/bin/Scripts/Step_5_get_phage_fasta-gb.pl VIRSorter /proj/uppstore2017229/projects/matthias/virsorter/B3F_out_viromedb_no_c >> /proj/uppstore2017229/projects/matthias/virsorter/B3F_out_viromedb_no_c/logs/out 2>> /proj/uppstore2017229/projects/matthias/virsorter/B3F_out_viromedb_no_c/logs/err

## Verify if this should have been a virome decontamination mode based on 10kb+ contigs
Use of uninitialized value in division (/) at /home/matthih/miniconda3/envs/virsorter/bin/wrapper_phage_contigs_sorter_iPlant.pl line 720.
Cleaning the output directory
rm -r /proj/uppstore2017229/projects/matthias/virsorter/B3F_out_viromedb_no_c/r_0/db : 

I attached some output files from both runs (including err, out, and stdout saved to a file): [default.zip](https://github.com/simroux/VirSorter/files/3615889/default_.zip) and noc.zip.

The "VIRSorter_affi-contigs.tab" files look fine to me (they are almost identical in both runs), so I suspect the problem occurs at the next stage, when "VIRSorter_phage_signal.tab" is generated. This file is empty in the default run and contains only one line in the `--no_c` run.

Would you be so kind as to take a look and see whether you can identify the problem?

Thanks a lot! /Matthias

simroux commented 5 years ago

Hi Matthias, did you run it with the `--virome` option? This option is needed when the input file is mostly (in your case entirely) viral. VirSorter was initially designed for microbial single-cell genomes and metagenomes, so in its default mode it first evaluates the different gene-content features (e.g. % of viral genes, % of genes without PFAM affiliation) across the whole dataset, and then looks for contigs and/or regions that are (roughly) "more viral than average". When every sequence in the input is viral, that average is itself viral, so few or no sequences stand out. The `--virome` option bypasses this step and forces the use of pre-computed features (based on microbial genomes from RefSeq).

Best, Simon

thr44pw00d commented 5 years ago

Hi Simon,

Thanks a lot for your answer! I wasn't aware of that. I have now run it with `--virome`, and all four phages were assigned to category 2. That's great!
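Since the goal was to run many samples, here is a minimal sketch for assembling the corrected invocation per FASTA file. It uses only the flags that appear in this thread (`-f`, `--db`, `--wdir`, `--ncpu`, `--data-dir`, `--virome`); the helper name, the sample list, and the output-directory layout are hypothetical.

```python
import shlex


def virsorter_command(fasta: str, wdir: str, data_dir: str,
                      db: int = 2, ncpu: int = 1, virome: bool = True) -> list:
    """Hypothetical helper: build an argv list for the VirSorter wrapper script."""
    cmd = [
        "wrapper_phage_contigs_sorter_iPlant.pl",
        "-f", fasta,
        "--db", str(db),
        "--wdir", wdir,
        "--ncpu", str(ncpu),
        "--data-dir", data_dir,
    ]
    if virome:
        # Use the pre-computed (RefSeq microbial) reference features instead of
        # estimating them from the input, as recommended for mostly viral input.
        cmd.append("--virome")
    return cmd


samples = ["B3F.fas"]  # hypothetical list of input FASTA files
for fasta in samples:
    # Hypothetical layout: one working directory per sample.
    wdir = f"virsorter_out/{fasta.rsplit('.', 1)[0]}_virome"
    print(shlex.join(virsorter_command(fasta, wdir, "virsorter-data/")))
```

Returning a list rather than a single string lets each command be passed directly to `subprocess.run(cmd, check=True)` without shell quoting issues; `shlex.join` (Python 3.8+) is used here only to print a copy-pasteable version.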

Best wishes, Matthias