simroux / VirSorter

Source code of the VirSorter tool, also available as an App on CyVerse/iVirus (https://de.iplantcollaborative.org/de/)
GNU General Public License v2.0

Perl wrapper vs Iplant output #18

Closed: WardDeb closed this issue 6 years ago

WardDeb commented 6 years ago

Hi,

We're trying to get VirSorter running on our cluster. So far it seems to be working. However, when we compare the output of the online tool (iPlant) with the output of the wrapper, the Perl wrapper script seems to call fewer contigs as phages (especially in category 2). Could it be that the database used online is not the same as the one we can download for the stand-alone version? (URL I used to download the database: http://mirrors.iplantcollaborative.org/browse/iplant/home/shared/imicrobe/VirSorter/virsorter-data.tar.gz).
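One way to quantify a "fewer contigs, especially category 2" difference is to count predictions per category in each global-phage-signal file. The sketch below is a minimal Python example, assuming (not confirmed in this thread) that each category section starts with a "## ... category N ..." header line and that every non-"#" line under it is one predicted contig. The helper names count_by_category and compare_files are hypothetical.

```python
import re
from collections import Counter

# Assumed layout (not confirmed in this thread): each category section opens with
# a header such as "## 2 - ... category 2 (...)", optionally followed by a
# "## Contig_id,..." column line, then one comma-separated row per predicted contig.
CATEGORY_HEADER = re.compile(r"^##.*\bcategory\s+(\d+)", re.IGNORECASE)


def count_by_category(lines):
    """Return a Counter mapping category number -> number of predicted contigs."""
    counts = Counter()
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        header = CATEGORY_HEADER.match(line)
        if header:
            current = int(header.group(1))
        elif not line.startswith("#") and current is not None:
            counts[current] += 1  # one data row = one predicted contig
    return counts


def compare_files(path_a, path_b):
    """Print per-category counts from two global-phage-signal files side by side."""
    with open(path_a) as fa, open(path_b) as fb:
        a, b = count_by_category(fa), count_by_category(fb)
    for category in sorted(set(a) | set(b)):
        print(f"category {category}: {a[category]} vs {b[category]}")
```

For example, compare_files("Iplant_VIRSorter_global-phage-signal.txt", "Perl_VIRSorter_global-phage-signal.txt") would print one line per category with the two counts. Since the file layout here is assumed, check a few lines of the real files before trusting the numbers.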

Thanks in advance

simroux commented 6 years ago

Hi, the database used on CyVerse should be the same as the one available for download, so the differences in your results probably have another origin. Are you using RefSeq or ViromeDB? And are you running VirSorter in the same mode both times (standard vs. virome decontamination)?

WardDeb commented 6 years ago

Sorry, I should have given you that information already. Both runs use ViromeDB in virome decontamination mode. On iPlant I just selected these options; with the Perl wrapper I added the arguments --db 2 and --virome.

simroux commented 6 years ago

That's weird... would you mind sending me the logs of both runs ?

WardDeb commented 6 years ago

Of course,

I've attached both CSVs, the err and out logs from the Perl wrapper, and the stdout log from iPlant.

Thanks for looking into this,

Ward

Iplant_VIRSorter_global-phage-signal.txt Iplant_Virsorter_stdout_log.txt Perl_err.txt Perl_out.txt Perl_VIRSorter_global-phage-signal.txt

simroux commented 6 years ago

Thanks. There don't seem to be any specific errors in the logs, but something might be going wrong at the affiliation step: if you search for "NODE_103" in the two log files, the Perl version flags two "hallmark" genes, while the iPlant version also detects an enrichment in viral-like genes.
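Searching both logs for one contig, as suggested above, can be scripted so the hits are easy to compare. This is a minimal Python sketch; the function name matching_lines is hypothetical, and it takes any iterable of lines (for instance an open file handle) so it does not depend on a particular log layout.

```python
import re


def matching_lines(lines, contig_id):
    """Return (line_number, text) pairs for every line that mentions contig_id.

    The (?!\\d) lookahead stops "NODE_103" from also matching "NODE_1030".
    """
    pattern = re.compile(re.escape(contig_id) + r"(?!\d)")
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if pattern.search(line)
    ]
```

For example, call it once with an open handle on Perl_out.txt and once with an open handle on Iplant_Virsorter_stdout_log.txt, both with "NODE_103", and compare the two lists. The digit lookahead matters because assembler contig names share prefixes (NODE_103, NODE_1030, ...).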

Could you also send the "affi_contigs" files you get in both cases? Thanks!

simroux commented 6 years ago

Actually, looking into it more closely, it seems that the CyVerse run was not processed in "virome decontamination" mode, while the Perl run was. I am double-checking on CyVerse now to make sure that the option behaves as it is supposed to.

simroux commented 6 years ago

OK, I just ran VirSorter on CyVerse (VirSorter 1.0.3) with the "virome decontamination" option checked and got the expected result. Could you run your dataset again on CyVerse with the same version of VirSorter and share the output directory with me? Thanks.

WardDeb commented 6 years ago

I've rerun the dataset on CyVerse and uploaded the entire results folder as a zip to WeTransfer: https://we.tl/AM5eElw2dz

As far as I can tell, the virome decontamination setting was on (as shown in the parameter settings); however, the output still differs, and NODE_103 is still classified as category 1.

Thanks for looking further into this.

simroux commented 6 years ago

Oh, sorry, I think I got confused here: the CyVerse run seems to have correctly used virome decontamination mode, while the offline Perl run did not. The trick is to look in the log (Perl_out.txt or Iplant_Virsorter_stdout_log.txt) for the call to the script "Step_3_highlight_phage_signal.pl". In virome decontamination mode, this line includes a final argument pointing the program to "Generic_ref_file.refs"; in standard VirSorter mode this argument is absent.
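The log check described above can be scripted. Below is a minimal Python sketch, assuming the Step_3 command is echoed on a single log line (the exact log format is not shown in this thread); the function name is hypothetical. It tests whether "Generic_ref_file.refs" appears anywhere on that line, which is slightly looser than checking that it is the last argument.

```python
STEP3 = "Step_3_highlight_phage_signal.pl"
REF_FILE = "Generic_ref_file.refs"


def is_decontamination_run(lines):
    """Classify a VirSorter log by its Step_3 call.

    Returns True if a Step_3_highlight_phage_signal.pl call mentions
    Generic_ref_file.refs (virome decontamination mode), False if none does
    (standard mode), and None if no Step_3 call was found at all.
    """
    step3_calls = [line for line in lines if STEP3 in line]
    if not step3_calls:
        return None
    return any(REF_FILE in line for line in step3_calls)
```

Returning None rather than False when the call is missing keeps "standard mode" distinct from "truncated or unexpected log".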

So the problem seems to be in how the "virome" option is passed to the offline VirSorter. I haven't used the Docker version of VirSorter extensively, but maybe the option should be "--virome 1"? If that works as expected, you should see a line stating "THIS WILL BE A VIROME DECONTAMINATION RUN" at the start of the run.
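To confirm that the option took effect, a minimal Python sketch can look for that banner near the top of the stdout log. The 50-line window (head=50) is an arbitrary assumption, not something VirSorter specifies, and the function name is hypothetical.

```python
from itertools import islice

BANNER = "THIS WILL BE A VIROME DECONTAMINATION RUN"


def announces_decontamination(lines, head=50):
    """Return True if the banner appears within the first `head` lines.

    head=50 is an assumed window; widen it if your logs start with more output.
    """
    return any(BANNER in line for line in islice(lines, head))
```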

WardDeb commented 6 years ago

You were right: it seems that just adding --virome as a flag wasn't enough. I repeated the run with --virome 1 and the output is now comparable! In the end I was confused as well, since intuitively I'd expect fewer contigs to be annotated as bacteriophages with virome decontamination turned on. Or is this point of view too naive?

Thanks for looking into this, even though in the end the solution was pretty straightforward.

simroux commented 6 years ago

Glad we found the solution! I'll update the documentation to clarify how the "--virome" option has to be set.

As for the number of contigs detected in the different modes, it really depends on your dataset: if it is mostly composed of "standard" microbial contigs, both modes should give equivalent results. If, on the other hand, your dataset contains more viruses, you would get many more false negatives in "standard" mode, and therefore more predictions in virome decontamination mode.

Thanks !