simroux / VirSorter

Source code of the VirSorter tool, also available as an App on CyVerse/iVirus (https://de.iplantcollaborative.org/de/)
GNU General Public License v2.0
104 stars 30 forks source link

--ncpu argument not working #10

Closed shaman-narayanasamy closed 6 years ago

shaman-narayanasamy commented 7 years ago

Hi Simon,

Me again. I am using the docker version of VirSorter. It looks like the --ncpu command is not working.

When I launch an instance of VirSorter. I get the following output:

$ docker run -v /home/snarayanasamy/Work/tools/Virsorter/virsorter-data:/data -v /home/snarayanasamy/Work/virsorter_test:/indir -v /home/snarayanasamy/Work/virsorter_test/outputdir:/wdir --rm discoenv/virsorter:v1.0.3 -w /wdir --db 2 --fna /indir/HF1.assembly.fa --ncpu 4

Unknown option: ncpu
Dataset      : VIRSorter
Fna file     : /indir/HF1.assembly.fa
Db           : 2
Wdir         : /wdir
Custom phages: 
#%#%#%#%#%# Processing VIRSorter....
Step 0.8 : /usr/local/bin/hmmsearch --tblout Contigs_prots_vs_PFAMa.tab --cpu 8 -o Contigs_prots_vs_PFAMa.out --noali /data/PFAM_27/Pfam-A.hmm /wdir/fasta/VIRSorter_prots.fasta >> /wdir/log_out 2>> /wdir/log_err

As you can see, it says that the argument does not exist, and the number of cpu's are still eight instead of four, as specified.

Also, with my limited experience in using docker. I tried issuing a "vanilla" command such that it prints the help:

$ docker run -it discoenv/virsorter:v1.0.3 --help
Usage:
      ./wrapper_phage_contigs_sorter_iPlant.pl -d Code_dataset --fna Fasta file of contigs --db 1 --wdir /path/to/working_directory
      Database codes : 1 for RefseqABVir only, 2 for RefseqABVir + Viromes
      An additional set of reference sequences can be added to the database as a fasta file with the argument cp (--cp /path/to/fasta_file)

As far as I can see, the --ncpu option is not available.

Looking forward to hear from you :)

-Shaman-

simroux commented 7 years ago

Hi Shaman, Did you try the option "--cpu" ? This was the one used in a previous version of the VirSorter script, and I wonder if the docker version still has this one. If still not working, we''ll look at how to add this option in the docker container. Best, Simon

shaman-narayanasamy commented 7 years ago

Hi Simon,

Sorry, but the --cpu does not work as well. However, I have some other problem now. I minimized the command and ran it again:

$ docker run -v /home/snarayanasamy/Work/tools/Virsorter/virsorter-data:/data -v /home/snarayanasamy/Work/virsorter_test:/wdir --rm discoenv/virsorter:v1.0.3 --db 2 --fna /wdir/Bio17-1.fa -w /wdir/
Dataset      : VIRSorter
Fna file     : /wdir/Bio17-1.fa
Db           : 2
Wdir         : /wdir/
Custom phages:
#%#%#%#%#%# Processing VIRSorter....
Step 0.5 : /usr/local/bin/Scripts/Step_1_contigs_cleaning_and_gene_prediction.pl VIRSorter /wdir/fasta /wdir/fasta/input_sequences.fna 2 >> /wdir/log_out 2>> /wdir/log_err

Step 0.8 : /usr/local/bin/hmmsearch --tblout Contigs_prots_vs_PFAMa.tab --cpu 8 -o Contigs_prots_vs_PFAMa.out --noali /data/PFAM_27/Pfam-A.hmm /wdir/fasta/VIRSorter_prots.fasta >> /wdir/log_out 2>> /wdir/log_err

Step 0.9 : /usr/local/bin/hmmsearch --tblout Contigs_prots_vs_PFAMb.tab --cpu 8 -o Contigs_prots_vs_PFAMb.out --noali /data/PFAM_27/Pfam-B.hmm /wdir/fasta/VIRSorter_prots.fasta >> /wdir/log_out 2>> /wdir/log_err

### Revision 0
Out :
Step 1.2 : /usr/local/bin/hmmsearch --tblout r_0/Contigs_prots_vs_New_clusters.tab --cpu 8 -o r_0/Contigs_prots_vs_New_clusters.out --noali r_0/db/Pool_clusters.hmm /wdir/fasta/VIRSorter_prots.fasta >> /wdir/log_out 2>> /wdir/log_err

Step 1.3 : /usr/local/bin/blastall -p blastp -i /wdir/fasta/VIRSorter_prots.fasta -d r_0/db/Pool_new_unclustered -o r_0/Contigs_prots_vs_New_unclustered.tab -a 8 -m 8 -e 0.001 >> /wdir/log_out 2>> /wdir/log_err

Step 2 : /usr/local/bin/Scripts/Step_2_merge_contigs_annotation.pl /wdir/fasta/VIRSorter_mga_final.predict Contigs_prots_vs_Phage_Gene_Catalog.tab Contigs_prots_vs_Phage_Gene_unclustered.tab Contigs_prots_vs_PFAMa.tab Contigs_prots_vs_PFAMb.tab /data/Phage_gene_catalog_plus_viromes/Phage_Clusters_current.tab VIRSorter_affi-contigs.csv >> /wdir/log_out 2>> /wdir/log_err

Step 3 : /usr/local/bin/Scripts/Step_3_highlight_phage_signal.pl VIRSorter_affi-contigs.csv VIRSorter_phage-signal.csv >> /wdir/log_out 2>> /wdir/log_err

Setting up the final result file
Step 4 : /usr/local/bin/Scripts/Step_4_summarize_phage_signal.pl VIRSorter_affi-contigs.csv VIRSorter_phage-signal.csv VIRSorter_global-phage-signal.csv VIRSorter_new_prot_list.csv >> /wdir/log_out 2>> /wdir/log_err

Step 5 : /usr/local/bin/Scripts/Step_5_get_phage_fasta-gb.pl VIRSorter >> /wdir/log_out 2>> /wdir/log_err

Cleaning the output directory
rm -r r_0/db :
mv: cannot stat 'fasta/': No such file or directory
mv: cannot stat 'formatdb.log': No such file or directory
mv: cannot stat 'log_err': No such file or directory
mv: cannot stat 'log_out': No such file or directory

In the log_err file, I find this:

Can't open '</root/fasta/VIRSorter_prots.fasta' for reading: 'No such file or directory' at /usr/local/bin/Scripts/Step_5_get_phage_fasta-gb.pl line 152

It is strange as I didn't have this problem the last time I ran VirSorter some months back. Should I open this as a separate issue?

Looking forward to your reply.

-Shaman-

simroux commented 7 years ago

This is surprising, since /root/fasta/VIRSorter_prots.fasta is used at multiple steps, but it looks like you only get an error at the very last one. Did you try to run VirSorter on a pre-existing directory ? Or could there be some permission issues in your working directory ?

shaman-narayanasamy commented 7 years ago

Possible, I am rerunning in a different directory now, but with the same outcome.

I also realize that when I rerun it on the same directory, it defaults to /root and not to /wrk, for some reason. Does it not overwrite?

simroux commented 7 years ago

Unfortunately, this looks like a docker issue more than a VirSorter issue as far as I can tell. I don't know exactly how docker would behave when running on the same directory, but VirSorter is not really designed to do this.

Beyond the error with the step_5 script, do you still have the expected VirSorter output ?

shaman-narayanasamy commented 7 years ago

Yes, this is a docker issue and an issue of me not knowing how VirSorter actually works :) It looks like there is some confusion between the /wdir and /root directory within the docker container/image.

So, here is what I tried: Therefore, I cleaned up the /wdir and I stripped down the command:

$ docker run -v /home/snarayanasamy/Work/tools/Virsorter/virsorter-data:/data -v /home/snarayanasamy/Work/virsorter_test:/wdir --rm discoenv/virsorter:v1.0.3 --db 2 --fna /wdir/Bio17-1.fa

My wdir is still empty and I am unable to access any of the logs now, because it was printed into /root.

With this, I realized that the problem was that the container starts running and the files are printed to /root folder, which exists in an alternate dimension, such that mere mortals cannot access. Therefore, this time, I mounted an additional folder called for the /root which stores the "intermediate" files, which will allow me to access it.

$ docker run -v /home/snarayanasamy/Work/tools/Vir
sorter/virsorter-data:/data -v /home/snarayanasamy/Work/virsorter_test:/wdir -v /home/snarayanasamy/Work
/virsorter_test/intermediate:/root --rm discoenv/virsorter:v1.0.3 --db 2 --fna /wdir/Bio17-1.fa

This works fine and dandy and I can now access the output files. VirSorter works! Then, to test further, I create three folders and mount them. One for the input (/input), output (/output) and for intermediate files (/root). I launched the following docker command from the /output directory, like so...

$ docker run -v /home/snarayanasamy/Work/tools/Virsorter/virsorter-data:/data -v /home/snarayanasamy/Work/virsorter_test/input:/input -v /home/snarayanasamy/Work/virsorter_test/output:/root -v /home/snarayanasamy/Work/virsorter_test/intermediate:/wdir --rm discoenv/virsorter:v1.0.3 --db 2 --fna /input/Bio17-1.fa --wdir /wdir/

I guess with this, I have my input, intermediate and output files where I want them to be, but I still get this at the end of a run:

Cleaning the output directory
rm -r r_0/db : 
mv: cannot stat 'fasta/': No such file or directory
mv: cannot stat 'formatdb.log': No such file or directory
mv: cannot stat 'log_err': No such file or directory
mv: cannot stat 'log_out': No such file or directory

I don't think it is a big problem, but it would be more satisfying to get a clean run :) I will keep trying this out.

We still haven't solved why the --cpu parameter is not in there :(

shaman-narayanasamy commented 7 years ago

Hi Simon,

Sorry for the flurry of comments. I'm still trying to figure this out. I continue from the point I left off above. I entered the container interactively to check for the --ncpu parameter. The command below allows that:

$ docker run --entrypoint /bin/bash -it -v /home/snarayanasamy/Work/tools/Virsorter/virsorter-data:/data -v /home/snarayanasamy/Work/virsorter_test/input:/input -v /home/snarayanasamy/Work/virsorter_test/output:/root -v /home/snarayanasamy/Work/virsorter_test/intermediate:/wdir --rm discoenv/virsorter:v1.0.3

And when I view the wrapper_phage_contigs_sorter_iPlant.pl inside the container, it looks like so:

#!/usr/bin/env perl

=head1 USAGE

  ./wrapper_phage_contigs_sorter_iPlant.pl -d Code_dataset --fna Fasta file of contigs --db 1 --wdir /path/to/working_directory
  Database codes : 1 for RefseqABVir only, 2 for RefseqABVir + Viromes
  An additional set of reference sequences can be added to the database as a fasta file with the argument cp (--cp /path/to/fasta_file)

=cut

use strict;
use warnings;
use autodie;
use FindBin '$Bin';
use File::Spec::Functions;
use File::Path 'mkpath';
use File::Which 'which';
use Getopt::Long 'GetOptions';
use Pod::Usage;
use Cwd 'cwd';

my $help              = '';
my $code_dataset      = 'VIRSorter';
my $original_fna_file = '';
my $choice_database   = '';
my $tag_virome        = 0;
my $custom_phage      = '';
my $data_dir          = '/data';
my $wdir              = cwd();

GetOptions(
   'fna=s'       => \$original_fna_file,
   'd|dataset:s' => \$code_dataset,
   'db:i'        => \$choice_database,
   'virome:i'    => \$tag_virome,
   'wdir:s'      => \$wdir,
   'cp:s'        => \$custom_phage,
   'data-dir:s'  => \$data_dir,
   'h|help'      => \$help,
);

As you can see, the --ncpu parameter is missing. In fact, the wrapper script is not the same as the one in the git repo, which looks like this:

#!/usr/bin/env perl

=head1 SYNOPSIS

  wrapper_phage_contigs_sorter_iPlant.pl --fasta sequences.fa

Required Arguments:

  -f|--fna       Fasta file of contigs

Options: 

  -d|--dataset   Code dataset (DEFAULT "VIRSorter")
  --cp           Custom phage sequence 
  --db           Either "1" (DEFAULT Refseqdb) or "2" (Viromedb)
  --wdir         Working directory (DEFAULT cwd)
  --ncpu         Number of CPUs
  --virome       Virome decontamination mode, for datasets mostly viral, force the use of generic metrics instead of calculated from the whole dataset

  --help         Show help and exit

=head1 DESCRIPTION

Wrapper for detection of viral contigs

=cut

use strict;
use warnings;
use feature 'say';
use autodie;
use FindBin '$Bin';
use File::Spec::Functions;
use File::Path 'mkpath';
use File::Which 'which';
use Getopt::Long 'GetOptions';
use Pod::Usage;
use Cwd 'cwd';

my $help            = '';
my $code_dataset    = 'VIRSorter';
my $input_file      = '';
my $choice_database = 1;
my $tag_virome      = 0;
my $custom_phage    = '';
my $data_dir        = '/data';
my $n_cpus          = 16;
my $wdir            = cwd();

GetOptions(
   'f|fna=s'     => \$input_file,
   'd|dataset:s' => \$code_dataset,
   'db:i'        => \$choice_database,
   'virome:i'    => \$tag_virome,
   'wdir:s'      => \$wdir,
   'cp:s'        => \$custom_phage,
   'data-dir:s'  => \$data_dir,
   'ncpu:i'      => \$n_cpus,
   'h|help'      => \$help,
);

This can be bypassed by mounting all the latest scripts like so:

$  docker run --entrypoint /bin/bash -it -v /home/snarayanasamy/Work/tools/Virsorter/VirSorter/Scripts:/usr/local/bin/Scripts -v /home/snarayanasamy/Work/tools/Virsorter/VirSorter/wrapper_phage_contigs_sorter_iPlant.pl:/usr/local/bin/wrapper_phage_contigs_sorter_iPlant.pl -v /home/snarayanasamy/Work/tools/Virsorter/virsorter-data:/data -v /home/snarayanasamy/Work/virsorter_test/input:/input -v /home/snarayanasamy/Work/virsorter_test/output:/root -v /home/snarayanasamy/Work/virsorter_test/intermediate:/wdir --rm discoenv/virsorter:v1.0.3

But then, I it tells me that blastp is missing:

Missing blastp
Bin            : /usr/local/bin
Dataset        : VIRSorter
Input file     : /input/Bio17-1.fa
Db             : 2
Working dir    : /wdir/
Custom phages  : 

Once again, I go into the container interactively and check for blastp and it looks like it is not present. There is only blastall installed. I tried this with two docker images; i) discoenv/virsorter:v1.0.3 and ii) cyverse/virsorter:latest. Both of them lack blastp. Is there an updated version of the docker image that I may be unaware of? FYI: I am looking at all the images available via DockerHub.

I look forward to your reply. Let me know if there is anything else I should test :)

-Shaman-

simroux commented 7 years ago

Hi Shaman, Unfortunately, I didn't generate the VirSorter docker image, so I'm not sure exactly which version would be the latest. I will ask Ken to stop by this thread and see if he can help !

Best, Simon

shaman-narayanasamy commented 7 years ago

Yes, I understand that. Docker is not the easiest thing to work with. I was just about to notify you that I somewhat solved the issue, thanks to the help of one of my docker-geek friends. Principally, we need a new image with all the updated dependencies and codes "frozen" into it.

You can see the solution in my repo (https://github.com/shaman-narayanasamy/VirSorter). Please forgive the crudeness of my solution. Perhaps Ken could figure something better and more clean.

Do let me know if you guys need any assistance with this :)

Cheers, Shaman

simroux commented 7 years ago

Glad that you found a work-around ! I'll forward this to Ken just in case,

Best, Simon