simroux / VirSorter

Source code of the VirSorter tool, also available as an App on CyVerse/iVirus (https://de.iplantcollaborative.org/de/)
GNU General Public License v2.0

Running VirSorter from the Perl wrapper script #4

driscolc closed this issue 6 years ago

driscolc commented 8 years ago

Can we run VirSorter from the Perl wrapper script instead of Docker? I'd like to run it on our local server cluster, but I don't have root privileges there, so using Docker isn't an option. When I try to use the wrapper script, I get an error that I don't understand. This is my command:

```
/usr/local/Programs/VirSorter/wrapper_phage_contigs_sorter_iPlant.pl --fna assembly.fa --db 2 --wdir ./Virsorter/
```

Many of the output files/folders are empty. The only files that aren't empty are input_sequences.fna, VIRSorter_circu.list, VIRSorter_contigs_circu_temp.fasta, and VIRSorter_nett.fasta.

The error log looks like this:

```
sh: 1: Virsorter/fasta/VIRSorter_nett.fasta: Permission denied
sh: 1: Virsorter/fasta/VIRSorter_contigs_circu_temp.fasta: Permission denied

Error: Sequence file Virsorter/fasta/VIRSorter_prots.fasta is empty or misformatted

Error: Sequence file Virsorter/fasta/VIRSorter_prots.fasta is empty or misformatted

Error: Sequence file Virsorter/fasta/VIRSorter_prots.fasta is empty or misformatted

Can't open 'Contigs_prots_vs_PFAMa.tab' for reading: 'No such file or directory' at /usr/local/Programs/VirSorter/Scripts/Step_2_merge_contigs_annotation.pl line 128
Can't open 'VIRSorter_affi-contigs.csv' for reading: 'No such file or directory' at /usr/local/Programs/VirSorter/Scripts/Step_3_highlight_phage_signal.pl line 43
Can't open 'VIRSorter_phage-signal.csv' for reading: 'No such file or directory' at /usr/local/Programs/VirSorter/Scripts/Step_4_summarize_phage_signal.pl line 83
```

I thought the "Permission denied" messages might indicate a permissions issue, but when I ran the script with sudo on a dummy dataset on our lab workstation, I got the same result. Could it be the code dataset parameter? I'm not setting one because the script already seems to use 'VIRSorter' as the dataset name, and I'm unsure what this parameter actually means.
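When many outputs come back empty, as described above, a quick first check is to list exactly which files and folders in the working directory are empty. A minimal sketch, assuming bash and a `find` that supports `-empty` (GNU and BSD versions do); `list_empty_outputs` is a hypothetical helper name, not part of VirSorter:

```shell
#!/usr/bin/env bash
# List empty files and empty folders under a VirSorter working directory.
# list_empty_outputs is a hypothetical helper; it only reads the directory.
list_empty_outputs() {
  local wdir="$1"
  [ -d "$wdir" ] || { echo "no such directory: $wdir" >&2; return 1; }
  echo "Empty files:"
  find "$wdir" -type f -size 0 | sort
  echo "Empty folders:"
  # -empty is a GNU/BSD find extension, not POSIX.
  find "$wdir" -type d -empty | sort
}
```

For example, `list_empty_outputs ./Virsorter/` after a failed run.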

simroux commented 8 years ago

You can run VirSorter from the wrapper script without Docker; however, you will then have to do manually the steps that Docker otherwise takes care of: download and extract the database files (http://mirrors.iplantcollaborative.org/browse/iplant/home/shared/imicrobe/VirSorter/virsorter-data.tar.gz), modify the paths in the wrapper script so that they point to these files (l. 28: $data_dir has to point to the database folder; l. 14: $Bin now has to point to the folder where the VirSorter "Scripts" directory is located), and install all the software required by VirSorter (see l. 99-102).
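Before launching the wrapper, it can save time to check that the required tools are on your PATH and that the extracted database folder contains the files the pipeline reads. A minimal sketch, assuming bash: `hmmsearch` and `blastp` are only the tools that appear in this thread (the full list is at l. 99-102 of the wrapper), the data file paths are copied from the logs posted in this thread, and `check_tools`/`check_data_dir` are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Sketch: sanity-check a Docker-free VirSorter setup before running the wrapper.
# Tool names and data paths come from logs in this thread; the complete
# dependency list is in the wrapper script itself (around l. 99-102).

# Print each executable missing from PATH; return non-zero if any is missing.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing from PATH: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Check that the data folder (the one containing PFAM_27/ and Phage_gene_catalog/,
# which $data_dir should point to) holds non-empty copies of files used in the logs.
check_data_dir() {
  local data_dir="$1" f missing=0
  for f in PFAM_27/Pfam-A.hmm PFAM_27/Pfam-B.hmm Phage_gene_catalog/Phage_Clusters_current.tab; do
    if [ ! -s "$data_dir/$f" ]; then
      echo "missing or empty: $data_dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage, with your own path: `check_tools hmmsearch blastp && check_data_dir /path/to/virsorter-data/data`.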

Alternatively, you can use the CyVerse (formerly iPlant) Discovery Environment to process datasets with VirSorter (http://de.iplantcollaborative.org/), privately and free of charge.

driscolc commented 8 years ago

Thanks! This basically fixed the error; I'm now getting the global phage signal csv file with the predictions. I pointed to the directories you mentioned and added this line to the wrapper Perl script after "use FindBin '$Bin';":

use lib "$Bin::Bin/../Scripts";

I'm having one other issue, however; perhaps you can help. The Predicted_viral_sequences folder is empty, and I'm not sure why. The log_err file looks like this:

```
Can't open 'Contigs_prots_vs_Phage_Gene_unclustered.tab' for reading: 'No such file or directory' at FullPath/VirSorter/Scripts/Step_2_merge_contigs_annotation.pl line 79
Can't open 'VIRSorter_affi-contigs.csv' for reading: 'No such file or directory' at FullPath/VirSorter/Scripts/Step_3_highlight_phage_signal.pl line 43
Can't open 'VIRSorter_phage-signal.csv' for reading: 'No such file or directory' at FullPath/VirSorter/Scripts/Step_4_summarize_phage_signal.pl line 83
Can't open '<CWD/fasta/VIRSorter_prots.fasta' for reading: 'No such file or directory' at FullPath/VirSorter/Scripts/Step_5_get_phage_fasta-gb.pl line 152
```

Everything else seems to be working.

simroux commented 8 years ago

One step forward indeed, but it looks like the VirSorter scripts have trouble accessing the BLAST result ('Contigs_prots_vs_Phage_Gene_unclustered.tab'), which then makes all the subsequent scripts fail as well. You might want to check the log_out file, especially the lines around "Step 1.3 :".
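To look at the lines around a given step in the log, `grep` with trailing context is enough. A minimal sketch, assuming the log lives at `<wdir>/logs/out` as in the outputs posted in this thread; `show_step` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Print the log line for a given step plus the next few lines, with line numbers.
# -F makes grep treat "Step 1.3 :" literally (the dot is not a wildcard).
show_step() {
  local log="$1" step="$2" context="${3:-3}"
  grep -n -F -A "$context" "Step $step :" "$log"
}
```

For example, `show_step Virsorter/logs/out 1.3` prints the logged BLAST command and whatever follows it.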

driscolc commented 8 years ago

Okay, I've identified the problem. I was attempting to run the wrapper script on multiple assemblies through a for loop while in another directory. The wrapper script specifies the working directory as the current working directory, so being in another directory was the problem. It successfully finished when I ran the wrapper script in the same directory as the output files. Thanks for your help!

Edit: Haven't figured it out. Seems to have worked on one assembly, but not another. I'll update if I figure this out.
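For several assemblies, one way to avoid the current-directory issue described above is to give each assembly its own new working directory and launch the wrapper from inside it. A minimal sketch, assuming bash and an executable wrapper; `run_all` and the `virsorter_<name>` directory naming are hypothetical, `--db 2` mirrors the command at the top of this thread, and nothing in the thread confirms this resolves the remaining failure:

```shell
#!/usr/bin/env bash
# Sketch: run the wrapper once per assembly, each in a new working directory.
run_all() {
  local wrapper="$1" fa name wdir
  shift
  for fa in "$@"; do
    [ -f "$fa" ] || { echo "skip, not a file: $fa" >&2; continue; }
    # Absolute path to the assembly, so it still resolves after cd.
    fa="$(cd "$(dirname "$fa")" && pwd)/$(basename "$fa")"
    name=$(basename "$fa")
    name=${name%.*}
    wdir="$PWD/virsorter_$name"
    # mkdir without -p fails if the directory exists, so an old run is never reused.
    mkdir "$wdir" 2>/dev/null || { echo "skip, $wdir already exists: $fa" >&2; continue; }
    # Subshell: the cd applies to this run only, not to the loop.
    ( cd "$wdir" && "$wrapper" --fna "$fa" --db 2 --wdir "$wdir" ) \
      || echo "wrapper failed for: $fa" >&2
  done
}
```

Usage: `run_all /usr/local/Programs/VirSorter/wrapper_phage_contigs_sorter_iPlant.pl assemblies/*.fa`.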

atsumarox commented 8 years ago

I would also like to run VirSorter from the Perl wrapper script instead of Docker, and I have different problems. The Perl wrapper script seems to start successfully, but the result files are not created. I would appreciate your help in solving this problem.

The error log in the err file looks like this:

```
Error: Failed to open sequence file ../Virome/fasta/VIRSorter_prots.fasta for reading
Error: Failed to open sequence file ../Virome/fasta/VIRSorter_prots.fasta for reading
Error: Failed to open sequence file ../Virome/fasta/VIRSorter_prots.fasta for reading
Command line argument error: Argument "query". File is not accessible: `../Virome/fasta/VIRSorter_prots.fasta'
Can't open '../Virome/fasta/VIRSorter_circu.list' for reading: 'No such file or directory' at /lustre2/home/hoge/bin/Scripts/Step_2_merge_contigs_annotation.pl line 39
Can't open '../Virome/VIRSorter_affi-contigs.csv' for reading: 'No such file or directory' at /lustre2/home/hoge/bin/Scripts/Step_3_highlight_phage_signal.pl line 43
Can't open '../Virome/VIRSorter_phage-signal.csv' for reading: 'No such file or directory' at /lustre2/home/hoge/bin/Scripts/Step_4_summarize_phage_signal.pl line 83
```


The output on the command line looks like this:

```
/home/hoge/bin/wrapper_phage_contigs_sorter_iPlant.pl --fna ./Metagenome/results/pandelecture/pandelecture.idba.contig/scaffold.fa --wdir ./Virome/ --db 1
Bin : /lustre2/home/hoge/bin
Dataset : VIRSorter
Input file : ./Metagenome/results/pandelecture/pandelecture.idba.contig/scaffold.fa
Db : 1
Working dir : ./Virome/
Custom phages :

Step 0.8 : /usr/local/bin/hmmsearch --tblout Virome/Contigs_prots_vs_PFAMa.tab --cpu 16 -o Virome/Contigs_prots_vs_PFAMa.out --noali /home/hoge/src/VirSorter-master/virsorter-data/data/PFAM_27/Pfam-A.hmm Virome/fasta/VIRSorter_prots.fasta >> Virome/logs/out 2>> Virome/logs/err

Step 0.9 : /usr/local/bin/hmmsearch --tblout Virome/Contigs_prots_vs_PFAMb.tab --cpu 16 -o Virome/Contigs_prots_vs_PFAMb.out --noali /home/hoge/src/VirSorter-master/virsorter-data/data/PFAM_27/Pfam-B.hmm Virome/fasta/VIRSorter_prots.fasta >> Virome/logs/out 2>> Virome/logs/err

### Revision 0
Step 2 : /lustre2/home/hoge/bin/Scripts/Step_2_merge_contigs_annotation.pl Virome/fasta/VIRSorter_mga_final.predict Virome/Contigs_prots_vs_Phage_Gene_Catalog.tab Virome/Contigs_prots_vs_Phage_Gene_unclustered.tab Virome/Contigs_prots_vs_PFAMa.tab Virome/Contigs_prots_vs_PFAMb.tab /home/hoge/src/VirSorter-master/virsorter-data/data/Phage_gene_catalog/Phage_Clusters_current.tab Virome/VIRSorter_affi-contigs.csv >> Virome/logs/out 2>> Virome/logs/err

Step 3 : /lustre2/home/hoge/bin/Scripts/Step_3_highlight_phage_signal.pl Virome/VIRSorter_affi-contigs.csv Virome/VIRSorter_phage-signal.csv >> Virome/logs/out 2>> Virome/logs/err

Setting up the final result file
Step 4 : /lustre2/home/hoge/bin/Scripts/Step_4_summarize_phage_signal.pl Virome/VIRSorter_affi-contigs.csv Virome/VIRSorter_phage-signal.csv Virome/VIRSorter_global-phage-signal.csv Virome/VIRSorter_new_prot_list.csv >> Virome/logs/out 2>> Virome/logs/err

Step 5 : /lustre2/home/hoge/bin/Scripts/Step_5_get_phage_fasta-gb.pl VIRSorter ./Virome/ >> Virome/logs/out 2>> Virome/logs/err

Cleaning the output directory
mv: cannot stat `Virome/Contigs_prots_vs_Phage_Gene_Catalog.tab': No such file or directory
mv: cannot stat `Virome/Contigs_prots_vs_Phage_Gene_unclustered.tab': No such file or directory
mv: cannot stat `Virome/Contigs_prots_vs_PFAMa.tab': No such file or directory
mv: cannot stat `Virome/Contigs_prots_vs_PFAMa.out': No such file or directory
mv: cannot stat `Virome/Contigs_prots_vs_PFAMb.tab': No such file or directory
mv: cannot stat `Virome/Contigs_prots_vs_PFAMb.out': No such file or directory
mv: cannot stat `Virome/VIRSorter_affi-contigs.csv': No such file or directory
mv: cannot stat `VIRSorter_affi-contigs.refs': No such file or directory
mv: cannot stat `Virome/VIRSorter_phage-signal.csv': No such file or directory
```

simroux commented 8 years ago

Hi,

It looks like one of the very first steps (VirSorter's ORF prediction) did not work as expected. Could you check whether the folder "../Virome/fasta/" contains a file named "VIRSorter_prots.fasta" and/or files with the extension ".predict"?
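The check requested above can be scripted so it is easy to repeat after each attempt. A minimal sketch, assuming bash and the `<wdir>/fasta/` layout seen in this thread; `check_gene_calls` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Check whether the gene-prediction outputs exist in <wdir>/fasta.
check_gene_calls() {
  local fasta_dir="$1/fasta" status=0
  if [ -s "$fasta_dir/VIRSorter_prots.fasta" ]; then
    echo "found: VIRSorter_prots.fasta"
  else
    echo "missing or empty: VIRSorter_prots.fasta"
    status=1
  fi
  # Expand the glob; if nothing matches, the literal pattern remains and -e fails.
  set -- "$fasta_dir"/*.predict
  if [ -e "$1" ]; then
    echo "found: $# .predict file(s)"
  else
    echo "no .predict files"
    status=1
  fi
  return "$status"
}
```

For example, `check_gene_calls ../Virome`.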

atsumarox commented 8 years ago

Thank you for your prompt reply. I checked ../Virome/fasta/, and there are no files in that folder, so it seems the very first steps did not work, as you said. I would appreciate your help in solving this problem.

simroux commented 8 years ago

OK, so if the "fasta" directory is empty, this failed even earlier than I thought, i.e. at the very first step, which copies the input sequences (here "./Metagenome/results/pandelecture/pandelecture.idba.contig/scaffold.fa") into the "fasta" folder. Could this be a permissions problem (i.e. the VirSorter scripts may not have permission to write in the "../Virome" folder)?
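A direct way to test the permission hypothesis is to try creating a file in the output folder as the same user that runs the wrapper, rather than inspecting mode bits. A minimal sketch, assuming bash and a `mktemp` that accepts a path template (GNU and BSD do); `check_writable` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Test write access the way a script would: create (then remove) a probe file.
check_writable() {
  local dir="$1" probe
  [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
  probe=$(mktemp "$dir/.virsorter_write_test.XXXXXX" 2>/dev/null) \
    || { echo "cannot create files in: $dir" >&2; return 1; }
  rm -f "$probe"
  echo "writable: $dir"
}
```

For example, `check_writable ../Virome && check_writable ../Virome/fasta`.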

atsumarox commented 8 years ago

I changed the permissions of the wrapper scripts, scaffold.fa, and the Virome directory to 777 (chmod), but the results are the same. I do not have root permission; is this related? Or is there any other way to check the VirSorter scripts' permissions?

simroux commented 8 years ago

You may want to run the script that performs this first step outside the wrapper, to check for potential error messages. In the VirSorter output (log_out), you should have a line starting with "Step 0.5 :", followed by a command line that calls Step_1_contigs_cleaning_and_gene_prediction.pl. Another possibility is that the VirSorter script directory ("Scripts/") is usually added to the PATH by Docker, which may not be the case for you if you don't use Docker. You may want to add a line like `my $Bin = "./Scripts/";` at l. 73 and see if it helps.
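To run that first step by hand, the logged command can be pulled out of log_out with its log redirections stripped, so errors print to the terminal. A minimal sketch, assuming bash and the `Step N : <command> >> .../logs/out 2>> .../logs/err` line format seen in this thread; `step_command` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Print the command the wrapper logged for a step, without its log redirections,
# so it can be reviewed and re-run by hand.
step_command() {
  local log="$1" step="$2" line cmd
  line=$(grep -m 1 -F "Step $step : " "$log") || return 1
  cmd=${line#*"Step $step : "}   # drop the "Step N : " prefix
  cmd=${cmd%% >> *}              # drop ">> .../logs/out 2>> .../logs/err"
  printf '%s\n' "$cmd"
}
```

For example, `cmd=$(step_command ../Virome/logs/out 0.5); echo "$cmd"`, then, after checking it, `bash -c "$cmd"`.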

atsumarox commented 8 years ago

Thank you for your great help. I got one step further, and now the output on the command line looks like this:

```
Step 5 : /lustre2/home/hoge/bin/Scripts/Step_5_get_phage_fasta-gb.pl VIRSorter HogeMetagenome/result/ >> HogeMetagenome/result/logs/out 2>> HogeMetagenome/result/logs/err
Cleaning the output directory
rm -r hogeMetagenome/result/r_0/db :
mv: cannot stat `VIRSorter_affi-contigs.refs': No such file or directory
```

However, VIRSorter_affi-contigs.refs was actually created, and it is located in the same directory as the "fasta" directory, the "r_0" directory, the "Metric_files" directory, the "Tab_files" directory, and the "VIRSorter_global-phage-signal.csv" file.

I would appreciate your help in solving this problem.

simroux commented 8 years ago

Glad to hear that it (seems to) work better!

For the last error, I think the only thing to modify would be l. 477 of "wrapper_phage_contigs_sorter_iPlant.pl". You could try changing the line to: `my $out_file_affi_ref = catdir($wdir, $code_dataset . "_affi-contigs.refs");`

However, just so you know, this is the very last part of the script which is simply re-organizing the final files, so all the VirSorter results should be correct despite the error.

aelbehery commented 7 years ago

Hello Dr. Roux,

I also need to run VirSorter from the wrapper script, because the system administrator does not allow Docker. I downloaded and extracted the virsorter-data folder and made the required path changes in the wrapper script. However, the program runs into an error. The log files are attached (err.txt, out.txt).

Could you please help me identify what went wrong?

Best regards,

Ali

simroux commented 7 years ago

Hi, there seem to be a few errors remaining:

Let me know!

Simon

aelbehery commented 7 years ago

Hi!

Thanks for the quick reply! I tried to run the BLAST step outside the wrapper script, but it seems that the error occurs earlier in the pipeline, because the BLAST database file "Contigs_prots_vs_New_unclustered.tab" is empty. I don't know what this means; I am not really familiar with Perl. I also noticed that while the program runs, it jumps from step 0.9 (which seems to be OK) to step 2, and no status messages are printed for step 1, as you can see below:

```
Bin            : /home/viro/ali.elbehery/sources/VirSorter-master
Dataset        : VIRSorter
Input file     : 454AllContigs.fna
Db             : 1
Working dir    : /home/viro/ali.elbehery/sources/VirSorter-master
Custom phages  :

Step 0.8 : /home/viro/ali.elbehery/bin/hmmsearch --tblout /home/viro/ali.elbehery/sources/VirSorter-master/Contigs_prots_vs_PFAM --cpu 16 -o /home/viro/ali.elbehery/sources/VirSorter-master/Contigs_prots_vs_PFAMa.out --noali /home/viro/ali.elbehery/sourcesorter-master/data/PFAM_27/Pfam-A.hmm /home/viro/ali.elbehery/sources/VirSorter-master/fasta/VIRSorter_prots.fasta >> /home/viro/lbehery/sources/VirSorter-master/logs/out 2>> /home/viro/ali.elbehery/sources/VirSorter-master/logs/err

Step 0.9 : /home/viro/ali.elbehery/bin/hmmsearch --tblout /home/viro/ali.elbehery/sources/VirSorter-master/Contigs_prots_vs_PFAM --cpu 16 -o /home/viro/ali.elbehery/sources/VirSorter-master/Contigs_prots_vs_PFAMb.out --noali /home/viro/ali.elbehery/sourcesorter-master/data/PFAM_27/Pfam-B.hmm /home/viro/ali.elbehery/sources/VirSorter-master/fasta/VIRSorter_prots.fasta >> /home/viro/lbehery/sources/VirSorter-master/logs/out 2>> /home/viro/ali.elbehery/sources/VirSorter-master/logs/err

### Revision 0
Step 2 : /home/viro/ali.elbehery/sources/VirSorter-master/Scripts/Step_2_merge_contigs_annotation.pl /home/viro/ali.elbehery/souVirSorter-master/fasta/VIRSorter_mga_final.predict /home/viro/ali.elbehery/sources/VirSorter-master/Contigs_prots_vs_Phage_Gene_og.tab /home/viro/ali.elbehery/sources/VirSorter-master/Contigs_prots_vs_Phage_Gene_unclustered.tab /home/viro/ali.elbehery/sourirSorter-master/Contigs_prots_vs_PFAMa.tab /home/viro/ali.elbehery/sources/VirSorter-master/Contigs_prots_vs_PFAMb.tab /home/vir.elbehery/sources/VirSorter-master/data/Phage_gene_catalog/Phage_Clusters_current.tab /home/viro/ali.elbehery/sources/VirSorter-r/VIRSorter_affi-contigs.csv >> /home/viro/ali.elbehery/sources/VirSorter-master/logs/out 2>> /home/viro/ali.elbehery/sources/Vier-master/logs/err

Step 3 : /home/viro/ali.elbehery/sources/VirSorter-master/Scripts/Step_3_highlight_phage_signal.pl /home/viro/ali.elbehery/sourcrSorter-master/VIRSorter_affi-contigs.csv /home/viro/ali.elbehery/sources/VirSorter-master/VIRSorter_phage-signal.csv >> /home/vli.elbehery/sources/VirSorter-master/logs/out 2>> /home/viro/ali.elbehery/sources/VirSorter-master/logs/err

Setting up the final result file
Step 4 : /home/viro/ali.elbehery/sources/VirSorter-master/Scripts/Step_4_summarize_phage_signal.pl /home/viro/ali.elbehery/sourcrSorter-master/VIRSorter_affi-contigs.csv /home/viro/ali.elbehery/sources/VirSorter-master/VIRSorter_phage-signal.csv /home/viroelbehery/sources/VirSorter-master/VIRSorter_global-phage-signal.csv /home/viro/ali.elbehery/sources/VirSorter-master/VIRSorter_not_list.csv >> /home/viro/ali.elbehery/sources/VirSorter-master/logs/out 2>> /home/viro/ali.elbehery/sources/VirSorter-master/lor

Step 5 : /home/viro/ali.elbehery/sources/VirSorter-master/Scripts/Step_5_get_phage_fasta-gb.pl VIRSorter /home/viro/ali.elbeheryces/VirSorter-master >> /home/viro/ali.elbehery/sources/VirSorter-master/logs/out 2>> /home/viro/ali.elbehery/sources/VirSorter-r/logs/err

Cleaning the output directory
mv: cannot stat `/home/viro/ali.elbehery/sources/VirSorter-master/Contigs_prots_vs_Phage_Gene_Catalog.tab': No such file or directory
mv: cannot stat `/home/viro/ali.elbehery/sources/VirSorter-master/Contigs_prots_vs_Phage_Gene_unclustered.tab': No such file or directory
mv: cannot stat `/home/viro/ali.elbehery/sources/VirSorter-master/VIRSorter_affi-contigs.csv': No such file or directory
mv: cannot stat `VIRSorter_affi-contigs.refs': No such file or directory
mv: cannot stat `/home/viro/ali.elbehery/sources/VirSorter-master/VIRSorter_phage-signal.csv': No such file or directory
```

Your help is really appreciated!

simroux commented 7 years ago

Hi,

In your last attempt to run VirSorter, did you use a pre-existing output directory or a new one? If the former, VirSorter will try to reuse pre-existing files, which may explain why it skipped Step 1 and went directly to Step 2.
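To make sure each attempt starts from an empty output directory without losing the previous run, the old directory can be moved aside rather than deleted. A minimal sketch, assuming bash; `fresh_wdir` and the `.old_<timestamp>` suffix are hypothetical choices:

```shell
#!/usr/bin/env bash
# Move an existing output directory aside (never delete it) and create an empty one,
# so the next run cannot pick up files left over from a previous run.
fresh_wdir() {
  local wdir="${1%/}" old
  if [ -e "$wdir" ]; then
    old="${wdir}.old_$(date +%Y%m%d_%H%M%S)"
    [ -e "$old" ] && { echo "$old already exists, try again" >&2; return 1; }
    mv "$wdir" "$old" || return 1
    echo "previous run moved to: $old" >&2
  fi
  mkdir -p "$wdir"
}
```

For example, `fresh_wdir ./Virome/` before each new run of the wrapper with `--wdir ./Virome/`.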

The file "Contigs_prots_vs_New_unclustered.tab" should be the result of a blastp search of the newly predicted ORFs (from your sequences) against a database named "Pool_new_unclustered" (which should be in the "virsorter-data" package, in the Phage_gene_catalog or Phage_gene_catalog_plus_viromes directory). We should probably first check that all these files are where they're supposed to be ^^
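Checking for that database amounts to looking for files named `Pool_new_unclustered*` in the two catalog folders. A minimal sketch, assuming bash; the glob is used because BLAST+ protein databases are usually split into several files sharing one base name (for example `.phr`, `.pin`, `.psq`), and `find_pool_db` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# List the files of the "Pool_new_unclustered" BLAST database in either catalog folder.
find_pool_db() {
  local data_dir="$1" sub f found=1
  for sub in Phage_gene_catalog Phage_gene_catalog_plus_viromes; do
    for f in "$data_dir/$sub"/Pool_new_unclustered*; do
      if [ -e "$f" ]; then
        echo "$f"
        found=0
      fi
    done
  done
  [ "$found" -eq 0 ] || echo "Pool_new_unclustered not found under $data_dir" >&2
  return "$found"
}
```

For example, `find_pool_db /path/to/virsorter-data/data`; no output and a non-zero status mean the database files are not where the pipeline expects them.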

aelbehery commented 7 years ago

Hi, thank you so much! I didn't know that I had to delete old files and folders from previous runs; I thought they would be overwritten. I also added BioPerl to my path. Now it works with no errors. However, no viral contigs were detected in the assembly file I used. Does that mean it has no viral signal?

simroux commented 7 years ago

Great! And yes, it might mean that the data do not contain any viral sequences, or that something went wrong without triggering an error (this is still possible). To check this, could you share/send me the file "VirSorter_affi_contigs.csv"? This file includes the gene-by-gene affiliation of all contigs, and from it we should be able to see (i) whether genes were correctly predicted, (ii) whether the database searches (PFAM + viral db) went OK, and (iii) whether all contigs indeed look microbial.

aelbehery commented 7 years ago

Do you mean these files?

(attached: VIRSorter_affi-contigs.refs.txt, VIRSorter_affi-contigs.tab.txt)

Unfortunately, I couldn't understand what each column represents.

simroux commented 7 years ago

Hi,

Yes, the one I was looking for was "VIRSorter_affi-contigs.tab". Regarding the columns, the VIRSorter output directory should contain a "Readme.txt" file describing what each column means (depending on the version of VIRSorter you are using; unfortunately, we only added it in one of the later versions, sorry about that...).

In this tab file, the columns are as follows (first for contig lines, then for gene lines):

```
Contig name|number of genes|contig type (linear - l or circular - c)
Gene name|start|stop|length|strand|Affiliation to a phage cluster|BLAST score|BLAST e-value|Phage cluster category|Affiliation to PFAM|HMMSearch score|HMMSearch e-value
```
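With this layout, a short `awk` script can summarize the file per contig, which makes it easier to see whether genes were predicted and whether the phage-cluster and PFAM searches returned hits. A minimal sketch under several assumptions not confirmed above: fields are separated by "|", contig lines have 3 fields (a leading ">" on the name is stripped if present), gene lines have the 12 fields listed, and an empty field or "-" means no hit. `summarize_affi` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Per-contig summary of VIRSorter_affi-contigs.tab:
# contig, number of genes, type, genes with a phage-cluster hit, genes with a PFAM hit.
summarize_affi() {
  awk -F'|' '
    function flush() {
      if (name != "") printf "%s\t%s\t%s\t%d\t%d\n", name, ngenes, type, phage, pfam
    }
    BEGIN { print "contig\tn_genes\ttype\tphage_hits\tpfam_hits" }
    # Contig line: name|number of genes|type
    NF == 3 { flush(); name = $1; sub(/^>/, "", name); ngenes = $2; type = $3; phage = 0; pfam = 0; next }
    # Gene line: field 6 = phage cluster affiliation, field 10 = PFAM affiliation
    NF >= 12 {
      if ($6 != "" && $6 != "-") phage++
      if ($10 != "" && $10 != "-") pfam++
    }
    END { flush() }
  ' "$1"
}
```

For example, `summarize_affi VIRSorter_affi-contigs.tab | column -t`.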

In your file, everything seems to be OK (i.e. genes can have a phage cluster affiliation, a PFAM affiliation, or both). The reason no sequence was detected as viral is that most (if not all) of these contigs are likely viral. By default, VIRSorter estimates the "viral-ness" of the whole genome/community by first looking at all contigs to calculate background probabilities, and then identifies which contigs are more viral than this "average viral-ness". When most contigs are actually viral, no contig ends up being detected. The way around this is to run VIRSorter with the "virome decontamination" option on, which is designed for this type of dataset (it uses background probabilities pre-calculated from known microbial genomes). That way, you should get some viral sequences detected.
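The effect of a dataset-derived background can be seen with a toy calculation. This is NOT VirSorter's actual statistic (the thread does not spell it out); it only illustrates the idea: a contig is flagged when its fraction of genes with a phage-cluster hit exceeds a background fraction by some margin. With the background computed from the dataset itself, a mostly viral dataset flags nothing; with a fixed, low, microbial-like background (standing in for decontamination mode), the same contigs are flagged. The input format (tab-separated contig, number of genes, number of genes with a phage-cluster hit), the function name `flag_contigs`, the margin, and the reference value are all hypothetical:

```shell
#!/usr/bin/env bash
# Toy illustration only, not VirSorter's scoring.
# Input: contig<TAB>n_genes<TAB>n_genes_with_phage_hit
# Args: table, margin, optional fixed background fraction.
flag_contigs() {
  local table="$1" margin="$2" ref="${3:-}"
  awk -F'\t' -v margin="$margin" -v ref="$ref" '
    { name[NR] = $1; n[NR] = $2; v[NR] = $3; tot_n += $2; tot_v += $3 }
    END {
      # Background: fixed reference if given, else the dataset-wide fraction.
      bg = (ref != "") ? ref + 0 : (tot_n > 0 ? tot_v / tot_n : 0)
      for (i = 1; i <= NR; i++)
        if (n[i] > 0 && v[i] / n[i] - bg > margin + 0) print name[i]
    }
  ' "$table"
}
```

On a mostly viral table, `flag_contigs table.tsv 0.3` prints nothing, while `flag_contigs table.tsv 0.3 0.1` flags every contig. In VirSorter itself the remedy is the virome decontamination option; in the wrapper this is assumed here to be the `--virome` flag, so check the options listed in your copy of `wrapper_phage_contigs_sorter_iPlant.pl` before relying on it.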

aelbehery commented 7 years ago

Thanks a lot!