simroux / VirSorter

Source code of the VirSorter tool, also available as an App on CyVerse/iVirus (https://de.iplantcollaborative.org/de/)
GNU General Public License v2.0

initial setup and /wdir/fasta/VIRSorter_prots.fasta #3

Closed: naarkhoo closed 7 years ago

naarkhoo commented 8 years ago

I am trying to run VirSorter on my machine, but I am getting a bunch of errors, which I believe are caused by an incomplete installation. I would appreciate your comments.

So basically, my virsorter directory includes

directory : virsorter-data/ (downloaded from iplant)
directory : virsorter-run/ (I make it by mkdir)
files : 2 fastq files (1.fq.gz and 2.fq.gz)

and then I run

docker run -v /home/virsorter/virsorter-data:/data -v /home/virsorter/virsorter-run/:/wdir -w /wdir --rm discoenv/virsorter:v1.0.3 --db 2 --fna /Projects//virsorter/LIB1.screened.hg19.screened.mOTU.v1.padded.pair.1.fq.gz

the output looks like

Step 1.2 : /usr/local/bin/hmmsearch --tblout r_0/Contigs_prots_vs_New_clusters.tab --cpu 8 -o r_0/Contigs_prots_vs_New_clusters.out --noali r_0/db/Pool_clusters.hmm /wdir/fasta/VIRSorter_prots.fasta >> /wdir/log_out 2>> /wdir/log_err

cat: r_0/Contigs_prots_vs_New_clusters.tab: No such file or directory

Step 1.3 : /usr/local/bin/blastall -p blastp -i /wdir/fasta/VIRSorter_prots.fasta -d r_0/db/Pool_new_unclustered -o r_0/Contigs_prots_vs_New_unclustered.tab -a 8 -m 8 -e 0.001 >> /wdir/log_out 2>> /wdir/log_err

Step 2 : /usr/local/bin/Scripts/Step_2_merge_contigs_annotation.pl /wdir/fasta/VIRSorter_mga_final.predict Contigs_prots_vs_Phage_Gene_Catalog.tab Contigs_prots_vs_Phage_Gene_unclustered.tab Contigs_prots_vs_PFAMa.tab Contigs_prots_vs_PFAMb.tab /data/Phage_gene_catalog_plus_viromes/Phage_Clusters_current.tab VIRSorter_affi-contigs.csv >> /wdir/log_out 2>> /wdir/log_err

Step 3 : /usr/local/bin/Scripts/Step_3_highlight_phage_signal.pl VIRSorter_affi-contigs.csv VIRSorter_phage-signal.csv >> /wdir/log_out 2>> /wdir/log_err

Setting up the final result file
Step 4 : /usr/local/bin/Scripts/Step_4_summarize_phage_signal.pl VIRSorter_affi-contigs.csv VIRSorter_phage-signal.csv VIRSorter_global-phage-signal.csv VIRSorter_new_prot_list.csv >> /wdir/log_out 2>> /wdir/log_err

Step 5 : /usr/local/bin/Scripts/Step_5_get_phage_fasta-gb.pl VIRSorter >> /wdir/log_out 2>> /wdir/log_err

I would appreciate any comments.

simroux commented 8 years ago

Hi naarkhoo,

VirSorter works on viral genomes (or assembled contigs), so it requires a fasta file of these contigs as input (VirSorter doesn't know how to handle fastq files). I think switching to a fasta file as input should be the first thing to try, to see whether any other issue remains or whether this was it.

Let me know if you have any other questions.

naarkhoo commented 8 years ago

Thanks; in fact I have assembled contigs in fasta format.

my Virsorter_stderr_log looks like

Error: Failed to open sequence file /wdir/fasta/VIRSorter_prots.fasta for reading

Error: Failed to open sequence file /wdir/fasta/VIRSorter_prots.fasta for reading

Error: Failed to open sequence file /wdir/fasta/VIRSorter_prots.fasta for reading

[blastall] FATAL ERROR: blast: Unable to open input file /wdir/fasta/VIRSorter_prots.fasta

Can't open '/wdir/fasta/VIRSorter_circu.list' for reading: 'No such file or directory' at /usr/local/bin/Scripts/Step_2_merge_contigs_annotation.pl line 39
Can't open 'VIRSorter_affi-contigs.csv' for reading: 'No such file or directory' at /usr/local/bin/Scripts/Step_3_highlight_phage_signal.pl line 43
Can't open 'VIRSorter_phage-signal.csv' for reading: 'No such file or directory' at /usr/local/bin/Scripts/Step_4_summarize_phage_signal.pl line 83

Am I missing something else ?!

simroux commented 8 years ago

In your initial message, you mentioned that the command line you tried was --fna /Projects//virsorter/LIB1.screened.hg19.screened.mOTU.v1.padded.pair.1.fq.gz. For VirSorter, the file pointed to by the "--fna" argument should be an uncompressed (not gzipped) fasta file of assembled contigs, whereas this file looks like a file of reads in fastq format?
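
A quick way to check what kind of file is being passed to "--fna" is to look at its first bytes. The sketch below is not part of VirSorter: guess_seq_format is a hypothetical helper name, and it only relies on the usual conventions that gzip files start with the bytes 1f 8b, fasta records with ">" and fastq records with "@".

```bash
# Hypothetical helper (not part of VirSorter): guess the format of a
# sequence file from its first bytes.
guess_seq_format() {
    file=$1
    # gzip streams start with the two magic bytes 0x1f 0x8b
    magic=$(head -c 2 "$file" | od -An -tx1 | tr -d ' \n')
    if [ "$magic" = "1f8b" ]; then
        echo "gzip"
        return
    fi
    # Otherwise look at the first character of the first record
    first=$(head -c 1 "$file")
    case "$first" in
        '>') echo "fasta" ;;
        '@') echo "fastq" ;;
        *)   echo "unknown" ;;
    esac
}
```

Calling guess_seq_format on the input file prints one word. Anything other than "fasta" suggests the file needs to be decompressed, or the reads assembled, before it is given to VirSorter.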

naarkhoo commented 8 years ago

You are right, and I apologize for being unclear. Initially it was fastq, but after reading your response I used FLASH to make assembled contigs in fasta format and passed them to VirSorter. However, I got (I believe) the same error (missing files and folders). I hope it is clearer now.

simroux commented 8 years ago

It looks like VirSorter could not predict proteins from your contig files. Could you post here the list of files that were created in the "fasta" directory within the VirSorter output folder ?

naarkhoo commented 8 years ago

The "fasta" directory is within "Fasta_files", which is empty.

and my command looks like the following,

docker run -v /home/rostam/Projects/virsorter/virsorter-data:/data -v /home/rostam/Projects/virsorter/virsorter-run/:/wdir -w /wdir --rm discoenv/virsorter:v1.0.3 --db 2 --fna ~/home/rostam/Projects/virsorter/out.extendedFrags.fna

simroux commented 8 years ago

So if "fasta" / "Fasta_files" is empty, this means that the VirSorter run failed very early on, when it tried to copy your input file "~/home/rostam/Projects/virsorter/out.extendedFrags.fna" into the fasta directory (and clean all sequence names, etc.). The first things to check would thus be that this path exists (I am surprised that it uses both "~" and "/home/rostam/", as in my experience "~" is used instead of "/home/username/"), and also that there is enough disk space available and that VirSorter has the correct permissions to copy these data to /wdir/.
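
One more thing worth checking with Docker: the container only sees directories mounted with -v, so the path given to --fna has to be the path as seen inside the container, not the host path. Below is a minimal sketch of that translation, assuming a single host_dir:container_dir mount. The function to_container_path is a hypothetical helper, not something VirSorter or Docker provides.

```bash
# Hypothetical helper: translate a host path into the path seen inside the
# container, given one "-v host_dir:container_dir" mount.
# Usage: to_container_path HOST_FILE HOST_DIR CONTAINER_DIR
to_container_path() {
    host_file=$1
    host_dir=${2%/}        # drop one trailing slash, if any
    container_dir=${3%/}
    # The file must exist and be readable on the host
    if [ ! -r "$host_file" ]; then
        echo "not readable on host: $host_file" >&2
        return 1
    fi
    case "$host_file" in
        "$host_dir"/*)
            # Replace the host prefix with the container mount point
            echo "$container_dir/${host_file#"$host_dir"/}"
            ;;
        *)
            echo "outside the mounted directory $host_dir: $host_file" >&2
            return 1
            ;;
    esac
}
```

With the mounts used in this thread, a file stored as /home/rostam/Projects/virsorter/virsorter-run/out.extendedFrags.fna would be passed to --fna as /wdir/out.extendedFrags.fna. A file outside the mounted directories makes the function return an error instead.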

naarkhoo commented 8 years ago

I used /wdir/ instead of spelling the whole path,


docker run -v /home/rostam/Projects/virsorter/virsorter-data:/data -v /home/rostam/Projects/virsorter/virsorter-run/:/wdir -w /wdir --rm discoenv/virsorter:v1.0.3 --db 2 --fna /wdir/out.extendedFrags.fna

and now I can see

under "fasta" directory how can I proceed from here ?

simroux commented 8 years ago

Ok, so it looks better. Did the run already stop? And if so, could you post the content of log_out and log_err? Based on the size of the files, my guess is that the sequences are considered too small by VirSorter: it needs at least two complete genes (i.e. genes with both a start and a stop codon) on a contig to consider it.
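
To get a rough idea of whether the contigs are long enough, one can summarize the sequence lengths in the input fasta file. This is only a sketch: fasta_length_summary is a hypothetical helper, the length threshold is whatever the caller passes in, and it says nothing about how many complete genes a contig actually carries.

```bash
# Hypothetical helper: count the sequences in a fasta file, report the
# longest one, and count how many reach a minimum length.
# Usage: fasta_length_summary FASTA_FILE MIN_LENGTH
fasta_length_summary() {
    awk -v min="$2" '
        # Record the sequence that just ended (if there was one)
        function tally() {
            if (!seen) return
            n++
            if (len > longest) longest = len
            if (len >= min) ok++
        }
        /^>/ { tally(); seen = 1; len = 0; next }   # new record starts
        { len += length($0) }                       # sequence may span lines
        END {
            tally()
            printf "sequences=%d longest=%d at_least_%d=%d\n", n, longest, min, ok
        }
    ' "$1"
}
```

If the "longest" value is only a few hundred bases, very few contigs, if any, are likely to contain two complete genes.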

naarkhoo commented 8 years ago

yes; it stopped

log_out looks like

/usr/local/bin/Scripts/Step_1_contigs_cleaning_and_gene_prediction.pl VIRSorter /wdir/fasta /wdir/fasta/input_sequences.fna 2
/usr/local/bin/mga_linux_ia64 /wdir/fasta/VIRSorter_nett.fasta -m > /wdir/fasta/VIRSorter_mga.predict
10000-ieme contig
20000-ieme contig
30000-ieme contig
40000-ieme contig
50000-ieme contig
60000-ieme contig
70000-ieme contig
/usr/local/bin/hmmsearch --tblout Contigs_prots_vs_PFAMa.tab --cpu 8 -o Contigs_prots_vs_PFAMa.out --noali /data/PFAM_27/Pfam-A.hmm /wdir/fasta/VIRSorter_prots.fasta
/usr/local/bin/hmmsearch --tblout Contigs_prots_vs_PFAMb.tab --cpu 8 -o Contigs_prots_vs_PFAMb.out --noali /data/PFAM_27/Pfam-B.hmm /wdir/fasta/VIRSorter_prots.fasta
/usr/local/bin/Scripts/Step_2_merge_contigs_annotation.pl /wdir/fasta/VIRSorter_mga_final.predict Contigs_prots_vs_Phage_Gene_Catalog.tab Contigs_prots_vs_Phage_Gene_unclustered.tab Contigs_prots_vs_PFAMa.tab Contigs_prots_vs_PFAMb.tab /data/Phage_gene_catalog_plus_viromes/Phage_Clusters_current.tab VIRSorter_affi-contigs.csv
/usr/local/bin/Scripts/Step_3_highlight_phage_signal.pl VIRSorter_affi-contigs.csv VIRSorter_phage-signal.csv
## Taking information from the contig info file (VIRSorter_affi-contigs.csv)
/usr/local/bin/Scripts/Step_4_summarize_phage_signal.pl VIRSorter_affi-contigs.csv VIRSorter_phage-signal.csv VIRSorter_global-phage-signal.csv VIRSorter_new_prot_list.csv
/usr/local/bin/Scripts/Step_5_get_phage_fasta-gb.pl VIRSorter
Code VIRSorter
The sequences will be put in:
 - /wdir/Predicted_viral_sequences/VIRSorter_cat-1.fasta
 - /wdir/Predicted_viral_sequences/VIRSorter_cat-2.fasta
 - /wdir/Predicted_viral_sequences/VIRSorter_cat-3.fasta
 - /wdir/Predicted_viral_sequences/VIRSorter_prophages_cat-4.fasta
 - /wdir/Predicted_viral_sequences/VIRSorter_prophages_cat-5.fasta
 - /wdir/Predicted_viral_sequences/VIRSorter_prophages_cat-6.fasta
Checking '/wdir/VIRSorter_phage-signal.csv'
VIRSorter       in progress

and Virsorter_stderr_log

Error: Sequence file /wdir/fasta/VIRSorter_prots.fasta is empty or misformatted

Error: Sequence file /wdir/fasta/VIRSorter_prots.fasta is empty or misformatted

Can't open 'Contigs_prots_vs_Phage_Gene_unclustered.tab' for reading: 'No such file or directory' at /usr/local/bin/Scripts/Step_2_merge_contigs_annotation.pl line 79
Can't open 'VIRSorter_affi-contigs.csv' for reading: 'No such file or directory' at /usr/local/bin/Scripts/Step_3_highlight_phage_signal.pl line 43
Can't open 'VIRSorter_phage-signal.csv' for reading: 'No such file or directory' at /usr/local/bin/Scripts/Step_4_summarize_phage_signal.pl line 83

Regarding your second point: I made the contigs using FLASH from Illumina shotgun paired-end reads. If they are too short, I can make the FLASH criteria more stringent.

simroux commented 8 years ago

So if your contigs are made with FLASH, that means that at best the contigs will be about twice as long as your reads, and VirSorter likely fails because there are no sequences long enough for it to process. Unfortunately, VirSorter needs actual genome fragments to identify viral regions / contigs, so we advise working only with sequences larger than 1.5kb, and ideally 10kb or more.
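
Once longer contigs are available from a proper assembly, the short ones can be dropped before running VirSorter. The following is a minimal sketch with awk. The function filter_by_length is a hypothetical helper (not a VirSorter script), and the cutoff is passed by the caller, e.g. 1500 for the 1.5kb suggested above.

```bash
# Hypothetical helper: keep only fasta records whose sequence has at least
# MIN_LENGTH bases. Each kept sequence is written on a single line.
# Usage: filter_by_length FASTA_FILE MIN_LENGTH > filtered.fna
filter_by_length() {
    awk -v min="$2" '
        # Print the record that just ended if it is long enough
        function emit() {
            if (header != "" && length(seq) >= min) print header "\n" seq
        }
        /^>/ { emit(); header = $0; seq = ""; next }   # new record starts
        { seq = seq $0 }                               # join wrapped lines
        END { emit() }
    ' "$1"
}
```

The filtered file can then be placed in the directory mounted as /wdir and passed to --fna. If the output is empty, no contig reached the cutoff and VirSorter would have nothing to work with.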