Closed naarkhoo closed 7 years ago
Hi naarkhoo,
VirSorter works on viral genomes (or assembled contigs), so it requires a fasta file of these contigs as input (VirSorter doesn't know how to handle fastq files). I think switching to a fasta file as input should be the first thing to try, to see whether there is any other issue or whether this is it.
Let me know if you have any other questions.
Thanks; in fact I have assembled contigs in fasta format.
My Virsorter_stderr_log looks like
Error: Failed to open sequence file /wdir/fasta/VIRSorter_prots.fasta for reading
Error: Failed to open sequence file /wdir/fasta/VIRSorter_prots.fasta for reading
Error: Failed to open sequence file /wdir/fasta/VIRSorter_prots.fasta for reading
[blastall] FATAL ERROR: blast: Unable to open input file /wdir/fasta/VIRSorter_prots.fasta
Can't open '/wdir/fasta/VIRSorter_circu.list' for reading: 'No such file or directory' at /usr/local/bin/Scripts/Step_2_merge_contigs_annotation.pl line 39
Can't open 'VIRSorter_affi-contigs.csv' for reading: 'No such file or directory' at /usr/local/bin/Scripts/Step_3_highlight_phage_signal.pl line 43
Can't open 'VIRSorter_phage-signal.csv' for reading: 'No such file or directory' at /usr/local/bin/Scripts/Step_4_summarize_phage_signal.pl line 83
Am I missing something else ?!
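When a log like this repeats many "cannot open" errors, it can help to list the files it complains about in the order they first appear, since the earliest missing file is often (though not always) the one that caused the rest. The following is a minimal sketch, not part of VirSorter: the function name `missing_files` is hypothetical, and the patterns are simply copied from the error messages quoted in this thread.

```python
import re

# Patterns taken from the error lines quoted in this thread; other
# VirSorter versions may word their errors differently (assumption).
PATTERNS = [
    re.compile(r"Failed to open sequence file (\S+) for reading"),
    re.compile(r"Unable to open input file (\S+)"),
    re.compile(r"Can't open '([^']+)' for reading"),
    re.compile(r"Sequence file (\S+) is empty or misformatted"),
]


def missing_files(log_text):
    """Return file paths named in error lines, in first-seen order."""
    seen = []
    for line in log_text.splitlines():
        for pattern in PATTERNS:
            match = pattern.search(line)
            if match and match.group(1) not in seen:
                seen.append(match.group(1))
    return seen
```

Calling `missing_files(open("Virsorter_stderr_log").read())` on the log above would list `/wdir/fasta/VIRSorter_prots.fasta` first, which points at the protein prediction step rather than at the later scripts.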
In your initial message, you mentioned that the command line you tried was
--fna /Projects//virsorter/LIB1.screened.hg19.screened.mOTU.v1.padded.pair.1.fq.gz
For VirSorter, the file pointed to by the "--fna" argument should be a fasta file (not gzipped) of assembled contigs, whereas this file looks like a file of reads in fastq format ?
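A quick way to tell which kind of file is being passed is to look at its first bytes: gzip files start with the magic bytes `1f 8b`, fasta records start with `>`, and fastq records start with `@`. The sketch below is only an illustration of that check; `sniff_sequence_file` is a hypothetical helper, not something VirSorter provides, and it only inspects the first character, so it will not catch a file that is malformed further down.

```python
import gzip


def sniff_sequence_file(path):
    """Guess (is_gzipped, format) from the first bytes of a file."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    is_gzipped = magic == b"\x1f\x8b"  # gzip magic number

    # Read the first character of the (decompressed) content.
    opener = gzip.open if is_gzipped else open
    with opener(path, "rb") as handle:
        first = handle.read(1)

    if first == b">":
        fmt = "fasta"
    elif first == b"@":
        fmt = "fastq"
    else:
        fmt = "unknown"
    return is_gzipped, fmt
```

For the input discussed here, the expected answer for a usable "--fna" file would be `(False, "fasta")`; a `.fq.gz` file of reads would come back as `(True, "fastq")`.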
You are right, and I apologize for being unclear. Initially it was fastq, but later, after reading your response, I used FLASH to make assembled contigs in fasta format and passed them to VirSorter. However, I got the same error (I believe): missing files and folders. I hope it is clearer now.
It looks like VirSorter could not predict proteins from your contig files. Could you post here the list of files that were created in the "fasta" directory within the VirSorter output folder ?
The "fasta" directory is within "Fasta_files", which is empty.
and my command looks like the following,
```bash
docker run -v /home/rostam/Projects/virsorter/virsorter-data:/data -v /home/rostam/Projects/virsorter/virsorter-run/:/wdir -w /wdir --rm discoenv/virsorter:v1.0.3 --db 2 --fna ~/home/rostam/Projects/virsorter/out.extendedFrags.fna
```
So if "fasta" / "Fasta_files" is empty, this means that the VirSorter run failed very early on, when it tried to copy your input file "~/home/rostam/Projects/virsorter/out.extendedFrags.fna" into the fasta directory (and clean all sequence names, etc.). The first things to check would thus be that this path exists (I am surprised that it uses both "~" and "/home/rostam/", as in my experience "~" is used instead of "/home/username/"), and also that there is enough disk space available and that the permissions are correct for VirSorter to copy these data to /wdir/.
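The three checks mentioned here (path exists, enough disk space, correct permissions) can be sketched in a few lines. This is an illustration under assumptions, not VirSorter code: `check_input` is a hypothetical helper, and "enough disk space" is approximated as "at least the size of the input file", which is only a lower bound. When VirSorter runs through Docker, the check is only meaningful for paths as they are seen inside the container.

```python
import os
import shutil


def check_input(fna_path, work_dir):
    """Collect basic problems that would stop the input from being copied."""
    problems = []

    # "~" expands to the home directory, so "~/home/rostam/x" becomes
    # "<home>/home/rostam/x", which is rarely what was intended.
    expanded = os.path.expanduser(fna_path)
    if not os.path.isfile(expanded):
        problems.append(f"input not found: {expanded}")
    elif not os.access(expanded, os.R_OK):
        problems.append(f"input not readable: {expanded}")

    if not os.path.isdir(work_dir):
        problems.append(f"working directory not found: {work_dir}")
    else:
        if not os.access(work_dir, os.W_OK):
            problems.append(f"working directory not writable: {work_dir}")
        if os.path.isfile(expanded):
            # Assumption: require at least the input size in free space.
            free = shutil.disk_usage(work_dir).free
            if free < os.path.getsize(expanded):
                problems.append(f"not enough free space in: {work_dir}")
    return problems
```

An empty list means none of these basic problems were found; anything else is printed as a short description of what to fix first.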
I used /wdir/ instead of spelling the whole path,
docker run -v /home/rostam/Projects/virsorter/virsorter-data:/data -v /home/rostam/Projects/virsorter/virsorter-run/:/wdir -w /wdir --rm discoenv/virsorter:v1.0.3 --db 2 --fna /wdir/out.extendedFrags.fna
and now I can see
under the "fasta" directory. How can I proceed from here ?
Ok, so it looks better. Did the run already stop ? And if so, could you post the content of log_out and log_err ? Based on the size of the files, it looks like the sequences are considered too small by VirSorter (VirSorter needs at least two complete genes, i.e. genes with both a start and a stop codon, on a contig to consider it).
yes; it stopped
log_out looks like
/usr/local/bin/Scripts/Step_1_contigs_cleaning_and_gene_prediction.pl VIRSorter /wdir/fasta /wdir/fasta/input_sequences.fna 2
/usr/local/bin/mga_linux_ia64 /wdir/fasta/VIRSorter_nett.fasta -m > /wdir/fasta/VIRSorter_mga.predict
10000-ieme contig
20000-ieme contig
30000-ieme contig
40000-ieme contig
50000-ieme contig
60000-ieme contig
70000-ieme contig
/usr/local/bin/hmmsearch --tblout Contigs_prots_vs_PFAMa.tab --cpu 8 -o Contigs_prots_vs_PFAMa.out --noali /data/PFAM_27/Pfam-A.hmm /wdir/fasta/VIRSorter_prots.fasta
/usr/local/bin/hmmsearch --tblout Contigs_prots_vs_PFAMb.tab --cpu 8 -o Contigs_prots_vs_PFAMb.out --noali /data/PFAM_27/Pfam-B.hmm /wdir/fasta/VIRSorter_prots.fasta
/usr/local/bin/Scripts/Step_2_merge_contigs_annotation.pl /wdir/fasta/VIRSorter_mga_final.predict Contigs_prots_vs_Phage_Gene_Catalog.tab Contigs_prots_vs_Phage_Gene_unclustered.tab Contigs_prots_vs_PFAMa.tab Contigs_prots_vs_PFAMb.tab /data/Phage_gene_catalog_plus_viromes/Phage_Clusters_current.tab VIRSorter_affi-contigs.csv
/usr/local/bin/Scripts/Step_3_highlight_phage_signal.pl VIRSorter_affi-contigs.csv VIRSorter_phage-signal.csv
## Taking information from the contig info file (VIRSorter_affi-contigs.csv)
/usr/local/bin/Scripts/Step_4_summarize_phage_signal.pl VIRSorter_affi-contigs.csv VIRSorter_phage-signal.csv VIRSorter_global-phage-signal.csv VIRSorter_new_prot_list.csv
/usr/local/bin/Scripts/Step_5_get_phage_fasta-gb.pl VIRSorter
Code VIRSorter
The sequences will be put in:
- /wdir/Predicted_viral_sequences/VIRSorter_cat-1.fasta
- /wdir/Predicted_viral_sequences/VIRSorter_cat-2.fasta
- /wdir/Predicted_viral_sequences/VIRSorter_cat-3.fasta
- /wdir/Predicted_viral_sequences/VIRSorter_prophages_cat-4.fasta
- /wdir/Predicted_viral_sequences/VIRSorter_prophages_cat-5.fasta
- /wdir/Predicted_viral_sequences/VIRSorter_prophages_cat-6.fasta
Checking '/wdir/VIRSorter_phage-signal.csv'
VIRSorter in progress
and Virsorter_stderr_log
Error: Sequence file /wdir/fasta/VIRSorter_prots.fasta is empty or misformatted
Error: Sequence file /wdir/fasta/VIRSorter_prots.fasta is empty or misformatted
Can't open 'Contigs_prots_vs_Phage_Gene_unclustered.tab' for reading: 'No such file or directory' at /usr/local/bin/Scripts/Step_2_merge_contigs_annotation.pl line 79
Can't open 'VIRSorter_affi-contigs.csv' for reading: 'No such file or directory' at /usr/local/bin/Scripts/Step_3_highlight_phage_signal.pl line 43
Can't open 'VIRSorter_phage-signal.csv' for reading: 'No such file or directory' at /usr/local/bin/Scripts/Step_4_summarize_phage_signal.pl line 83
Regarding your second point: I made the contigs using FLASH from Illumina shotgun paired-end reads. If they are too short, I can make the FLASH criteria more stringent.
So if your contigs are made with FLASH, that means that at best the contigs will be as long as twice your read length, and VirSorter likely fails because there are no sequences long enough for it to process. Unfortunately, VirSorter needs actual genome fragments to identify viral regions / contigs, so we advise working only with sequences larger than 1.5 kb, and ideally 10 kb or more.
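To see how many sequences in a fasta file actually reach that length before running VirSorter, one option is to filter the file first. The sketch below is a plain-Python illustration, not a VirSorter feature: `read_fasta` and `filter_by_length` are hypothetical helpers, and the default of 1500 bases is just the 1.5 kb guideline from this reply, applied here as "at least 1500".

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a plain-text fasta file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line and header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(in_path, out_path, min_length=1500):
    """Write sequences of at least min_length bases; return (kept, total)."""
    kept = total = 0
    with open(out_path, "w") as out:
        for header, seq in read_fasta(in_path):
            total += 1
            if len(seq) >= min_length:
                kept += 1
                out.write(f">{header}\n{seq}\n")
    return kept, total
```

If `filter_by_length("out.extendedFrags.fna", "long_contigs.fna")` reports that nothing was kept, the input consists of merged read pairs rather than assembled contigs, and a proper assembly step would be needed before VirSorter can do anything useful with it.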
I am trying to run VirSorter on my machine and I am getting a bunch of errors, which I believe are caused by an incomplete installation. I would appreciate your comments.
So basically, my virsorter directory includes
directory : virsorter-data/ (downloaded from iplant)
directory : virsorter-run/ (I made it with mkdir)
files : 2 fastq files (1.fq.gz and 2.fq.gz)
and then I run
the output looks like
I would appreciate any comments.