medvir / VirMet

Set of tools for viral metagenomics.

wolfpack #31

Closed LCarioti closed 6 years ago

LCarioti commented 6 years ago

Hi,

virmet wolfpack --run /home/Exp01
Traceback (most recent call last):
  File "/home/miniconda2/envs/virmet1/bin/virmet", line 11, in <module>
    load_entry_point('VirMet==1.1.1', 'console_scripts', 'virmet')()
  File "/home/miniconda2/envs/virmet1/lib/python3.5/site-packages/virmet/__main__.py", line 120, in main
    args.func(args)
  File "/home/miniconda2/envs/virmet1/lib/python3.5/site-packages/virmet/__main__.py", line 25, in wolfpack_run
    od = wolfpack.main(args)
  File "/home/miniconda2/envs/virmet1/lib/python3.5/site-packages/virmet/wolfpack.py", line 410, in main
    sd = hunter(fq)
  File "/home/miniconda2/envs/virmet1/lib/python3.5/site-packages/virmet/wolfpack.py", line 126, in hunter 
    with open('prinseq.log') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'prinseq.log'

virmet.log
INFO 2017/12/01 14:22:40 __main__.py: main() 116:       /home/miniconda2/envs/virmet1/bin/virmet wolfpack --run /home/Exp01
INFO 2017/12/01 14:22:40 wolfpack.py: main() 391:       samples to run: S16 S7
INFO 2017/12/01 14:22:40 wolfpack.py: main() 409:       running hunter on /home/Exp01/A-16_S16_L001_R1_001.fastq.gz
DEBUG 2017/12/01 14:22:40 wolfpack.py: hunter() 47:     hunter will run on 16 processors
INFO 2017/12/01 14:22:40 common.py: run_child() 56:     Running instance of gunzip
DEBUG 2017/12/01 14:22:45 common.py: run_child() 62:    Completed
DEBUG 2017/12/01 14:22:45 wolfpack.py: hunter() 77:     trimming with seqtk
INFO 2017/12/01 14:22:45 common.py: run_child() 56:     Running instance of seqtk
DEBUG 2017/12/01 14:22:50 common.py: run_child() 62:    Completed
INFO 2017/12/01 14:22:50 common.py: run_child() 56:     Running instance of wc
DEBUG 2017/12/01 14:22:50 common.py: run_child() 62:    Completed
INFO 2017/12/01 14:22:50 common.py: run_child() 56:     Running instance of split
DEBUG 2017/12/01 14:22:52 common.py: run_child() 62:    Completed
DEBUG 2017/12/01 14:22:52 wolfpack.py: hunter() 101:    filtering with prinseq
INFO 2017/12/01 14:22:52 common.py: run_child() 56:     Running instance of /usr/bin/seq
ERROR 2017/12/01 14:22:52 common.py: run_child() 64:    Execution of /usr/bin/seq -f %03g 0 15 | xargs -P 16 -I {} prinseq             -fastq splitted{}.fastq -lc_method entropy -lc_threshold 70             -log prinseq{}.log -min_qual_mean 20             -out_good ./good{} -out_bad ./bad{} > ./prinseq.err 2>&1 failed with returncode 127:
ERROR 2017/12/01 14:22:52 common.py: run_child() 65:    /usr/bin/seq -f %03g 0 15 | xargs -P 16 -I {} prinseq             -fastq splitted{}.fastq -lc_method entropy -lc_threshold 70             -log prinseq{}.log -min_qual_mean 20             -out_good ./good{} -out_bad ./bad{} > ./prinseq.err 2>&1
DEBUG 2017/12/01 14:22:52 wolfpack.py: hunter() 108:    cleaning up
INFO 2017/12/01 14:22:52 common.py: run_child() 56:     Running instance of rm
DEBUG 2017/12/01 14:22:53 common.py: run_child() 62:    Completed
ozagordi commented 6 years ago

How did you install virmet? It seems that prinseq is not found in your PATH.
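
One way to check this from Python is sketched below. It only looks up executables on PATH with `shutil.which`. The two candidate names are assumptions taken from this thread (`prinseq` is what the failing command calls, `prinseq-lite.pl` is the name reported to work further down), and `find_prinseq` is a hypothetical helper, not part of VirMet.

```python
import shutil

# Candidate names are assumptions taken from this thread: the failing
# command calls `prinseq`, the workaround below uses `prinseq-lite.pl`.
CANDIDATES = ("prinseq", "prinseq-lite.pl")


def find_prinseq(candidates=CANDIDATES, path=None):
    """Return (name, full_path) for the first candidate found, else None.

    `path` is a PATH-style string; None means the current PATH.
    """
    for name in candidates:
        full_path = shutil.which(name, path=path)
        if full_path is not None:
            return name, full_path
    return None


if __name__ == "__main__":
    found = find_prinseq()
    if found is None:
        print("no prinseq executable found on PATH")
    else:
        print("found {} at {}".format(*found))
```

Run it inside the activated conda environment, since that is the PATH virmet sees.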

LCarioti commented 6 years ago

conda create -n virmet1 python=3.5 virmet -c bioconda

LCarioti commented 6 years ago

Ubuntu 16.04 LTS 24 Intel(R) Xeon(R) RAM 50 GB

ozagordi commented 6 years ago

cat ./prinseq.err?

LCarioti commented 6 years ago

cat prinseq.err
xargs: prinseq: No such file or directory

I tried to fix it by commenting out a line at line 37 of wolfpack.py

#    from virmet.common import prinseq_exe
prinseq_exe = 'prinseq-lite.pl'
#    prinseq_exe = 'prinseq'

and then it ran.

When it ends

wc -l unique.tsv
667498 unique.tsv

wc -l orgs_list.tsv
1 orgs_list.tsv

$ head -n 2 unique.tsv
qseqid sseqid sscinames stitle pident qcovs score length mismatch gapopen qstart qend sstart send staxids
M00611 gb|KY555145.1| N/A Caulobacter phage Ccr29, complete genome 91.667 39 44 60 3 2 38 96 150536 150478 1959737

I worked around it with a simple lookup between unique.tsv and viral_seqs_info.tsv.

Is it correct?

If I run blastn

blastn -task megablast -query input_file.fa -db /data/virmet_databases/viral_nuccore/viral_db -out final.contigs.txt -outfmt "6 qseqid sseqid sscinames stitle pident qcovs score length mismatch gapopen qstart qend sstart send staxids"

Warning: [blastn] Taxonomy name lookup from taxid requires installation of taxdb database with ftp://ftp.ncbi.nlm.nih.gov/blast/db/taxdb.tar.gz

ozagordi commented 6 years ago

What sample are you trying to analyze?

If you want to run blast by hand, you need to set the environment variable BLASTDB to the directory where blast can find the taxdb files (they should be in /data/virmet_databases/). This is the equivalent of the Python line os.environ['BLASTDB'] = DB_DIR.
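
A minimal sketch of running blast by hand with BLASTDB set, assuming the database layout mentioned in this thread. `blast_env`, `blastn_command` and `run_blastn` are hypothetical helper names, not VirMet functions. The blastn options are copied from the command posted above.

```python
import os
import subprocess

# Directory named in this thread; adjust it to your own installation.
DB_DIR = "/data/virmet_databases"
OUTFMT = ("6 qseqid sseqid sscinames stitle pident qcovs score length "
          "mismatch gapopen qstart qend sstart send staxids")


def blast_env(db_dir=DB_DIR):
    """Copy the current environment and point BLASTDB at db_dir."""
    env = dict(os.environ)
    env["BLASTDB"] = db_dir
    return env


def blastn_command(query, out, db_dir=DB_DIR):
    """Build the megablast command posted earlier in this thread."""
    return [
        "blastn", "-task", "megablast",
        "-query", query,
        "-db", os.path.join(db_dir, "viral_nuccore", "viral_db"),
        "-out", out,
        "-outfmt", OUTFMT,
    ]


def run_blastn(query, out, db_dir=DB_DIR):
    """Run blastn with BLASTDB set only for the child process."""
    subprocess.run(blastn_command(query, out, db_dir),
                   env=blast_env(db_dir), check=True)
```

For example, `run_blastn("input_file.fa", "final.contigs.txt")` would repeat the run above with BLASTDB set, provided blastn is on PATH and the taxdb files are in `DB_DIR`.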

LCarioti commented 6 years ago

I used megahit (Li, D Bioinformatics 2015) to make a de novo assembly from the file viral_reads.fastq, and then I assigned the taxonomy with blast using the viral DB of virmet (/data/virmet_databases/).

I don't know why, but I can't assign sscinames. It is the same issue as in unique.tsv.

head unique.tsv
qseqid sseqid sscinames stitle pident qcovs score length mismatch gapopen qstart qend sstart send staxids
M00611:16081:N:0:1 gb|KY555145.1| N/A Caulobacter phage Ccr29, complete genome 91.667 39 44 60 3 2 38 96 150536 150478 1959737
M00611:21291:N:0:1 gb|KY094066.1| N/A BeAn 58058 virus, complete genome 85.841 75 65 113 16 0 7 119 8478 8590 67082
M00611:21301:N:0:1 gb|EF380009.1| N/A Enterobacteria phage phiX174 isolate AP100, complete genome 99.020 68 99 102 1 0 1 102 1638 1537 10847
M00611:21401:N:0:1 gb|EF380025.1| N/A Enterobacteria phage phiX174 isolate 10D90, complete genome 92.481 89 103 133 10 0 1 133 401 533 10847
M00611:21401:N:0:1 gb|EF380009.1| N/A Enterobacteria phage phiX174 isolate AP100, complete genome 91.667 95 107 144 10 2 1 143 4437 4295 10847
M00611:21521:N:0:1 gb|AY037928.1| N/A Human endogenous retrovirus K113 complete genome 87.770 92 88 139 17 0 1 139 2343 2481 166122
M00611:21671:N:0:1 gb|J02482.1| N/A Coliphage phi-X174, complete genome 96.000 100 132 150 6 0 1 150 413 264 10847
M00611:21801:N:0:1 gb|J02482.1| N/A Coliphage phi-X174, complete genome 92.715 100 118 151 11 0 1 151 1874 1724 10847
M00611:21811:N:0:1 gb|J02482.1| N/A Coliphage phi-X174, complete genome 95.364 99 129 151 5 2 1 150 4584 4435 10847
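
To see how widespread the missing names are, a short sketch like the following counts the rows whose sscinames field is N/A. It assumes unique.tsv is tab-separated with the header row shown above. `count_missing_names` is a hypothetical helper, and the two sample rows are copied from this thread.

```python
import csv
import io


def count_missing_names(handle):
    """Return (total_rows, rows_with_NA_sscinames) for a tab-separated
    hit table that starts with a header row."""
    # QUOTE_NONE: treat quote characters in titles as plain text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    total = missing = 0
    for row in reader:
        total += 1
        if row["sscinames"] == "N/A":
            missing += 1
    return total, missing


if __name__ == "__main__":
    # Two rows from this thread, with tabs between fields (an assumption
    # about the real file layout).
    sample = io.StringIO(
        "qseqid\tsseqid\tsscinames\tstitle\tstaxids\n"
        "M00611:16081:N:0:1\tgb|KY555145.1|\tN/A\t"
        "Caulobacter phage Ccr29, complete genome\t1959737\n"
        "M00611:21291:N:0:1\tgb|KY094066.1|\tN/A\t"
        "BeAn 58058 virus, complete genome\t67082\n"
    )
    print(count_missing_names(sample))
```

On the real file, open it with `open("unique.tsv", newline="")` and pass the handle to the function.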

ozagordi commented 6 years ago

Do you have taxdb.btd and taxdb.bti in /data/virmet_databases?

host:~ l -1 /data/virmet_databases
total 107M
-rw-r--r-- 1 ozagordi ngs  11M Apr 10  2017 taxdb.bti
-rw-r--r-- 1 ozagordi ngs  96M Apr 10  2017 taxdb.btd
drwxr-xr-x 4 ozagordi ngs 4.0K Apr 18  2017 human/
drwxr-xr-x 4 ozagordi ngs 4.0K Apr 18  2017 bacteria/
drwxr-xr-x 4 ozagordi ngs 4.0K Apr 18  2017 fungi/
drwxr-xr-x 4 ozagordi ngs 4.0K Apr 18  2017 bovine/
drwxr-xr-x 3 ozagordi ngs 4.0K Apr 18  2017 viral_nuccore/
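
The same check can be scripted. This is a minimal sketch that reports which of the two taxdb files named above are absent from a directory. `missing_taxdb_files` is a hypothetical helper, not part of VirMet.

```python
import os

# File names as given in this thread.
TAXDB_FILES = ("taxdb.btd", "taxdb.bti")


def missing_taxdb_files(db_dir):
    """Return the taxdb file names that are not present in db_dir."""
    return [name for name in TAXDB_FILES
            if not os.path.isfile(os.path.join(db_dir, name))]


if __name__ == "__main__":
    # Path taken from this thread; adjust it to your installation.
    print(missing_taxdb_files("/data/virmet_databases"))
```

An empty list means both files are in place.
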
LCarioti commented 6 years ago

No

/data/virmet_databases/viral_nuccore$ ls -1trhc
ncbi_search
viral_seqs_info.tsv
viral_accn_taxid.dmp
viral_database.fasta
blast.perf
blast.log
viral_db.nsq
viral_db.nsi
viral_db.nsd
viral_db.nog
viral_db.nin
viral_db.nhr
viral_db.nhi
viral_db.nhd

I have built my DB by

virmet fetch --viral n

virmet index --viral n

ozagordi commented 6 years ago

I will investigate. In the meantime, can you try to download them from NCBI, set the environment variable and run blast again?
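
A rough sketch of the download step, assuming the archive URL quoted in the blastn warning earlier in this thread is still valid (it may have moved since). `download_taxdb` and `extract_taxdb` are hypothetical helper names. Only the two taxdb files are copied out of the archive.

```python
import os
import shutil
import tarfile
import urllib.request

# URL copied from the blastn warning in this thread; it may have changed.
TAXDB_URL = "ftp://ftp.ncbi.nlm.nih.gov/blast/db/taxdb.tar.gz"
WANTED = ("taxdb.btd", "taxdb.bti")


def download_taxdb(dest, url=TAXDB_URL):
    """Download the taxdb archive to dest and return dest."""
    urllib.request.urlretrieve(url, dest)
    return dest


def extract_taxdb(archive, db_dir):
    """Copy taxdb.btd and taxdb.bti from the archive into db_dir.

    Returns the sorted list of file names that were written.
    """
    extracted = []
    with tarfile.open(archive, "r:gz") as tar:
        for member in tar.getmembers():
            name = os.path.basename(member.name)
            if member.isfile() and name in WANTED:
                target = os.path.join(db_dir, name)
                with tar.extractfile(member) as src, open(target, "wb") as dst:
                    shutil.copyfileobj(src, dst)
                extracted.append(name)
    return sorted(extracted)
```

For example, `extract_taxdb(download_taxdb("taxdb.tar.gz"), "/data/virmet_databases")` would place the files where BLASTDB is expected to point in this thread.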

LCarioti commented 6 years ago

thanks

I will try it

LCarioti commented 6 years ago

thanks

everything runs well