linnabrown / run_dbcan

Run_dbcan V4, using genomes/metagenomes/proteomes of any assembled organisms (prokaryotes, fungi, plants, animals, viruses) to search for CAZymes.
http://bcb.unl.edu/dbCAN2
GNU General Public License v3.0

meta vs. prok #25

lis4matilda closed 4 years ago

lis4matilda commented 4 years ago

Hi again,

I aim to annotate assembled metagenomic genes/contigs for CAZymes. I cannot find any documentation on the difference between running with the prok option and the meta option. When I run with meta I get an error message saying that I "missed some parameters for input", but with prok it works fine. What is different when running with meta, and which input could I have missed? When I run the same genes on the server with meta selected, I do get results. I am running:

run_dbcan.py genes.K01193.cut1.fna meta --use_signalP=True

linnabrown commented 4 years ago

Replace /home/lisao/.local/bin/run_dbcan.py with the new run_dbcan.py from this repo (https://github.com/linnabrown/run_dbcan/blob/master/run_dbcan.py).

linnabrown commented 4 years ago

This problem has been solved in the new package. Please use this command

pip install run-dbcan==2.0.1 --user

ppericard commented 4 years ago

Hi,

I still have the same problem with the latest version, 2.0.2. Can you help me?

$ run_dbcan.py --db_dir dbcan2 EscheriaColiK12MG1655.fna meta --out_dir output_EscheriaColiK12MG1655
ERROR: You missed some parameters for input
USAGE: ./FragGeneScan -s [seq_file_name] -o [output_file_name] -w [1 or 0] -t [train_file_name] (-p [thread_num])

       Mandatory parameters
       [seq_file_name]:    sequence file name including the full path
       [output_file_name]: output file name including the full path
       [1 or 0]:           1 if the sequence file has complete genomic sequences
                           0 if the sequence file has short sequence reads
       [train_file_name]:  file name that contains model parameters; this file should be in the "train" directory
                           Note that four files containing model parameters already exist in the "train" directory
                           [complete] for complete genomic sequences or short sequence reads without sequencing error
                           [sanger_5] for Sanger sequencing reads with about 0.5% error rate
                           [sanger_10] for Sanger sequencing reads with about 1% error rate
                           [454_5] for 454 pyrosequencing reads with about 0.5% error rate
                           [454_10] for 454 pyrosequencing reads with about 1% error rate
                           [454_30] for 454 pyrosequencing reads with about 3% error rate
                           [illumina_5] for Illumina sequencing reads with about 0.5% error rate
                           [illumina_10] for Illumina sequencing reads with about 1% error rate

       Optional parameter
       [thread_num]:       the number of threads used by FragGeneScan; default is 1 thread.
cp: cannot stat 'output_EscheriaColiK12MG1655/fragGeneScan.faa': No such file or directory
***************************1. DIAMOND start*************************************************

***************************2. HMMER start*************************************************

diamond v0.9.29.130 | by Benjamin Buchfink <buchfink@gmail.com>
Licensed under the GNU GPL <https://www.gnu.org/licenses/gpl.txt>
Check http://github.com/bbuchfink/diamond for updates.

#CPU threads: 2
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
/bin/sh: 1: cannot open output_EscheriaColiK12MG1655/uniInput: No such file

Error: Failed to open sequence file output_EscheriaColiK12MG1655/uniInput for reading

Temporary directory: output_EscheriaColiK12MG1655
Opening the database... Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/dbcan/bin/run_dbcan.py", line 176, in <module>
 [4.5e-05s]
#Target sequences to report alignments for: 1
Opening the input file... No such file or directory
 [0.000112s]
Error: Error opening file output_EscheriaColiK12MG1655/uniInput
    count_per_file = count / numThreads                                                #number of genes per core
ZeroDivisionError: division by zero
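The last traceback line divides by numThreads, so that value must have been zero when the script reached it, presumably because no sequences could be counted in the missing uniInput file. A minimal sketch of a guard that would turn this into a readable error, assuming a hypothetical helper named genes_per_thread that is not part of run_dbcan:

```python
def genes_per_thread(count: int, num_threads: int) -> float:
    """Return how many genes each worker should handle.

    Hypothetical helper, not part of run_dbcan: it fails with a readable
    message instead of a bare ZeroDivisionError.
    """
    if count <= 0:
        raise ValueError(
            "no input sequences found; did gene prediction produce any output?"
        )
    if num_threads <= 0:
        raise ValueError("thread count must be at least 1")
    return count / num_threads
```

With a check like this, the failure would point at the empty input rather than at the arithmetic.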

linnabrown commented 4 years ago

Try this command

run_dbcan.py EscheriaColiK12MG1655.fna meta --db_dir dbcan2 --out_dir output_EscheriaColiK12MG1655

ppericard commented 4 years ago

Nope. Not working either. To be on the safe side, I made a fresh install of run_dbcan v2.0.3, its dependencies, and the db files. The prok mode works fine, but the meta mode crashes, with the test file but also with my meta-transcriptomics assembly file.

(dbcan) ubuntu@ppericard-sinfoni-bigmem:~/data/test_run_dbcan$ run_dbcan.py EscheriaColiK12MG1655.fna meta --out_dir output_EscheriaColiK12MG1655_meta
ERROR: You missed some parameters for input
USAGE: ./FragGeneScan -s [seq_file_name] -o [output_file_name] -w [1 or 0] -t [train_file_name] (-p [thread_num])

       Mandatory parameters
       [seq_file_name]:    sequence file name including the full path
       [output_file_name]: output file name including the full path
       [1 or 0]:           1 if the sequence file has complete genomic sequences
                           0 if the sequence file has short sequence reads
       [train_file_name]:  file name that contains model parameters; this file should be in the "train" directory
                           Note that four files containing model parameters already exist in the "train" directory
                           [complete] for complete genomic sequences or short sequence reads without sequencing error
                           [sanger_5] for Sanger sequencing reads with about 0.5% error rate
                           [sanger_10] for Sanger sequencing reads with about 1% error rate
                           [454_5] for 454 pyrosequencing reads with about 0.5% error rate
                           [454_10] for 454 pyrosequencing reads with about 1% error rate
                           [454_30] for 454 pyrosequencing reads with about 3% error rate
                           [illumina_5] for Illumina sequencing reads with about 0.5% error rate
                           [illumina_10] for Illumina sequencing reads with about 1% error rate

       Optional parameter
       [thread_num]:       the number of threads used by FragGeneScan; default is 1 thread.
cp: cannot stat 'output_EscheriaColiK12MG1655_meta/fragGeneScan.faa': No such file or directory
***************************1. DIAMOND start*************************************************

***************************2. HMMER start*************************************************

Error: Failed to open sequence file output_EscheriaColiK12MG1655_meta/uniInput for reading

/bin/sh: 1: cannot open output_EscheriaColiK12MG1655_meta/uniInput: No such file
diamond v0.9.29.130 | by Benjamin Buchfink <buchfink@gmail.com>
Licensed under the GNU GPL <https://www.gnu.org/licenses/gpl.txt>
Check http://github.com/bbuchfink/diamond for updates.

#CPU threads: 2
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: output_EscheriaColiK12MG1655_meta
Opening the database...  [5.5e-05s]
#Target sequences to report alignments for: 1
Opening the input file... No such file or directory
 [0.000148s]
Error: Error opening file output_EscheriaColiK12MG1655_meta/uniInput
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/bin/run_dbcan.py", line 173, in <module>
    count_per_file = count / numThreads                                                #number of genes per core
ZeroDivisionError: division by zero
(dbcan) ubuntu@ppericard-sinfoni-bigmem:~/data/test_run_dbcan$ run_dbcan.py EscheriaColiK12MG1655.fna prok --out_dir output_EscheriaColiK12MG1655
***************************1. DIAMOND start*************************************************

***************************2. HMMER start*************************************************

diamond v0.9.29.130 | by Benjamin Buchfink <buchfink@gmail.com>
Licensed under the GNU GPL <https://www.gnu.org/licenses/gpl.txt>
Check http://github.com/bbuchfink/diamond for updates.

#CPU threads: 2
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: output_EscheriaColiK12MG1655
Opening the database...  [8.3e-05s]
#Target sequences to report alignments for: 1
Opening the input file...  [5.2e-05s]
Opening the output file...  [4.1e-05s]
Loading query sequences...  [0.021548s]
Masking queries... 
***************************3. HotPep start***************************************************

 [0.121294s]
Building query seed set...  [0.012264s]
Algorithm: Double-indexed
Building query histograms...  [0.028366s]
Allocating buffers...  [5.3e-05s]
Loading reference sequences... Screening EscheriaColiK12MG1655 for
CE
Assigning proteins to groups
 [2.89661s]
Masking reference... Collecting Results
GH
Assigning proteins to groups
Collecting Results
AA
Assigning proteins to groups
Collecting Results
PL
Assigning proteins to groups
Collecting Results
GT
Assigning proteins to groups
 [28.3597s]
Initializing temporary storage...  [0.000394s]
Building reference histograms... Collecting Results
CBM
Assigning proteins to groups
 [5.73527s]
Allocating buffers...  [6.7e-05s]
Processing query block 0, reference block 0, shape 0, index chunk 0.
Building reference seed array... Collecting Results

Screened
EscheriaColiK12MG1655
for proteins of the types
CE, GH, AA, PL, GT, CBM
 [5.17435s]
Building query seed array...  [0.029714s]
Computing hash join...  [0.827719s]
Building seed filter...  [0.005292s]
Searching alignments...  [2.15502s]
Processing query block 0, reference block 0, shape 0, index chunk 1.
Building reference seed array...  [4.86135s]
Building query seed array...  [0.030344s]
Computing hash join...  [0.883778s]
Building seed filter...  [0.004309s]
Searching alignments...  [1.9289s]
Processing query block 0, reference block 0, shape 0, index chunk 2.
Building reference seed array...  [5.29585s]
Building query seed array...  [0.032264s]
Computing hash join...  [0.863779s]
Building seed filter...  [0.006037s]
Searching alignments...  [1.82135s]
Processing query block 0, reference block 0, shape 0, index chunk 3.
Building reference seed array...  [3.73824s]
Building query seed array...  [0.023529s]
Computing hash join...  [0.828141s]
Building seed filter...  [0.005055s]
Searching alignments...  [1.80458s]
Processing query block 0, reference block 0, shape 1, index chunk 0.
Building reference seed array...  [3.7038s]
Building query seed array...  [0.02374s]
Computing hash join...  [0.80572s]
Building seed filter...  [0.004569s]
Searching alignments...  [1.67875s]
Processing query block 0, reference block 0, shape 1, index chunk 1.
Building reference seed array...  [4.99585s]
Building query seed array...  [0.030358s]
Computing hash join...  [0.853903s]
Building seed filter...  [0.004627s]
Searching alignments...  [1.68618s]
Processing query block 0, reference block 0, shape 1, index chunk 2.
Building reference seed array...  [5.11981s]
Building query seed array...  [0.032199s]
Computing hash join...  [0.830349s]
Building seed filter...  [0.004533s]
Searching alignments...  [1.67546s]
Processing query block 0, reference block 0, shape 1, index chunk 3.
Building reference seed array...  [3.73317s]
Building query seed array...  [0.021848s]
Computing hash join...  [0.853786s]
Building seed filter...  [0.004714s]
Searching alignments...  [1.73002s]
Deallocating buffers...  [0.130376s]
Computing alignments...  [4.95889s]
Deallocating reference...  [0.041937s]
Loading reference sequences...  [3.8e-05s]
Deallocating buffers...  [0.000308s]
Deallocating queries...  [0.000238s]
Loading query sequences...  [3e-05s]
Closing the input file...  [2.2e-05s]
Closing the output file...  [0.000116s]
Closing the database file...  [2e-05s]
Deallocating taxonomy...  [1.6e-05s]
Total time = 100.436s
Reported 135 pairwise alignments, 135 HSPs.
135 queries aligned.
***************************1. DIAMOND end***************************************************
***************************2. HMMER end***************************************************
***************************3. hotPep end***************************************************
Preparing overview table from hmmer, hotpep and diamond output...
overview table complete. Saved as output_EscheriaColiK12MG1655/overview.txt

tpriest0 commented 4 years ago

I am having exactly the same problem using run_dbcan.py version 2.0.3: everything works fine when running 'prok', but 'meta' always fails. The problem originates from FragGeneScan not being able to run, which results in no uniInput file being produced.
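The failure mode described here (the gene finder fails, yet later steps still run against a file that was never written) could be caught early. A minimal sketch, assuming a hypothetical wrapper named run_gene_prediction; this is not run_dbcan's actual code:

```python
import subprocess
from pathlib import Path
from typing import Sequence


def run_gene_prediction(cmd: Sequence[str], expected_output: Path) -> Path:
    """Run a gene-prediction command and verify it produced a usable file.

    Hypothetical wrapper: `cmd` is whatever gene finder is invoked, and
    `expected_output` is the protein FASTA that later steps will read.
    """
    result = subprocess.run(list(cmd), capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"gene prediction failed (exit code {result.returncode}):\n{result.stderr}"
        )
    # Some tools print a usage message and still exit with status 0,
    # so also check that the output file exists and is not empty.
    if not expected_output.is_file() or expected_output.stat().st_size == 0:
        raise RuntimeError(f"gene prediction produced no output at {expected_output}")
    return expected_output
```

Stopping at this point would avoid the later DIAMOND and HMMER errors about the missing uniInput file.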

linnabrown commented 4 years ago

@tpriest0 @ppericard Sorry for my late response; I was busy with a Ph.D. interview recently. FragGeneScan does not currently work, so we will use Prodigal to predict genes from metagenomes instead. Thank you for your feedback!
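For readers who want to predict genes themselves before annotation, here is a rough sketch of how a Prodigal call in metagenome mode could be assembled. The flags are standard Prodigal options. The function name and the output file names are illustrative assumptions, not what run_dbcan uses internally:

```python
from pathlib import Path
from typing import List


def prodigal_meta_command(contigs: Path, out_dir: Path) -> List[str]:
    """Build a Prodigal command line for metagenomic contigs.

    The output file names below are assumptions made for illustration.
    """
    return [
        "prodigal",
        "-i", str(contigs),                   # input nucleotide FASTA
        "-a", str(out_dir / "proteins.faa"),  # predicted protein translations
        "-o", str(out_dir / "genes.gff"),     # gene coordinates
        "-f", "gff",                          # format of the coordinate file
        "-p", "meta",                         # metagenome mode
        "-q",                                 # quiet: no progress output
    ]
```

The resulting list can be passed to subprocess.run(cmd, check=True), provided Prodigal is installed and on the PATH.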

pip install run-dbcan==2.0.5 --user

linnabrown commented 4 years ago

This problem is fully resolved in version 2.0.5, so I will close this thread now. Thank you for your attention.

pavlohrab commented 2 years ago

I have a similar problem with the latest docker version:

docker run -it haidyi/run_dbcan:latest run_dbcan protein.faa protein --out_dir test_res_run_dbcan
cp: cannot stat 'protein..faa': No such file or directory

The rest of the error output is the same. The file is in the folder from which I am trying to run the image.