typhainepl commented 1 year ago

Hi,

I'm encountering an error while trying to run dbCAN, and it appears to be related to the output generation. Any assistance you could provide would be greatly appreciated.

I've installed dbcan through conda.

The command I am running is: `run_dbcan test_seq.faa protein --out_dir test_2`

The output and error message:

***************************1. DIAMOND start*************************************************

diamond v2.1.8.162 (C) Max Planck Society for the Advancement of Science, Benjamin Buchfink, University of Tuebingen
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)

#CPU threads: 4
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: test_2
#Target sequences to report alignments for: 1
Opening the database...  [0.08s]
Database: db/CAZy (type: Diamond database, sequences: 2428817, letters: 1157024505)
Block size = 2000000000
Building query seed set...  [0s]
Algorithm: Query-indexed
Building query histograms...  [0s]
Seeking in database...  [0s]
Loading reference sequences...  [3.2s]
Initializing temporary storage...  [0.014s]
Building reference histograms...  [6.609s]
Allocating buffers...  [0s]
Processing query block 1, reference block 1/1, shape 1/2.
Building reference seed array...  [2.74s]
Building query seed array...  [0s]
Computing hash join...  [0.058s]
Searching alignments...  [0s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 2/2.
Building reference seed array...  [4.652s]
Building query seed array...  [0s]
Computing hash join...  [0.515s]
Searching alignments...  [0.001s]
Deallocating memory...  [0s]
Deallocating buffers...  [0.009s]
Clearing query masking...  [0s]
Computing alignments... Loading trace points...  [0.004s]
Sorting trace points...  [0s]
Computing alignments...  [0.001s]
Deallocating buffers...  [0s]
Loading trace points...  [0s]
 [0.007s]
Deallocating reference...  [0.007s]
Loading reference sequences...  [0s]
Deallocating buffers...  [0s]
Deallocating queries...  [0s]
Total time = 17.908s
Reported 0 pairwise alignments, 0 HSPs.
0 queries aligned.

***************************1. DIAMOND end***************************************************

***************************2. HMMER start*************************************************

***************************2. HMMER end***************************************************

***************************3. dbCAN_sub start***************************************************

ID count: 8
total time: 5.017667531967163

***************************3. dbCAN_sub end***************************************************

Traceback (most recent call last):
  File "/homes/typhaine/miniconda3/envs/run_dbcan/bin/run_dbcan", line 10, in <module>
    sys.exit(cli_main())
  File "/homes/typhaine/miniconda3/envs/run_dbcan/lib/python3.8/site-packages/dbcan_cli/run_dbcan.py", line 883, in cli_main
    run(inputFile=args.inputFile, inputType=args.inputType, cluster=args.cluster, dbCANFile=args.dbCANFile,
  File "/homes/typhaine/miniconda3/envs/run_dbcan/lib/python3.8/site-packages/dbcan_cli/run_dbcan.py", line 290, in run
    with open(f"{outPath}dbsub.out") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'test_2/dbsub.out'

My input file (test_seq.faa) looks like this:

>0A023FBW4 E1142_AMBCJ
MTSHGAVKIAIFAVIALHSIFECLSKPQILQRTDHSTDSDWDPQMCPETCNPSKNISCSSECLCVTLGGGDETGTCFNMSGVDWLGHAQASDGHNDG
>A0A023FF81 E1126_AMBCJ
MTSHSAVRIAIFAVIALHSIFECLSKPQILQRTDKSTDSEWDPQTCPETCIPSKNITCSDGCVCVKLGEEEEGTCFNMTGVDWLGSPSDD
>A0A023PXA5 YA19A_YEAST
MLLSELVATASSLPYTAISIHNNCRVPAARHIHHGCRYFHGPPVMHLPQCLRTIQFSPSVISTSYQIPVICQHHAVVPTARYLPDYCSIISWHRPLWGIHILIVPQSQLPLPIRPKRIHTTHRYKPVIAFNDHIPSLALWICLHYQGSNGCVTPVAAKFFIIFHFVGLKEIMSPSRNATRNLNQYWRVL
>A0A023PXB5 IRC2_YEAST
MFALIISSKGKTSGFFFNSSFSSSALVGIAPLTAYSALVTPVFKSFLVILPAGLKSKSFAVNTPFKSCWCVIVMCSYFFCVYHLQKQHYCGAPSLYSYLLCL
>A0A023PXC2 YE53A_YEAST
MLPLCLTFLSFFLSLGGSFKAVMTKEEADGTTEAAACLFWIFNWTVTLIPLNSLVALAISSPTFFGDRPKGPIFGAKAAEAPTSPPTALRYKYLTSLGSNFGGIFVYPLFLLSTF
>A0A023PXD3 YE88A_YEAST
MTRLPPIPRMTVTLTTRPAVPTCNEGSSILHYIYIPIYEPNEQKEKRRRKTPPEPRAYTTTTTIATNSRISGCSLTLEDGIHLRGKRAETARLPAATPQKRTGPARG
>A0A023PXD5 YE147_YEAST
MMTAAKRLGLYSALRACSATVFRSNLHPKVTVATMFCSVGTIPDVAEVSFSDSGAALFMSSSLWKVVAGFVPSRFWFSHTCLVFGSNTILFASLNSFKRSSSAIIKKVSLDTPVYVGLEKKNKMQPLLPCFFRRAV
>A0A023PXE5 YH006_YEAST
MDLYPPASWAALVPFCKALTFKVPVVLGNRNPSPPSPLPPMALSLSLLIPLSRLSLSGSSDTADGSLLISCISRGSCGIFRMGCEAVKGRSLGCLLPRSNCTYGCMSLRKYVSVCSM

Best,

Typhaine

yinlabniu commented 1 year ago

Please use FASTA format for the sequences (i.e., put `>` in front of each sequence ID).

0A023FBW4 E1142_AMBCJ MTSHGAVKIAIFAVIALHSIFECLSKPQILQRTDHSTDSDWDPQMCPETCNPSKNISCSSECLCVTLGGGDETGTCFNMSGVDWLGHAQASDGHNDG

From: typhainepl; Sent: Thursday, September 7, 2023 10:25 AM; To: linnabrown/run_dbcan; Cc: Subscribed; Subject: [linnabrown/run_dbcan] dbCAN-sub error (Issue #129)


typhainepl commented 1 year ago

My input file already has the > symbol in front of the seq IDs.

typhainepl commented 1 year ago

It is not shown in the email, but it is there.
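A quick way to confirm this, independently of how the email renders the file, is to check that the first non-blank line is a `>` header and to count the records. A minimal Python sketch; `count_fasta_records` is a hypothetical helper, not part of run_dbcan:

```python
def count_fasta_records(lines):
    """Count FASTA records in an iterable of lines (e.g. an open file).

    Raises ValueError if the first non-blank line is not a '>' header,
    which is what a file without '>' markers would trigger.
    """
    n_records = 0
    seen_first = False
    for line_no, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if not seen_first:
            seen_first = True
            if not line.startswith(">"):
                raise ValueError(
                    f"line {line_no}: expected a '>' header, got {line[:20]!r}"
                )
        if line.startswith(">"):
            n_records += 1
    return n_records
```

For example, `with open("test_seq.faa") as fh: count_fasta_records(fh)` should return 8 for the file shown in the issue, which matches the `ID count: 8` printed by the dbCAN_sub step.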

linnabrown commented 1 year ago

I will try your input on my local machine.

yinlabniu commented 1 year ago

I see. We will look into it and get back to you.

From: typhainepl; Sent: Thursday, September 7, 2023 10:33 AM; To: linnabrown/run_dbcan; Cc: Yanbin Yin; Comment; Subject: Re: [linnabrown/run_dbcan] dbCAN-sub error (Issue #129)


typhainepl commented 1 year ago

Thank you!

linnabrown commented 1 year ago

Hi @typhainepl, I figured it out. The E-value and coverage thresholds are too strict for your sequences (our defaults for hmm_eval and hmm_cov are 1e-15 and 0.35). As a result, the parsed file is empty and the output file does not exist. We had not run into this case before, so the code assumes the file always exists.

I am adding code to handle the case where the file does not exist, to make the pipeline more robust.
In the meantime, you can relax those thresholds by changing `--hmm_eval` and `--hmm_cov` on the command line.
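The traceback shows `run_dbcan.py` opening `f"{outPath}dbsub.out"` without first checking that it exists. A minimal sketch of the kind of guard described above, assuming `dbsub.out` is a tab-separated table with one header line; `read_dbsub` is a hypothetical name, and this is not the actual fix that went into run_dbcan:

```python
import csv
import os


def read_dbsub(out_path):
    """Return the data rows of <out_path>/dbsub.out, or [] if the file is absent.

    Hypothetical helper: assumes a tab-separated file whose first line is a
    header. A missing file is treated as "no dbCAN_sub hits passed the
    thresholds" instead of raising FileNotFoundError.
    """
    # os.path.join works whether or not out_path ends with a slash
    path = os.path.join(out_path, "dbsub.out")
    if not os.path.isfile(path):
        return []  # nothing survived the hmm_eval / hmm_cov filters
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader, None)  # skip the header line, if present
        return [row for row in reader if row]
```

With a guard like this, an input with no qualifying hits yields an empty dbCAN_sub result instead of stopping the run with a traceback.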

linnabrown commented 1 year ago

Here are the E-values from the hmmscan domain table; all of them are far above the 1e-15 default:


#                                                                            --- full sequence --- -------------- this domain -------------   hmm coord   ali coord   env coord
# target name        accession   tlen query name           accession   qlen   E-value  score  bias   #  of  c-Evalue  i-Evalue  score  bias  from    to  from    to  from    to  acc description of target
#------------------- ---------- ----- -------------------- ---------- ----- --------- ------ ----- --- --- --------- --------- ------ ----- ----- ----- ----- ----- ----- ----- ---- ---------------------
GH28_e71.hmm|GH28:685 -            346 A0A023FF81           -             90     0.022   14.4   0.1   1   1   2.9e-06     0.025   14.2   0.1   145   195    28    78    13    82 0.89 -
GH28_e105.hmm|GH28:14 -            364 A0A023FF81           -             90      0.11   12.0   0.2   1   1   1.4e-05      0.12   11.9   0.2   181   226    32    77    15    83 0.85 -
GH28_e4.hmm|GH28:43   -            363 A0A023FF81           -             90      0.13   11.6   0.0   1   1   1.5e-05      0.13   11.6   0.0   182   221    40    79    13    84 0.85 -
GT2_e221.hmm|GT2:21  -            228 A0A023PXD5           -            136      0.15   11.7   0.0   1   1   6.3e-06      0.16   11.7   0.0   165   190    53    78    13   107 0.84 -
#
# Program:         hmmscan
# Version:         3.3.2 (Nov 2020)
# Pipeline mode:   SCAN
# Query file:      test_2/0.txt
# Target file:     db/dbCAN_sub.hmm
# Option settings: hmmscan -o /dev/null --domtblout test_2/d0.txt --cpu 5 db/dbCAN_sub.hmm test_2/0.txt 
# Current dir:     /Users/xxx/Desktop/proj/dbcan/run_dbcan
# Date:            Thu Sep  7 18:24:01 2023
# [ok]
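To see why nothing passes, the two thresholds can be applied by hand to the four rows above. A minimal sketch, assuming the parser compares the domain i-Evalue against `--hmm_eval` and computes coverage as (hmm to - hmm from) / tlen against `--hmm_cov`; the exact columns and formula run_dbcan uses may differ. The numbers are copied from the table above, and `passing_hits` is a hypothetical helper:

```python
# (target name, tlen, i-Evalue, hmm from, hmm to), copied from the domtblout rows above
HITS = [
    ("GH28_e71.hmm|GH28:685", 346, 0.025, 145, 195),
    ("GH28_e105.hmm|GH28:14", 364, 0.12, 181, 226),
    ("GH28_e4.hmm|GH28:43", 363, 0.13, 182, 221),
    ("GT2_e221.hmm|GT2:21", 228, 0.16, 165, 190),
]


def passing_hits(hits, max_evalue, min_cov):
    """Return (name, evalue, coverage) for hits that meet both thresholds.

    Assumption: coverage = (hmm_to - hmm_from) / tlen, where tlen is the
    HMM length reported in the domtblout 'tlen' column.
    """
    kept = []
    for name, tlen, evalue, hmm_from, hmm_to in hits:
        coverage = (hmm_to - hmm_from) / tlen
        if evalue <= max_evalue and coverage >= min_cov:
            kept.append((name, evalue, round(coverage, 3)))
    return kept


# Defaults quoted in the thread: hmm_eval = 1e-15, hmm_cov = 0.35
print(passing_hits(HITS, 1e-15, 0.35))  # -> []
# Illustrative relaxed values only, not a recommendation
print(passing_hits(HITS, 0.1, 0.1))  # -> [('GH28_e71.hmm|GH28:685', 0.025, 0.145)]
```

Under these assumptions every row fails both tests at the defaults: the E-values are around 0.02 to 0.16 and the coverages around 0.11 to 0.14, which is consistent with the empty parsed file described above.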

typhainepl commented 1 year ago

Thank you for investigating and getting back to me. I'll change the E-value and coverage thresholds to see if I can get some results, but it would be great if you could account for the possibility of empty results in the pipeline.
