Open typhainepl opened 1 year ago
Please use FASTA format for the sequences (i.e., put '>' in front of the sequence IDs).
0A023FBW4 E1142_AMBCJ MTSHGAVKIAIFAVIALHSIFECLSKPQILQRTDHSTDSDWDPQMCPETCNPSKNISCSSECLCVTLGGGDETGTCFNMSGVDWLGHAQASDGHNDG
From: typhainepl. Sent: Thursday, September 7, 2023 10:25 AM. To: linnabrown/run_dbcan. Subject: [linnabrown/run_dbcan] dbCAN-sub error (Issue #129)
Hi,
I'm encountering an error while trying to run dbCAN, and it appears to be related to the output generation. Any assistance you could provide would be greatly appreciated.
I've installed dbcan through conda.
The command I am running is the following: `run_dbcan test_seq.faa protein --out_dir test_2`
The output and error message:
```
diamond v2.1.8.162 (C) Max Planck Society for the Advancement of Science, Benjamin Buchfink, University of Tuebingen
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)

Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: test_2
Opening the database... [0.08s]
Database: db/CAZy (type: Diamond database, sequences: 2428817, letters: 1157024505)
Block size = 2000000000
Building query seed set... [0s]
Algorithm: Query-indexed
Building query histograms... [0s]
Seeking in database... [0s]
Loading reference sequences... [3.2s]
Initializing temporary storage... [0.014s]
Building reference histograms... [6.609s]
Allocating buffers... [0s]
Processing query block 1, reference block 1/1, shape 1/2.
Building reference seed array... [2.74s]
Building query seed array... [0s]
Computing hash join... [0.058s]
Searching alignments... [0s]
Deallocating memory... [0s]
Processing query block 1, reference block 1/1, shape 2/2.
Building reference seed array... [4.652s]
Building query seed array... [0s]
Computing hash join... [0.515s]
Searching alignments... [0.001s]
Deallocating memory... [0s]
Deallocating buffers... [0.009s]
Clearing query masking... [0s]
Computing alignments... Loading trace points... [0.004s]
Sorting trace points... [0s]
Computing alignments... [0.001s]
Deallocating buffers... [0s]
Loading trace points... [0s] [0.007s]
Deallocating reference... [0.007s]
Loading reference sequences... [0s]
Deallocating buffers... [0s]
Deallocating queries... [0s]
Total time = 17.908s
Reported 0 pairwise alignments, 0 HSPs.
0 queries aligned.
DIAMOND end****
HMMER start**
HMMER end****
dbCAN_sub start****
ID count: 8 total time: 5.017667531967163
Traceback (most recent call last):
  File "/homes/typhaine/miniconda3/envs/run_dbcan/bin/run_dbcan", line 10, in <module>
    sys.exit(cli_main())
  File "/homes/typhaine/miniconda3/envs/run_dbcan/lib/python3.8/site-packages/dbcan_cli/run_dbcan.py", line 883, in cli_main
    run(inputFile=args.inputFile, inputType=args.inputType, cluster=args.cluster, dbCANFile=args.dbCANFile,
  File "/homes/typhaine/miniconda3/envs/run_dbcan/lib/python3.8/site-packages/dbcan_cli/run_dbcan.py", line 290, in run
    with open(f"{outPath}dbsub.out") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'test_2/dbsub.out'
```
My input file (test_seq.faa) looks like this:
```
0A023FBW4 E1142_AMBCJ
MTSHGAVKIAIFAVIALHSIFECLSKPQILQRTDHSTDSDWDPQMCPETCNPSKNISCSSECLCVTLGGGDETGTCFNMSGVDWLGHAQASDGHNDG
A0A023FF81 E1126_AMBCJ
MTSHSAVRIAIFAVIALHSIFECLSKPQILQRTDKSTDSEWDPQTCPETCIPSKNITCSDGCVCVKLGEEEEGTCFNMTGVDWLGSPSDD
A0A023PXA5 YA19A_YEAST
MLLSELVATASSLPYTAISIHNNCRVPAARHIHHGCRYFHGPPVMHLPQCLRTIQFSPSVISTSYQIPVICQHHAVVPTARYLPDYCSIISWHRPLWGIHILIVPQSQLPLPIRPKRIHTTHRYKPVIAFNDHIPSLALWICLHYQGSNGCVTPVAAKFFIIFHFVGLKEIMSPSRNATRNLNQYWRVL
A0A023PXB5 IRC2_YEAST
MFALIISSKGKTSGFFFNSSFSSSALVGIAPLTAYSALVTPVFKSFLVILPAGLKSKSFAVNTPFKSCWCVIVMCSYFFCVYHLQKQHYCGAPSLYSYLLCL
A0A023PXC2 YE53A_YEAST
MLPLCLTFLSFFLSLGGSFKAVMTKEEADGTTEAAACLFWIFNWTVTLIPLNSLVALAISSPTFFGDRPKGPIFGAKAAEAPTSPPTALRYKYLTSLGSNFGGIFVYPLFLLSTF
A0A023PXD3 YE88A_YEAST
MTRLPPIPRMTVTLTTRPAVPTCNEGSSILHYIYIPIYEPNEQKEKRRRKTPPEPRAYTTTTTIATNSRISGCSLTLEDGIHLRGKRAETARLPAATPQKRTGPARG
A0A023PXD5 YE147_YEAST
MMTAAKRLGLYSALRACSATVFRSNLHPKVTVATMFCSVGTIPDVAEVSFSDSGAALFMSSSLWKVVAGFVPSRFWFSHTCLVFGSNTILFASLNSFKRSSSAIIKKVSLDTPVYVGLEKKNKMQPLLPCFFRRAV
A0A023PXE5 YH006_YEAST
MDLYPPASWAALVPFCKALTFKVPVVLGNRNPSPPSPLPPMALSLSLLIPLSRLSLSGSSDTADGSLLISCISRGSCGIFRMGCEAVKGRSLGCLLPRSNCTYGCMSLRKYVSVCSM
```
Best,
Typhaine
My input file already has the `>` symbol in front of the seq IDs. It is not shown in the email, but it is there.
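For anyone who wants to rule out a header problem before running the pipeline, a quick check along these lines may help. This is only a sketch: `check_fasta_headers` is a hypothetical helper written for this thread, not part of run_dbcan, and the example record reuses one sequence from the input above with the `>` restored.

```python
def check_fasta_headers(text):
    """Return the record IDs found in FASTA-formatted text.

    Raises ValueError if sequence data appears before any '>' header line
    or if a header line carries no ID.
    """
    ids = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            tokens = line[1:].split()
            if not tokens:
                raise ValueError("header line has no sequence ID")
            # The ID is the first whitespace-delimited token after '>'.
            ids.append(tokens[0])
        elif not ids:
            raise ValueError("sequence data found before any '>' header line")
    return ids


example = (
    ">A0A023FF81 E1126_AMBCJ\n"
    "MTSHSAVRIAIFAVIALHSIFECLSKPQILQRTDKSTDSEWDPQTCPETCIPSKNITCSDGCVCVKLGEEEEGTCFNMTGVDWLGSPSDD\n"
)
print(check_fasta_headers(example))
```

If the `>` markers were really missing, the function would raise instead of returning the list of IDs.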
I will try your input on my local machine.
I see. We will look into it and get back to you.
Thank you!
Hi @typhainepl, I figured it out. The default e-value and coverage thresholds are too strict for your sequences (our default hmm_eval and hmm_cov are 1e-15 and 0.35). As a result, the parsed file is empty and the output file does not exist. We had not run into this case before, so the code assumed the file always exists.
You can relax `--hmm_eval` and `--hmm_cov` in your command line. These are the e-values in the hmmscan output:
# --- full sequence --- -------------- this domain ------------- hmm coord ali coord env coord
# target name accession tlen query name accession qlen E-value score bias # of c-Evalue i-Evalue score bias from to from to from to acc description of target
#------------------- ---------- ----- -------------------- ---------- ----- --------- ------ ----- --- --- --------- --------- ------ ----- ----- ----- ----- ----- ----- ----- ---- ---------------------
GH28_e71.hmm|GH28:685 - 346 A0A023FF81 - 90 0.022 14.4 0.1 1 1 2.9e-06 0.025 14.2 0.1 145 195 28 78 13 82 0.89 -
GH28_e105.hmm|GH28:14 - 364 A0A023FF81 - 90 0.11 12.0 0.2 1 1 1.4e-05 0.12 11.9 0.2 181 226 32 77 15 83 0.85 -
GH28_e4.hmm|GH28:43 - 363 A0A023FF81 - 90 0.13 11.6 0.0 1 1 1.5e-05 0.13 11.6 0.0 182 221 40 79 13 84 0.85 -
GT2_e221.hmm|GT2:21 - 228 A0A023PXD5 - 136 0.15 11.7 0.0 1 1 6.3e-06 0.16 11.7 0.0 165 190 53 78 13 107 0.84 -
#
# Program: hmmscan
# Version: 3.3.2 (Nov 2020)
# Pipeline mode: SCAN
# Query file: test_2/0.txt
# Target file: db/dbCAN_sub.hmm
# Option settings: hmmscan -o /dev/null --domtblout test_2/d0.txt --cpu 5 db/dbCAN_sub.hmm test_2/0.txt
# Current dir: /Users/xxx/Desktop/proj/dbcan/run_dbcan
# Date: Thu Sep 7 18:24:01 2023
# [ok]
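To see why none of these hits survive the defaults, the filter can be sketched in a few lines of Python. This is an illustration, not the run_dbcan parser: it assumes the per-domain i-Evalue is compared against `hmm_eval` and that coverage is the aligned HMM span (`hmm to` minus `hmm from`) divided by the HMM length (`tlen`). `parse_domtblout` and `filter_hits` are hypothetical names.

```python
from typing import List, NamedTuple


class DomainHit(NamedTuple):
    target: str
    query: str
    i_evalue: float
    coverage: float


def parse_domtblout(text: str) -> List[DomainHit]:
    """Parse hmmscan --domtblout rows, skipping comment and blank lines."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split()
        hmm_len = int(f[2])                      # tlen: length of the HMM
        hmm_from, hmm_to = int(f[15]), int(f[16])
        # Assumption: coverage = aligned HMM span / HMM length.
        coverage = (hmm_to - hmm_from) / hmm_len
        hits.append(DomainHit(f[0], f[3], float(f[12]), coverage))
    return hits


def filter_hits(hits, max_evalue=1e-15, min_coverage=0.35):
    """Keep hits that pass both thresholds (defaults quoted in this thread)."""
    return [h for h in hits
            if h.i_evalue <= max_evalue and h.coverage >= min_coverage]


ROWS = """\
GH28_e71.hmm|GH28:685 - 346 A0A023FF81 - 90 0.022 14.4 0.1 1 1 2.9e-06 0.025 14.2 0.1 145 195 28 78 13 82 0.89 -
GH28_e105.hmm|GH28:14 - 364 A0A023FF81 - 90 0.11 12.0 0.2 1 1 1.4e-05 0.12 11.9 0.2 181 226 32 77 15 83 0.85 -
GH28_e4.hmm|GH28:43 - 363 A0A023FF81 - 90 0.13 11.6 0.0 1 1 1.5e-05 0.13 11.6 0.0 182 221 40 79 13 84 0.85 -
GT2_e221.hmm|GT2:21 - 228 A0A023PXD5 - 136 0.15 11.7 0.0 1 1 6.3e-06 0.16 11.7 0.0 165 190 53 78 13 107 0.84 -
"""

hits = parse_domtblout(ROWS)
for h in hits:
    print(f"{h.target}\t{h.query}\ti-Evalue={h.i_evalue}\tcoverage={h.coverage:.3f}")
print("hits passing defaults:", len(filter_hits(hits)))
```

Under these assumptions the four hits have i-Evalues between 0.025 and 0.16 and coverages of roughly 0.11 to 0.14, so none of them comes close to 1e-15 and 0.35.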
Thank you for investigating and getting back to me. I'll change the e-value and coverage to see if I can get some results, but it would be great if you can take into account the possibility of having empty results in the pipeline.
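One way the pipeline could tolerate empty results is to treat a missing or empty output file as "no hits" instead of opening it unconditionally. The sketch below is not the actual fix in run_dbcan; `read_lines_if_present` is a hypothetical helper, and only the file name `dbsub.out` is taken from the traceback above.

```python
import os
import tempfile


def read_lines_if_present(path):
    """Return the lines of `path`, or an empty list if it is missing or empty."""
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return []
    with open(path) as f:
        return [line.rstrip("\n") for line in f]


# Usage: a run that produced no dbCAN-sub hits leaves no dbsub.out behind.
with tempfile.TemporaryDirectory() as out_dir:
    dbsub_path = os.path.join(out_dir, "dbsub.out")
    rows = read_lines_if_present(dbsub_path)
    print(f"dbCAN-sub hits: {len(rows)}")
```

With a guard like this, the later reporting steps would see an empty list rather than failing with `FileNotFoundError`.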