linnabrown / run_dbcan

Run_dbcan V4, using genomes/metagenomes/proteomes of any assembled organisms (prokaryotes, fungi, plants, animals, viruses) to search for CAZymes.
http://bcb.unl.edu/dbCAN2
GNU General Public License v3.0

CGC-JSON error #111

Open joeyclancy opened 1 year ago

joeyclancy commented 1 year ago

Hello, run_dbcan support team. When I run `run_dbcan data/bin2321.fasta prok -c cluster --out_dir cay/bin2321 --db_dir ~/database/cayzmes`, I get an error about JSON. What should I do? The dbCAN version is 4.0.0. Here is the log:

`*1. DIAMOND start***

diamond v2.1.3.157 (C) Max Planck Society for the Advancement of Science Documentation, support and updates available at http://www.diamondsearch.org Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)

CPU threads: 4

Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1) Temporary directory: cay/bin2321

Target sequences to report alignments for: 1

Opening the database... [0.495s] Database: /public/home/hymeta/database/cayzmes/CAZy (type: Diamond database, sequences: 2428817, letters: 1157024505) Block size = 2000000000 Opening the input file... [0.018s] Opening the output file... [0.002s] Loading query sequences... [0.015s] Masking queries... [0.042s] Building query seed set... [0.165s] Algorithm: Query-indexed Building query histograms... [0.005s] Loading reference sequences... [5.065s] Initializing temporary storage... [0.048s] Building reference histograms... [7.909s] Allocating buffers... [0s] Processing query block 1, reference block 1/1, shape 1/2. Building reference seed array... [4.663s] Building query seed array... [0.01s] Computing hash join... [0.832s] Searching alignments... [0.361s] Deallocating memory... [0s] Processing query block 1, reference block 1/1, shape 2/2. Building reference seed array... [4.403s] Building query seed array... [0.01s] Computing hash join... [0.819s] Searching alignments... [0.347s] Deallocating memory... [0s] Deallocating buffers... [0.008s] Clearing query masking... [0s] Computing alignments... Loading trace points... [0.228s] Sorting trace points... [0.019s] Computing alignments... [6.334s] Deallocating buffers... [0s] Loading trace points... [0s] [6.599s] Deallocating reference... [0.003s] Loading reference sequences... [0.083s] Deallocating buffers... [0s] Deallocating queries... [0s] Loading query sequences... [0.004s] Closing the input file... [0s] Closing the output file... [0.002s] Closing the database... [0s] Cleaning up... [0s] Total time = 31.933s Reported 156 pairwise alignments, 156 HSPs. 156 queries aligned.

*1. DIAMOND end*****

*2. HMMER start***

*2. HMMER end*****

*3. dbCAN_sub start*****

ID count: 6304 total time: 672.9570934772491

*3. dbCAN_sub end*****

Traceback (most recent call last):
  File "/public/home/hymeta/anaconda3/envs/run_dbcan/bin/run_dbcan", line 10, in <module>
    sys.exit(cli_main())
             ^^^^^^^^^^
  File "/public/home/hymeta/anaconda3/envs/run_dbcan/lib/python3.11/site-packages/dbcan_cli/run_dbcan.py", line 883, in cli_main
    run(inputFile=args.inputFile, inputType=args.inputType, cluster=args.cluster, dbCANFile=args.dbCANFile,
  File "/public/home/hymeta/anaconda3/envs/run_dbcan/lib/python3.11/site-packages/dbcan_cli/run_dbcan.py", line 282, in run
    with open(f"{dbDir}fam-substrate-mapping-08252022.tsv", 'r') as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/public/home/hymeta/database/cayzmes/fam-substrate-mapping-08252022.tsv'
(run_dbcan) hymeta^node17 /data1/hydata/rot/bin/diff_bray/diff_func > ll
total 12
drwxr-xr-x 3 hymeta users 4096 Mar 13 20:38 cay
drwxr-xr-x 2 hymeta users 4096 Mar 13 19:12 data
-rw-r--r-- 1 hymeta users 56 Mar 13 19:14 rep_bin.txt
(run_dbcan) hymeta^node17 /data1/hydata/rot/bin/diff_bray/diff_func > rm -rf cay/bin2321/
(run_dbcan) hymeta^node17 /data1/hydata/rot/bin/diff_bray/diff_func > run_dbcan data/bin2321.fasta prok -c cluster --out_dir cay/bin2321 --db_dir ~/database/cayzmes

*1. DIAMOND start***

diamond v2.1.3.157 (C) Max Planck Society for the Advancement of Science Documentation, support and updates available at http://www.diamondsearch.org Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)

CPU threads: 4

Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1) Temporary directory: cay/bin2321

Target sequences to report alignments for: 1

Opening the database... [0.065s] Database: /public/home/hymeta/database/cayzmes/CAZy (type: Diamond database, sequences: 2428817, letters: 1157024505) Block size = 2000000000 Opening the input file... [0.011s] Opening the output file... [0.002s] Loading query sequences... [0.014s] Masking queries... [0.042s] Building query seed set... [0.161s] Algorithm: Query-indexed Building query histograms... [0.006s] Loading reference sequences... [2.773s] Initializing temporary storage... [0.332s] Building reference histograms... [7.886s] Allocating buffers... [0s] Processing query block 1, reference block 1/1, shape 1/2. Building reference seed array... [4.569s] Building query seed array... [0.012s] Computing hash join... [0.82s] Searching alignments... [0.319s] Deallocating memory... [0s] Processing query block 1, reference block 1/1, shape 2/2. Building reference seed array... [4.372s] Building query seed array... [0.009s] Computing hash join... [0.819s] Searching alignments... [0.328s] Deallocating memory... [0s] Deallocating buffers... [0.006s] Clearing query masking... [0s] Computing alignments... Loading trace points... [0.331s] Sorting trace points... [0.018s] Computing alignments... [6.292s] Deallocating buffers... [0s] Loading trace points... [0s] [6.717s] Deallocating reference... [0.004s] Loading reference sequences... [0s] Deallocating buffers... [0s] Deallocating queries... [0s] Loading query sequences... [0.004s] Closing the input file... [0s] Closing the output file... [0.004s] Closing the database... [0s] Cleaning up... [0s] Total time = 29.303s Reported 156 pairwise alignments, 156 HSPs. 156 queries aligned.

*1. DIAMOND end*****

*2. HMMER start***

*2. HMMER end*****

*3. dbCAN_sub start*****

ID count: 6304 total time: 544.677565574646

*3. dbCAN_sub end*****

No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it 
No substrate for it No substrate for it No substrate for it No substrate for it *****CGC-Finder start**** diamond v2.1.3.157 (C) Max Planck Society for the Advancement of Science Documentation, support and updates available at http://www.diamondsearch.org Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)

CPU threads: 1

Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1) Temporary directory: cay/bin2321

Target sequences to report alignments for: 1

Opening the database... [0.151s] Database: /public/home/hymeta/database/cayzmes/tcdb.dmnd (type: Diamond database, sequences: 14465, letters: 6343638) Block size = 2000000000 Opening the input file... [0.014s] Opening the output file... [0.005s] Loading query sequences... [0.016s] Masking queries... [0.146s] Algorithm: Double-indexed Building query histograms... [0.124s] Loading reference sequences... [0.024s] Masking reference... [0.506s] Initializing temporary storage... [0.182s] Building reference histograms... [0.384s] Allocating buffers... [0s] Processing query block 1, reference block 1/1, shape 1/2, index chunk 1/4. Building reference seed array... [0.11s] Building query seed array... [0.03s] Computing hash join... [0.041s] Masking low complexity seeds... [0.001s] Searching alignments... [0.009s] Deallocating memory... [0s] Processing query block 1, reference block 1/1, shape 1/2, index chunk 2/4. Building reference seed array... [0.138s] Building query seed array... [0.038s] Computing hash join... [0.031s] Masking low complexity seeds... [0.001s] Searching alignments... [0.008s] Deallocating memory... [0s] Processing query block 1, reference block 1/1, shape 1/2, index chunk 3/4. Building reference seed array... [0.151s] Building query seed array... [0.042s] Computing hash join... [0.03s] Masking low complexity seeds... [0.001s] Searching alignments... [0.007s] Deallocating memory... [0s] Processing query block 1, reference block 1/1, shape 1/2, index chunk 4/4. Building reference seed array... [0.107s] Building query seed array... [0.03s] Computing hash join... [0.03s] Masking low complexity seeds... [0.002s] Searching alignments... [0.008s] Deallocating memory... [0s] Processing query block 1, reference block 1/1, shape 2/2, index chunk 1/4. Building reference seed array... [0.105s] Building query seed array... [0.029s] Computing hash join... [0.03s] Masking low complexity seeds... [0s] Searching alignments... [0.007s] Deallocating memory... 
[0s] Processing query block 1, reference block 1/1, shape 2/2, index chunk 2/4. Building reference seed array... [0.134s] Building query seed array... [0.037s] Computing hash join... [0.03s] Masking low complexity seeds... [0s] Searching alignments... [0.007s] Deallocating memory... [0s] Processing query block 1, reference block 1/1, shape 2/2, index chunk 3/4. Building reference seed array... [0.149s] Building query seed array... [0.051s] Computing hash join... [0.037s] Masking low complexity seeds... [0.001s] Searching alignments... [0.006s] Deallocating memory... [0s] Processing query block 1, reference block 1/1, shape 2/2, index chunk 4/4. Building reference seed array... [0.116s] Building query seed array... [0.035s] Computing hash join... [0.04s] Masking low complexity seeds... [0.001s] Searching alignments... [0.01s] Deallocating memory... [0s] Deallocating buffers... [0s] Clearing query masking... [0s] Computing alignments... Loading trace points... [0.203s] Sorting trace points... [0.002s] Computing alignments... [0.823s] Deallocating buffers... [0s] Loading trace points... [0s] [1.135s] Deallocating reference... [0s] Loading reference sequences... [0s] Deallocating buffers... [0s] Deallocating queries... [0s] Loading query sequences... [0s] Closing the input file... [0s] Closing the output file... [0.01s] Closing the database... [0s] Cleaning up... [0s] Total time = 4.408s Reported 1021 pairwise alignments, 1021 HSPs. 1021 queries aligned. **CGC-Finder start* **CGC-Finder end* Preparing overview table from hmmer, dbCAN_sub and diamond output... overview table complete. 
Saved as cay/bin2321/overview.txt
Traceback (most recent call last):
  File "/public/home/hymeta/anaconda3/envs/run_dbcan/bin/cgc_standard2json", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/public/home/hymeta/anaconda3/envs/run_dbcan/lib/python3.11/site-packages/dbcan_cli/cgc_process_json.py", line 114, in main
    jsonPuls = json.dumps(pul_dict, indent=4)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/public/home/hymeta/anaconda3/envs/run_dbcan/lib/python3.11/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
          ^^^^^^^^^^^
  File "/public/home/hymeta/anaconda3/envs/run_dbcan/lib/python3.11/json/encoder.py", line 202, in encode
    chunks = list(chunks)
             ^^^^^^^^^^^^
  File "/public/home/hymeta/anaconda3/envs/run_dbcan/lib/python3.11/json/encoder.py", line 432, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/public/home/hymeta/anaconda3/envs/run_dbcan/lib/python3.11/json/encoder.py", line 406, in _iterencode_dict
    yield from chunks
  File "/public/home/hymeta/anaconda3/envs/run_dbcan/lib/python3.11/json/encoder.py", line 406, in _iterencode_dict
    yield from chunks
  File "/public/home/hymeta/anaconda3/envs/run_dbcan/lib/python3.11/json/encoder.py", line 326, in _iterencode_list
    yield from chunks
  File "/public/home/hymeta/anaconda3/envs/run_dbcan/lib/python3.11/json/encoder.py", line 406, in _iterencode_dict
    yield from chunks
  File "/public/home/hymeta/anaconda3/envs/run_dbcan/lib/python3.11/json/encoder.py", line 439, in _iterencode
    o = _default(o)
        ^^^^^^^^^^^
  File "/public/home/hymeta/anaconda3/envs/run_dbcan/lib/python3.11/json/encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type int64 is not JSON serializable`
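The first run in the log above stopped with a FileNotFoundError because fam-substrate-mapping-08252022.tsv was not present in the --db_dir folder. A minimal pre-flight sketch for checking a database directory before starting a long run might look like the following. The file list is hypothetical: it contains only names that appear in the logs of this issue, assumes the CAZy DIAMOND database is stored as CAZy.dmnd (the log prints that path without an extension), and is not the complete set of files run_dbcan requires.

```python
import os

# Hypothetical list: only names that appear in the logs of this issue.
# "CAZy.dmnd" is an assumption (DIAMOND prints the database path without
# its extension). This is NOT the full set of files run_dbcan requires.
REQUIRED_DB_FILES = [
    "CAZy.dmnd",
    "tcdb.dmnd",
    "fam-substrate-mapping-08252022.tsv",
    "PUL.faa",
]


def missing_db_files(db_dir, required=REQUIRED_DB_FILES):
    """Return the names in `required` that are not regular files in db_dir."""
    db_dir = os.path.expanduser(db_dir)  # expand a leading "~" like the shell does
    return [
        name
        for name in required
        if not os.path.isfile(os.path.join(db_dir, name))
    ]
```

For example, `missing_db_files("~/database/cayzmes")` returns the listed names that are absent, and an empty list means all of them were found. Paths are built with os.path.join, so a trailing slash on the directory is optional.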

linnabrown commented 1 year ago

Hi @joeyclancy, thanks for bringing up this issue. Jinfang will check it ASAP.

joeyclancy commented 1 year ago

Thanks!

zhengzhengzhj commented 1 year ago

We have updated the source code, and this error is corrected in the updated version.

linnabrown commented 1 year ago

Hi Jinfang, which version did you update? I just found that Qiwei commented out the JSON code part.

Best, Le


From: zhengzhengzhj. Sent: Sunday, April 23, 2023 8:23:01 PM. To: linnabrown/run_dbcan. Cc: Huang, Le; Comment. Subject: Re: [linnabrown/run_dbcan] CGC-JSON error (Issue #111)

We have updated the source code, and this error is corrected in the updated version.


zmqstc commented 11 months ago

Hello, run_dbcan support team. Same problem here. With the latest version of run_dbcan, I ran `run_dbcan F5094_bin_42.fa meta --out_dir F5094_bin_42 --cluster 1 --cgc_substrate --db_dir ~/database/dbCAN2` and got an error about JSON. Could you please tell me what I should do now? The run produced the following files: blastp.out, cgc.gff, cgc_standard.out, diamond.out, hmmer.out, prodigal.gff, sub.prediction.out, tf-1.out, tp.out, CAZyme.pep, cgc.out, dbsub.out, dtemp.out, overview.txt, stp.out, syntenic.svg, tf-2.out, uniInput. I wonder if I am missing something here. Thank you! Here is the error:

CPU threads: 4

Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1) Temporary directory: F5094_bin_42

Target sequences to report alignments for: 1

Opening the database... [0.158s] Database: /hl/zhoumengqing/database/dbCAN2/CAZy (type: Diamond database, sequences: 2428817, letters: 1157024505) Block size = 2000000000 Opening the input file... [0s] Opening the output file... [0s] Loading query sequences... [0.004s] Masking queries... [0.016s] Building query seed set... [0.14s] Algorithm: Query-indexed Building query histograms... [0.002s] Seeking in database... [0s] Loading reference sequences... [3.461s] Initializing temporary storage... [0s] Building reference histograms... [11.009s] Allocating buffers... [0s] Processing query block 1, reference block 1/1, shape 1/2. Building reference seed array... [5.95s] Building query seed array... [0.004s] Computing hash join... [0.191s] Searching alignments... [0.233s] Deallocating memory... [0s] Processing query block 1, reference block 1/1, shape 2/2. Building reference seed array... [5.685s] Building query seed array... [0.003s] Computing hash join... [0.183s] Searching alignments... [0.208s] Deallocating memory... [0s] Deallocating buffers... [0.066s] Clearing query masking... [0s] Computing alignments... Loading trace points... [0.03s] Sorting trace points... [0.006s] Computing alignments... [2.564s] Deallocating buffers... [0s] Loading trace points... [0s] [2.606s] Deallocating reference... [0.139s] Loading reference sequences... [0s] Deallocating buffers... [0s] Deallocating queries... [0s] Loading query sequences... [0s] Closing the input file... [0s] Closing the output file... [0s] Closing the database... [0.007s] Cleaning up... [0s] Total time = 30.081s Reported 43 pairwise alignments, 43 HSPs. 43 queries aligned. diamond v2.1.7.161 (C) Max Planck Society for the Advancement of Science Documentation, support and updates available at http://www.diamondsearch.org Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)

CPU threads: 1

Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1) Temporary directory: F5094_bin_42

Target sequences to report alignments for: 1

Opening the database... [0.017s] Database: /hl/zhoumengqing/database/dbCAN2/tcdb.dmnd (type: Diamond database, sequences: 14465, letters: 6343638) Block size = 2000000000 Opening the input file... [0.001s] Opening the output file... [0s] Loading query sequences... [0.003s] Masking queries... [0.036s] Algorithm: Double-indexed Building query histograms... [0.03s] Seeking in database... [0s] Loading reference sequences... [0.019s] Masking reference... [0.449s] Initializing temporary storage... [0s] Building reference histograms... [0.414s] Allocating buffers... [0s] Processing query block 1, reference block 1/1, shape 1/2, index chunk 1/4. Building reference seed array... [0.151s] Building query seed array... [0.012s] Computing hash join... [0.027s] Masking low complexity seeds... [0.001s] Searching alignments... [0.006s] Deallocating memory... [0s] Processing query block 1, reference block 1/1, shape 1/2, index chunk 2/4. Building reference seed array... [0.165s] Building query seed array... [0.012s] Computing hash join... [0.028s] Masking low complexity seeds... [0.001s] Searching alignments... [0.005s] Deallocating memory... [0s] Processing query block 1, reference block 1/1, shape 1/2, index chunk 3/4. Building reference seed array... [0.188s] Building query seed array... [0.013s] Computing hash join... [0.028s] Masking low complexity seeds... [0.001s] Searching alignments... [0.004s] Deallocating memory... [0s] Processing query block 1, reference block 1/1, shape 1/2, index chunk 4/4. Building reference seed array... [0.146s] Building query seed array... [0.011s] Computing hash join... [0.027s] Masking low complexity seeds... [0s] Searching alignments... [0.003s] Deallocating memory... [0s] Processing query block 1, reference block 1/1, shape 2/2, index chunk 1/4. Building reference seed array... [0.144s] Building query seed array... [0.01s] Computing hash join... [0.027s] Masking low complexity seeds... [0.001s] Searching alignments... 
[0.005s] Deallocating memory... [0s] Processing query block 1, reference block 1/1, shape 2/2, index chunk 2/4. Building reference seed array... [0.166s] Building query seed array... [0.012s] Computing hash join... [0.027s] Masking low complexity seeds... [0.001s] Searching alignments... [0.003s] Deallocating memory... [0s] Processing query block 1, reference block 1/1, shape 2/2, index chunk 3/4. Building reference seed array... [0.18s] Building query seed array... [0.013s] Computing hash join... [0.027s] Masking low complexity seeds... [0.001s] Searching alignments... [0.004s] Deallocating memory... [0s] Processing query block 1, reference block 1/1, shape 2/2, index chunk 4/4. Building reference seed array... [0.143s] Building query seed array... [0.01s] Computing hash join... [0.027s] Masking low complexity seeds... [0.001s] Searching alignments... [0.003s] Deallocating memory... [0s] Deallocating buffers... [0s] Clearing query masking... [0s] Computing alignments... Loading trace points... [0.003s] Sorting trace points... [0s] Computing alignments... [0.15s] Deallocating buffers... [0s] Loading trace points... [0s] [0.155s] Deallocating reference... [0s] Loading reference sequences... [0s] Deallocating buffers... [0s] Deallocating queries... [0s] Loading query sequences... [0s] Closing the input file... [0s] Closing the output file... [0s] Closing the database... [0s] Cleaning up... [0s] Total time = 2.792s Reported 101 pairwise alignments, 101 HSPs. 101 queries aligned. 
Traceback (most recent call last):
  File "/hl/zhoumengqing/miniconda3/envs/run_dbcan/bin/cgc_standard2json", line 10, in <module>
    sys.exit(main())
  File "/hl/zhoumengqing/miniconda3/envs/run_dbcan/lib/python3.8/site-packages/dbcan_cli/cgc_process_json.py", line 114, in main
    jsonPuls = json.dumps(pul_dict, indent=4)
  File "/hl/zhoumengqing/miniconda3/envs/run_dbcan/lib/python3.8/json/__init__.py", line 234, in dumps
    return cls(
  File "/hl/zhoumengqing/miniconda3/envs/run_dbcan/lib/python3.8/json/encoder.py", line 201, in encode
    chunks = list(chunks)
  File "/hl/zhoumengqing/miniconda3/envs/run_dbcan/lib/python3.8/json/encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/hl/zhoumengqing/miniconda3/envs/run_dbcan/lib/python3.8/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/hl/zhoumengqing/miniconda3/envs/run_dbcan/lib/python3.8/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/hl/zhoumengqing/miniconda3/envs/run_dbcan/lib/python3.8/json/encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "/hl/zhoumengqing/miniconda3/envs/run_dbcan/lib/python3.8/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/hl/zhoumengqing/miniconda3/envs/run_dbcan/lib/python3.8/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File "/hl/zhoumengqing/miniconda3/envs/run_dbcan/lib/python3.8/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type int64 is not JSON serializable
Start blastp CAZyme sequences against to database: /hl/zhoumengqing/database/dbCAN2/PUL.faa
Start analyzing blastp result
Get best pul hit ~

And here is the log:

the temprory directory is /opt
current file is

*1. DIAMOND start***

*1. DIAMOND end*****

*2. HMMER start***

*2. HMMER end*****

*3. dbCAN_sub start*****

ID count: 1684 total time: 1394.116188287735

*3. dbCAN_sub end*****

No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it *CGC-Finder start**** **CGC-Finder start*** **CGC-Finder end*** Preparing overview table from hmmer, dbCAN_sub and diamond output... overview table complete. Saved as F5094_bin_42/overview.txt Start extracting sequence! Reading blastp result F5094_bin_42/blastp.out Reading dbsub outfile:F5094_bin_42/dbsub.out Start eCAMI subfamily substrate scoring Substrate prediciton done! 3.962833881378174s Writing substrate prediction result to file:F5094_bin_42/sub.prediction.out All done! 3.963017702102661s Remove tmp files at zmq_m10143640_0 Job finished at: Thu Aug 10 15:03:35 CST 2023

NataliiaKuzub commented 9 months ago

Greetings!

I get the same error message while running the test. I installed run_dbcan today, so I'm sure I'm using the latest available version. Do you have any idea what might cause the problem, and whether it affects the overall results?

Command I ran: `run_dbcan EscheriaColiK12MG1655.fna prok -c CLUSTER --db_dir /scratch/databases/run_dbcan_db --out_dir output3_EscheriaColiK12MG1655 --use_signalP=TRUE -sp /scratch/tools/signalp/signalp-4.1/signalp`

The error message I got (I included only the error message, but I can provide the full output if you need it):

Traceback (most recent call last):
  File "/home/conda/envs/run_dbcan_env/bin/cgc_standard2json", line 10, in <module>
    sys.exit(main())
  File "/home/conda/envs/run_dbcan_env/lib/python3.8/site-packages/dbcan_cli/cgc_process_json.py", line 114, in main
    jsonPuls = json.dumps(pul_dict, indent=4)
  File "/home/conda/envs/run_dbcan_env/lib/python3.8/json/__init__.py", line 234, in dumps
    return cls(
  File "/home/conda/envs/run_dbcan_env/lib/python3.8/json/encoder.py", line 201, in encode
    chunks = list(chunks)
  File "/home/conda/envs/run_dbcan_env/lib/python3.8/json/encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/home/conda/envs/run_dbcan_env/lib/python3.8/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/home/conda/envs/run_dbcan_env/lib/python3.8/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/home/conda/envs/run_dbcan_env/lib/python3.8/json/encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "/home/conda/envs/run_dbcan_env/lib/python3.8/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/home/conda/envs/run_dbcan_env/lib/python3.8/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File "/home/conda/envs/run_dbcan_env/lib/python3.8/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type int64 is not JSON serializable

ZhengJinfang1220 commented 7 months ago

Thank you for pointing out this issue. It seems to be related to the version of the JSON package. This code only converts "cgc_standard.out" to JSON format, so the error does not affect the results. The code works well with version 2.0.9. We are going to release a new version.
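The `int64` in these tracebacks is the class name of a NumPy integer scalar (numpy.int64). Python's standard json module only encodes built-in types, and unlike numpy.float64, numpy.int64 is not a subclass of Python's int, so json.dumps rejects it. Such values typically come from NumPy or pandas operations; that this is how they reach pul_dict is an assumption, since the source of cgc_process_json.py is not shown here. One common workaround, sketched below and not the project's actual fix, is a `default` hook (the name `to_builtin` is hypothetical) that converts NumPy values to built-in types:

```python
import json


def to_builtin(obj):
    """`default` hook for json.dumps (a sketch, not run_dbcan's own code).

    json.dumps calls this only for objects it cannot encode itself.
    NumPy scalars such as numpy.int64, and NumPy arrays, provide .tolist(),
    which returns the equivalent built-in int/float/bool/list.
    Anything else is rejected with the same kind of TypeError json raises.
    """
    tolist = getattr(obj, "tolist", None)
    if callable(tolist):
        return tolist()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")
```

Under that assumption, the call at line 114 of cgc_process_json.py would become `jsonPuls = json.dumps(pul_dict, indent=4, default=to_builtin)`.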

ZhengJinfang1220 commented 7 months ago

Hello, run_dbcan support team. Same problem here. Here's what I did, with the latest version of run_dbcan: `run_dbcan F5094_bin_42.fa meta --out_dir F5094_bin_42 --cluster 1 --cgc_substrate --db_dir ~/database/dbCAN2`. It failed with an error about JSON. Could you please tell me what I should do now? The run produced the following files: blastp.out, cgc.gff, cgc_standard.out, diamond.out, hmmer.out, prodigal.gff, sub.prediction.out, tf-1.out, tp.out, CAZyme.pep, cgc.out, dbsub.out, dtemp.out, overview.txt, stp.out, syntenic.svg, tf-2.out, uniInput.

I wonder if there is anything I am missing here. Thank you! Here is the error output:

#CPU threads: 4 Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1) Temporary directory: F5094_bin_42 #Target sequences to report alignments for: 1 Opening the database... [0.158s] Database: /hl/zhoumengqing/database/dbCAN2/CAZy (type: Diamond database, sequences: 2428817, letters: 1157024505) Block size = 2000000000 Opening the input file... [0s] Opening the output file... [0s] Loading query sequences... [0.004s] Masking queries... [0.016s] Building query seed set... [0.14s] Algorithm: Query-indexed Building query histograms... [0.002s] Seeking in database... [0s] Loading reference sequences... [3.461s] Initializing temporary storage... [0s] Building reference histograms... [11.009s] Allocating buffers... [0s] Processing query block 1, reference block 1/1, shape 1/2. Building reference seed array... [5.95s] Building query seed array... [0.004s] Computing hash join... [0.191s] Searching alignments... [0.233s] Deallocating memory... [0s] Processing query block 1, reference block 1/1, shape 2/2. Building reference seed array... [5.685s] Building query seed array... [0.003s] Computing hash join... [0.183s] Searching alignments... [0.208s] Deallocating memory... [0s] Deallocating buffers... [0.066s] Clearing query masking... [0s] Computing alignments... Loading trace points... [0.03s] Sorting trace points... [0.006s] Computing alignments... [2.564s] Deallocating buffers... [0s] Loading trace points... [0s] [2.606s] Deallocating reference... [0.139s] Loading reference sequences... [0s] Deallocating buffers... [0s] Deallocating queries... [0s] Loading query sequences... [0s] Closing the input file... [0s] Closing the output file... [0s] Closing the database... [0.007s] Cleaning up... [0s] Total time = 30.081s Reported 43 pairwise alignments, 43 HSPs. 43 queries aligned.

diamond v2.1.7.161 (C) Max Planck Society for the Advancement of Science Documentation, support and updates available at http://www.diamondsearch.org Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)

CPU threads: 1 Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1) Temporary directory: F5094_bin_42 #Target sequences to report alignments for: 1 Opening the database... [0.017s] Database: /hl/zhoumengqing/database/dbCAN2/tcdb.dmnd (type: Diamond database, sequences: 14465, letters: 6343638) Block size = 2000000000 Opening the input file... [0.001s] Opening the output file... [0s] Loading query sequences... [0.003s] Masking queries... [0.036s] Algorithm: Double-indexed Building query histograms... [0.03s] Seeking in database... [0s] Loading reference sequences... [0.019s] Masking reference... [0.449s] Initializing temporary storage... [0s] Building reference histograms... [0.414s] Allocating buffers... [0s] Processing query block 1, reference block 1/1, shape 1/2, index chunk 1/4. Building reference seed array... [0.151s] Building query seed array... [0.012s] Computing hash join... [0.027s] Masking low complexity seeds... [0.001s] Searching alignments... [0.006s] Deallocating memory... [0s] Processing query block 1, reference block 1/1, shape 1/2, index chunk 2/4. Building reference seed array... [0.165s] Building query seed array... [0.012s] Computing hash join... [0.028s] Masking low complexity seeds... [0.001s] Searching alignments... [0.005s] Deallocating memory... [0s] Processing query block 1, reference block 1/1, shape 1/2, index chunk 3/4. Building reference seed array... [0.188s] Building query seed array... [0.013s] Computing hash join... [0.028s] Masking low complexity seeds... [0.001s] Searching alignments... [0.004s] Deallocating memory... [0s] Processing query block 1, reference block 1/1, shape 1/2, index chunk 4/4. Building reference seed array... [0.146s] Building query seed array... [0.011s] Computing hash join... [0.027s] Masking low complexity seeds... [0s] Searching alignments... [0.003s] Deallocating memory... [0s] Processing query block 1, reference block 1/1, shape 2/2, index chunk 1/4. Building reference seed array... [0.144s] Building query seed array... [0.01s] Computing hash join... [0.027s] Masking low complexity seeds... [0.001s] Searching alignments... [0.005s] Deallocating memory... [0s] Processing query block 1, reference block 1/1, shape 2/2, index chunk 2/4. Building reference seed array... [0.166s] Building query seed array... [0.012s] Computing hash join... [0.027s] Masking low complexity seeds... [0.001s] Searching alignments... [0.003s] Deallocating memory... [0s] Processing query block 1, reference block 1/1, shape 2/2, index chunk 3/4. Building reference seed array... [0.18s] Building query seed array... [0.013s] Computing hash join... [0.027s] Masking low complexity seeds... [0.001s] Searching alignments... [0.004s] Deallocating memory... [0s] Processing query block 1, reference block 1/1, shape 2/2, index chunk 4/4. Building reference seed array... [0.143s] Building query seed array... [0.01s] Computing hash join... [0.027s] Masking low complexity seeds... [0.001s] Searching alignments... [0.003s] Deallocating memory... [0s] Deallocating buffers... [0s] Clearing query masking... [0s] Computing alignments... Loading trace points... [0.003s] Sorting trace points... [0s] Computing alignments... [0.15s] Deallocating buffers... [0s] Loading trace points... [0s] [0.155s] Deallocating reference... [0s] Loading reference sequences... [0s] Deallocating buffers... [0s] Deallocating queries... [0s] Loading query sequences... [0s] Closing the input file... [0s] Closing the output file... [0s] Closing the database... [0s] Cleaning up... [0s] Total time = 2.792s Reported 101 pairwise alignments, 101 HSPs. 101 queries aligned.
Traceback (most recent call last):
  File "/hl/zhoumengqing/miniconda3/envs/run_dbcan/bin/cgc_standard2json", line 10, in <module>
    sys.exit(main())
  File "/hl/zhoumengqing/miniconda3/envs/run_dbcan/lib/python3.8/site-packages/dbcan_cli/cgc_process_json.py", line 114, in main
    jsonPuls = json.dumps(pul_dict, indent=4)
  File "/hl/zhoumengqing/miniconda3/envs/run_dbcan/lib/python3.8/json/__init__.py", line 234, in dumps
    return cls(
  File "/hl/zhoumengqing/miniconda3/envs/run_dbcan/lib/python3.8/json/encoder.py", line 201, in encode
    chunks = list(chunks)
  File "/hl/zhoumengqing/miniconda3/envs/run_dbcan/lib/python3.8/json/encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/hl/zhoumengqing/miniconda3/envs/run_dbcan/lib/python3.8/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/hl/zhoumengqing/miniconda3/envs/run_dbcan/lib/python3.8/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/hl/zhoumengqing/miniconda3/envs/run_dbcan/lib/python3.8/json/encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "/hl/zhoumengqing/miniconda3/envs/run_dbcan/lib/python3.8/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/hl/zhoumengqing/miniconda3/envs/run_dbcan/lib/python3.8/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File "/hl/zhoumengqing/miniconda3/envs/run_dbcan/lib/python3.8/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type int64 is not JSON serializable
Start blastp CAZyme sequences against to database: /hl/zhoumengqing/database/dbCAN2/PUL.faa
Start analyzing blastp result
Get best pul hit ~

And here is the log:

the temprory directory is /opt current file is

1. DIAMOND start

1. DIAMOND end

2. HMMER start

2. HMMER end

3. dbCAN_sub start

ID count: 1684 total time: 1394.116188287735

3. dbCAN_sub end

No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it No substrate for it CGC-Finder start* **CGC-Finder start***** **CGC-Finder end*** Preparing overview table from hmmer, dbCAN_sub and diamond output... overview table complete. Saved as F5094_bin_42/overview.txt Start extracting sequence! Reading blastp result F5094_bin_42/blastp.out Reading dbsub outfile:F5094_bin_42/dbsub.out Start eCAMI subfamily substrate scoring Substrate prediciton done! 3.962833881378174s Writing substrate prediction result to file:F5094_bin_42/sub.prediction.out All done! 3.963017702102661s Remove tmp files at zmq_m10143640_0 Job finished at: Thu Aug 10 15:03:35 CST 2023

Thank you for pointing out this issue. It seems to be related to the version of the JSON package. The code in question only converts "cgc_standard.out" to JSON format, so the error does not affect the results. The code works well with version 2.0.9. We are going to release a new version.
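The last line of the traceback is the informative one: Python's standard-library `json` encoder only serializes built-in types such as `int`, `float`, `str`, `list` and `dict`, and it rejects the `int64` object reaching the `json.dumps(pul_dict, indent=4)` call in `cgc_process_json.py`. Assuming that `int64` is a NumPy scalar (for example a gene coordinate read through pandas; the traceback only gives the type name), it is not a subclass of Python's `int`, so the encoder falls through to its `default` method and raises. Below is a minimal sketch of the standard `default=` hook of `json.dumps` that converts NumPy scalars to built-in types. The helper `to_builtin` and the contents of `pul_dict` are hypothetical illustrations, not run_dbcan's own code or its released fix.

```python
import json

import numpy as np


def to_builtin(obj):
    """Hypothetical json.dumps ``default`` hook: turn NumPy scalars into built-in types."""
    if isinstance(obj, np.generic):
        # .item() returns the equivalent Python scalar, e.g. np.int64 -> int
        return obj.item()
    # Anything else keeps the standard json behaviour for unknown types.
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# Hypothetical dictionary shaped like one CGC entry; the coordinates are NumPy
# int64 values, as they would be if they came from a pandas DataFrame column.
pul_dict = {
    "CGC1": [
        {"protein_id": "gene_1", "start": np.int64(1200), "stop": np.int64(2400)},
    ]
}

try:
    json.dumps(pul_dict, indent=4)
except TypeError as err:
    print(err)  # Object of type int64 is not JSON serializable

# With the hook, the same dictionary serializes cleanly.
print(json.dumps(pul_dict, indent=4, default=to_builtin))
```

Casting the values with `int(...)` while the dictionary is being built would avoid the error just as well; the hook is simply the smallest change at the `json.dumps` call site.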