linnabrown / run_dbcan

Run_dbcan V4, using genomes/metagenomes/proteomes of any assembled organisms (prokaryotes, fungi, plants, animals, viruses) to search for CAZymes.
http://bcb.unl.edu/dbCAN2
GNU General Public License v3.0

IndexError in CGC-Finder #124

Closed: Rob-murphys closed this issue 1 year ago

Rob-murphys commented 1 year ago

Given the fantastic new capabilities of dbcan 4, I want to rerun some data I previously ran through dbcan 2. I have installed the latest version of the tool and tried to run my samples through it, but once CGC-Finder starts I get this error:

My command is:

run_dbcan $input protein --db_dir $database --out_dir $outdir --hmm_cpu 10 --dia_cpu 10 --tf_cpu 10 --stp_cpu 10 --cluster $locations --cgc_substrate

Clearing query masking...  [0.014s]
Computing alignments... Loading trace points...  [0.092s]
Sorting trace points...  [0.025s]
Computing alignments...  [20.672s]
Deallocating buffers...  [0s]
Loading trace points...  [0s]
 [20.799s]
Deallocating reference...  [0s]
Loading reference sequences...  [0s]
Deallocating buffers...  [0s]
Deallocating queries...  [0.014s]
Total time = 51.184s
Reported 37268 pairwise alignments, 37268 HSPs.
37268 queries aligned.

Traceback (most recent call last):
  File "/home/projects/ku_00014/people/robmur/programmes/miniconda3/envs/run_dbcan_4/bin/run_dbcan", line 10, in <module>
    sys.exit(cli_main())
  File "/home/projects/ku_00014/people/robmur/programmes/miniconda3/envs/run_dbcan_4/lib/python3.8/site-packages/dbcan_cli/run_dbcan.py", line 883, in cli_main
    run(inputFile=args.inputFile, inputType=args.inputType, cluster=args.cluster, dbCANFile=args.dbCANFile,
  File "/home/projects/ku_00014/people/robmur/programmes/miniconda3/envs/run_dbcan_4/lib/python3.8/site-packages/dbcan_cli/run_dbcan.py", line 584, in run
    gene = row[1]
IndexError: list index out of range
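The traceback ends at `gene = row[1]` in `run_dbcan.py`. A plausible reading, not confirmed from the run_dbcan source, is that the code splits each line of an input file into columns and takes the second one. Any line with fewer columns than expected, such as a line with no tabs, then raises exactly this error. The sketch below reproduces that failure mode in isolation. The function `second_field` and the sample lines are hypothetical and are not taken from run_dbcan.

```python
def second_field(line: str, sep: str = "\t") -> str:
    """Return the second column of a delimited line, the way `row[1]` would."""
    row = line.rstrip("\n").split(sep)
    return row[1]  # raises IndexError when the line has only one field


# A tab-separated line has a second column.
print(second_field("contig_1\texample\tCDS"))  # prints "example"

# A line without tabs (e.g. a FASTA header or free text) splits into one field.
try:
    second_field(">contig_1 some description")
except IndexError as err:
    print(f"IndexError: {err}")  # prints "IndexError: list index out of range"
```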

I have included a few of the log lines that precede the error in case they are helpful. Any idea how I can solve this issue?

linnabrown commented 1 year ago

Hi, thanks for using our tool. Could you send us your data (without sensitive info)?

Rob-murphys commented 1 year ago

Here is a sample of the data: sample data.zip

linnabrown commented 1 year ago

Thx, will do this over the weekend.

Rob-murphys commented 1 year ago

Many thanks :) I am really excited to see the predicted substrates and gene families!

linnabrown commented 1 year ago

Thx

Rob-murphys commented 1 year ago

@linnabrown Hey Le, I was wondering if you had managed to find the cause of the issue?

linnabrown commented 1 year ago

Sorry, I forgot to resolve it since I have been busy these past 3 weeks. I will do this next week when I am back at school.

Rob-murphys commented 1 year ago

Hey @linnabrown, no worries, I hope you have settled back into work after the summer break 😄 I have tried running the tool on subsets of the data, and again after a fresh reinstall, but unfortunately I still get the same error.

yinlabniu commented 1 year ago

Hi Rob, the problem might come from the location file (GFF) specified by --cluster $locations. It may not be in an acceptable format. You have only shared the protein sequence file (FAA). Can you also share the GFF file so we can see where the problem is?
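Before rerunning, it can help to check whether the file passed to --cluster looks like GFF at all. This is a minimal sketch. It checks only the basic GFF3 layout: nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes), with integer start and end values. It is not run_dbcan's own validation. The function name `find_bad_gff_lines` and the example lines are hypothetical.

```python
def find_bad_gff_lines(lines):
    """Yield (line_number, reason) for lines that do not look like GFF3 features."""
    for n, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # GFF3 allows embedded sequences after this directive; stop checking
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and ## directives
        fields = line.split("\t")
        if len(fields) != 9:
            yield n, f"expected 9 tab-separated columns, found {len(fields)}"
            continue
        start, end = fields[3], fields[4]
        if not (start.isdigit() and end.isdigit()):
            yield n, "start/end (columns 4 and 5) are not integers"
        elif int(start) > int(end):
            yield n, "start is greater than end"


# Hypothetical example: one valid feature line, then a FASTA header in the wrong place.
example = [
    "##gff-version 3\n",
    "contig_1\texample\tCDS\t1\t300\t.\t+\t0\tID=gene_00001\n",
    ">contig_1\n",
]
for n, reason in find_bad_gff_lines(example):
    print(f"line {n}: {reason}")  # prints "line 3: expected 9 tab-separated columns, found 1"
```

To check a real file, pass an open file handle to `find_bad_gff_lines` and print the first few problems it reports.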

Rob-murphys commented 1 year ago

Hi @yinlabniu, here is the associated gff file 😄 sample_data_gff.zip

yinlabniu commented 1 year ago

Okay, your GFF file is clearly not in GFF format. See https://www.ncbi.nlm.nih.gov/genbank/genomes_gff/ or https://www.youtube.com/watch?v=hiRXtxcTn_I for a description of the GFF format.

Rob-murphys commented 1 year ago

Oops, I thought I had requested GFF output from Prokka, but I forgot. My bad!
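As far as I know, Prokka writes its .gff as GFF3 with the contig sequences appended after a ##FASTA line. The IDs in its .faa headers normally match the ID= attributes of its CDS features. Whether run_dbcan links proteins to GFF features through that ID attribute is an assumption here; this thread does not confirm it. Under that assumption, a quick consistency check between the two files could look like the sketch below. The helpers `faa_ids` and `gff_cds_ids` and the inline toy data are hypothetical.

```python
def faa_ids(lines):
    """Collect protein IDs: the first word of each FASTA header."""
    return {line[1:].split()[0] for line in lines if line.startswith(">")}


def gff_cds_ids(lines):
    """Collect ID= values of CDS features, stopping at an embedded ##FASTA section."""
    ids = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not annotation
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9 and fields[2] == "CDS":
            for pair in fields[8].split(";"):
                key, _, value = pair.partition("=")
                if key == "ID":
                    ids.add(value)
    return ids


# Toy data: two proteins in the FAA, but only one CDS annotated in the GFF.
faa = [">PROKKA_00001 hypothetical protein", "MKV", ">PROKKA_00002 hypothetical protein", "MST"]
gff = [
    "##gff-version 3",
    "contig_1\texample\tCDS\t1\t300\t.\t+\t0\tID=PROKKA_00001;product=hypothetical protein",
    "##FASTA",
    ">contig_1",
    "ATG",
]
missing = faa_ids(faa) - gff_cds_ids(gff)
print(sorted(missing))  # prints ['PROKKA_00002']
```

An empty result means every protein in the FAA has a matching CDS feature in the GFF. Any IDs that are listed point to a mismatch worth fixing before rerunning with --cluster.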