huangnengCSU / compleasm

A genome completeness evaluation tool based on miniprot
Apache License 2.0

KeyError: 'Target_species' #7

Closed. N0s3n closed this issue 1 year ago.

N0s3n commented 1 year ago

Hi,

I've been trying to install compleasm but when I try to analyze the example output from miniprot I get an error. I've tried both the manual install and the docker/singularity version.

I guess the GFF file from the miniprot example isn't compatible. Do you have some test data that I can run?

python compleasm.py analyze -g aln.gff -o output_dir -l eukaryota 

Searching for hmmsearch in the path where compleasm.py is located
Searching for hmmsearch in the current execution path
Searching for hmmsearch in $PATH
hmmsearch execute command:
 /sw/bioinfo/hmmer/3.3.2/rackham/bin/hmmsearch
Traceback (most recent call last):
  File "/sw/comp/python3/3.7.2_rackham/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Target_species'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "compleasm.py", line 2539, in <module>
    main()
  File "compleasm.py", line 2535, in main
    args.func(args)
  File "compleasm.py", line 2377, in analyze
    ar.Run()
  File "compleasm.py", line 1158, in Run
    self.Run_busco_mode()
  File "compleasm.py", line 1235, in Run_busco_mode
    filtered_species = records_df["Target_species"].unique()
  File "/sw/comp/python3/3.7.2_rackham/lib/python3.7/site-packages/pandas/core/frame.py", line 3458, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/sw/comp/python3/3.7.2_rackham/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
    raise KeyError(key) from err
KeyError: 'Target_species'

Regards, Björn

huangnengCSU commented 1 year ago

Hi @N0s3n, to use a GFF file as input, miniprot needs to be run like this:

miniprot --trans -u -I --outs=0.95 --gff -t 8 ref-file protein.faa > output.gff

The key option here is --trans, which outputs the translated protein sequences in the GFF file.

Neng.
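A quick way to check whether a GFF file was produced with --trans is to look for the translated-sequence records before handing it to compleasm. The sketch below is a minimal, hedged example: it assumes miniprot writes alignment summaries on lines starting with ##PAF and translated proteins on lines starting with ##STA (worth confirming against the miniprot manual for your version), and the function names are hypothetical helpers, not part of compleasm or miniprot.

```python
from collections import Counter
from typing import Iterable

# Assumption: miniprot marks alignment records with "##PAF" and, when run
# with --trans, translated protein sequences with "##STA".
PREFIXES = ("##PAF", "##STA")


def count_miniprot_records(lines: Iterable[str]) -> Counter:
    """Count lines whose first whitespace-delimited field is a known prefix."""
    counts = Counter({prefix: 0 for prefix in PREFIXES})
    for line in lines:
        first_field = line.split(None, 1)[:1]  # [] for blank lines
        for prefix in PREFIXES:
            if first_field == [prefix]:
                counts[prefix] += 1
    return counts


def has_translations(lines: Iterable[str]) -> bool:
    """Return True if at least one ##STA (translated sequence) line is present."""
    return count_miniprot_records(lines)["##STA"] > 0
```

To use it on a real file, open the file and pass the handle, for example has_translations(open("output.gff")). A file with ##PAF lines but no ##STA lines was most likely generated without --trans.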

N0s3n commented 1 year ago

Thank you for your reply, but I still get the same KeyError. I would really like a test .gff file that works. It would be great if you could either post one here or add a test directory to the repo so that users can confirm compleasm was installed correctly.

Here's my run with the --trans option.

python compleasm.py analyze  -o DPP3 -l eukaryota -g <(miniprot --trans -u -I --outs=0.95 --gff /sw/bioinfo/miniprot/0.12/src/test/DPP3-hs.gen.fa /sw/bioinfo/miniprot/0.12/src/test/DPP3-mm.pep.fa)

[M::mp_ntseq_read@0.001*2.07] read 27033 bases in 1 contigs
[M::mp_idx_build@0.001*2.01] 212 blocks
[M::mp_idx_build@0.002*2.15] collected syncmers
[M::mp_idx_build@0.092*1.01] 16125 kmer-block pairs
[M::mp_mapopt_set_max_intron] set max intron size to 10000
[M::mp_idx_print_stat] 14397 distinct k-mers; mean occ of infrequent k-mers: 1.12; 0 frequent k-mers accounting for 0 occurrences
[M::worker_pipeline::0.113*1.01] mapped 1 sequences
[M::main] Version: 0.12-r237
[M::main] CMD: miniprot --trans -u -I --outs=0.95 --gff /sw/bioinfo/miniprot/0.12/src/test/DPP3-hs.gen.fa /sw/bioinfo/miniprot/0.12/src/test/DPP3-mm.pep.fa
[M::main] Real time: 0.121 sec; CPU: 0.121 sec; Peak RSS: 0.064 GB
Searching for hmmsearch in the path where compleasm.py is located
Searching for hmmsearch in the current execution path
Searching for hmmsearch in $PATH
hmmsearch execute command:
 /sw/bioinfo/hmmer/3.3.2/rackham/bin/hmmsearch
Traceback (most recent call last):
  File "/sw/comp/python3/3.7.2_rackham/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Target_species'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "compleasm.py", line 2539, in <module>
    main()
  File "compleasm.py", line 2535, in main
    args.func(args)
  File "compleasm.py", line 2377, in analyze
    ar.Run()
  File "compleasm.py", line 1158, in Run
    self.Run_busco_mode()
  File "compleasm.py", line 1235, in Run_busco_mode
    filtered_species = records_df["Target_species"].unique()
  File "/sw/comp/python3/3.7.2_rackham/lib/python3.7/site-packages/pandas/core/frame.py", line 3458, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/sw/comp/python3/3.7.2_rackham/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
    raise KeyError(key) from err
KeyError: 'Target_species'

/Björn
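For anyone reproducing this run, it can help to write the miniprot output to a regular file first, so the GFF can be inspected before compleasm reads it. The sketch below only assembles the two commands quoted in this thread; it is not a diagnosis of the KeyError above. The function names, the default thread count, and the file paths in the usage note are hypothetical.

```python
import subprocess
from typing import List, Tuple


def build_commands(genome: str, proteins: str, gff_path: str,
                   out_dir: str, lineage: str,
                   threads: int = 8) -> Tuple[List[str], List[str]]:
    """Build argument lists for miniprot and for compleasm's analyze step.

    Only flags that appear in this thread are used.
    """
    miniprot_cmd = [
        "miniprot", "--trans", "-u", "-I", "--outs=0.95", "--gff",
        "-t", str(threads), genome, proteins,
    ]
    compleasm_cmd = [
        "python", "compleasm.py", "analyze",
        "-g", gff_path, "-o", out_dir, "-l", lineage,
    ]
    return miniprot_cmd, compleasm_cmd


def run_pipeline(genome: str, proteins: str, gff_path: str,
                 out_dir: str, lineage: str) -> None:
    """Run miniprot into gff_path, then run compleasm on that file."""
    miniprot_cmd, compleasm_cmd = build_commands(
        genome, proteins, gff_path, out_dir, lineage)
    with open(gff_path, "w") as gff:
        # miniprot writes the GFF to stdout; capture it in a regular file.
        subprocess.run(miniprot_cmd, stdout=gff, check=True)
    subprocess.run(compleasm_cmd, check=True)
```

A call such as run_pipeline("DPP3-hs.gen.fa", "DPP3-mm.pep.fa", "DPP3.gff", "DPP3", "eukaryota") would mirror the command above while leaving DPP3.gff on disk for inspection.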

huangnengCSU commented 1 year ago

@N0s3n Here is test data for the S. cerevisiae reference genome, including the .gff file and some commands: demo.zip

N0s3n commented 1 year ago

Thank you! It works now.

/Björn