Hi @N0s3n
To use a GFF file as input, run miniprot like this:
miniprot --trans -u -I --outs=0.95 --gff -t 8 ref-file protein.faa > output.gff
The key option here is --trans, which writes the translated protein sequences into the GFF file.
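A quick way to confirm that a GFF file actually carries the translated sequences is to count the relevant lines before handing the file to compleasm. The sketch below assumes that miniprot writes the translations on comment lines starting with ##STA and the alignment summaries on lines starting with ##PAF; the function names and the sample lines are made up for illustration and are not part of miniprot or compleasm.

```python
from collections import Counter
from typing import Iterable


def count_miniprot_markers(lines: Iterable[str]) -> Counter:
    """Count mRNA features and ##PAF / ##STA comment lines in GFF text.

    Assumption: miniprot --trans emits translated proteins on lines
    that start with "##STA" (this prefix is not stated in the thread).
    """
    counts = Counter({"mRNA": 0, "PAF": 0, "STA": 0})
    for line in lines:
        if line.startswith("##STA"):
            counts["STA"] += 1
        elif line.startswith("##PAF"):
            counts["PAF"] += 1
        elif not line.startswith("#"):
            # Regular GFF records are tab-separated; column 3 is the feature type.
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 3 and fields[2] == "mRNA":
                counts["mRNA"] += 1
    return counts


def has_translations(lines: Iterable[str]) -> bool:
    """Return True if at least one ##STA line is present."""
    return count_miniprot_markers(lines)["STA"] > 0


# Made-up placeholder lines, not real miniprot output.
sample = [
    "##gff-version 3\n",
    "##PAF\tprot1\t3\t0\t3\t+\tchr1\n",
    "##STA\tMKV\n",
    "chr1\tminiprot\tmRNA\t1\t9\t10\t+\t.\tID=MP000001\n",
]
print(has_translations(sample))

# For a real file:
# with open("output.gff") as fh:
#     print(count_miniprot_markers(fh))
```

If the STA count is zero, the file was most likely produced without --trans.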
Neng.
Thank you for your reply, but I still get the same KeyError. I would really like a test .gff file that works. It would be great if you could either post one here or add a test directory to the repo, so that users can confirm that compleasm was installed correctly.
Here's my run with the --trans option.
python compleasm.py analyze -o DPP3 -l eukaryota -g <(miniprot --trans -u -I --outs=0.95 --gff /sw/bioinfo/miniprot/0.12/src/test/DPP3-hs.gen.fa /sw/bioinfo/miniprot/0.12/src/test/DPP3-mm.pep.fa)
[M::mp_ntseq_read@0.001*2.07] read 27033 bases in 1 contigs
[M::mp_idx_build@0.001*2.01] 212 blocks
[M::mp_idx_build@0.002*2.15] collected syncmers
[M::mp_idx_build@0.092*1.01] 16125 kmer-block pairs
[M::mp_mapopt_set_max_intron] set max intron size to 10000
[M::mp_idx_print_stat] 14397 distinct k-mers; mean occ of infrequent k-mers: 1.12; 0 frequent k-mers accounting for 0 occurrences
[M::worker_pipeline::0.113*1.01] mapped 1 sequences
[M::main] Version: 0.12-r237
[M::main] CMD: miniprot --trans -u -I --outs=0.95 --gff /sw/bioinfo/miniprot/0.12/src/test/DPP3-hs.gen.fa /sw/bioinfo/miniprot/0.12/src/test/DPP3-mm.pep.fa
[M::main] Real time: 0.121 sec; CPU: 0.121 sec; Peak RSS: 0.064 GB
Searching for hmmsearch in the path where compleasm.py is located
Searching for hmmsearch in the current execution path
Searching for hmmsearch in $PATH
hmmsearch execute command:
/sw/bioinfo/hmmer/3.3.2/rackham/bin/hmmsearch
Traceback (most recent call last):
  File "/sw/comp/python3/3.7.2_rackham/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Target_species'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "compleasm.py", line 2539, in <module>
    main()
  File "compleasm.py", line 2535, in main
    args.func(args)
  File "compleasm.py", line 2377, in analyze
    ar.Run()
  File "compleasm.py", line 1158, in Run
    self.Run_busco_mode()
  File "compleasm.py", line 1235, in Run_busco_mode
    filtered_species = records_df["Target_species"].unique()
  File "/sw/comp/python3/3.7.2_rackham/lib/python3.7/site-packages/pandas/core/frame.py", line 3458, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/sw/comp/python3/3.7.2_rackham/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
    raise KeyError(key) from err
KeyError: 'Target_species'
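The run above feeds miniprot's output to -g through process substitution, whereas the suggested command redirects it to a regular file (output.gff). Whether that difference matters for this error is not established here, but writing the GFF to disk first makes the input easy to inspect and reuse. A minimal sketch of that approach, using only the miniprot flags quoted in this thread; the helper names build_miniprot_cmd and write_gff are hypothetical.

```python
import subprocess
from pathlib import Path


def build_miniprot_cmd(ref, proteins, threads=8):
    """Assemble the miniprot command line quoted in this thread."""
    return [
        "miniprot", "--trans", "-u", "-I", "--outs=0.95", "--gff",
        "-t", str(threads), str(ref), str(proteins),
    ]


def write_gff(ref, proteins, out_path, threads=8):
    """Run miniprot and write its stdout to a regular file.

    Requires miniprot on PATH; raises CalledProcessError on failure.
    """
    with open(out_path, "w") as fh:
        subprocess.run(build_miniprot_cmd(ref, proteins, threads),
                       stdout=fh, check=True)
    return Path(out_path)


# Example (paths are placeholders):
# gff = write_gff("ref-file", "protein.faa", "output.gff")
# then: python compleasm.py analyze -o DPP3 -l eukaryota -g output.gff
```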
/Björn
@N0s3n
Here is test data for the S. cerevisiae reference genome, including the .gff file and some commands.
demo.zip
Thank you! It works now.
/Björn
Hi,
I've been trying to install compleasm, but when I try to analyze the example output from miniprot I get an error. I've tried both the manual install and the docker/singularity version.
I guess the gff file from the miniprot example isn't compatible. Do you have some test data that I can run?
Regards, Björn