larsmoret opened this issue 10 months ago.
Hi @larsmoret, could you list the files (with their sizes) under the "fungi_odb10" directory of the output folder?
(checker) 130 lmoret@ubuntudesktopc:~/data/volume_2$ ls compleasmoutput/CBS1922/fungi_odb10/
hmmer_output  hmmsearch.done  miniprot.done  miniprot_output.gff  translated_protein.fasta
Total file size: 25M compleasmoutput/CBS1922/fungi_odb10
Per file:
1.5M compleasmoutput/CBS1922/fungi_odb10/hmmer_output/
0 compleasmoutput/CBS1922/fungi_odb10/hmmsearch.done
0 compleasmoutput/CBS1922/fungi_odb10/miniprot.done
24M compleasmoutput/CBS1922/fungi_odb10/miniprot_output.gff
176K compleasmoutput/CBS1922/fungi_odb10/translated_protein.fasta
Hello,
I am running into the same issue as @larsmoret. Attached is my submission script: SCRIPT_miniBUSCO_20231106_v1.txt
Here are the contents of the "arthropoda_odb10" directory:
-rw-r--r-- 1 kcd88651 tcglab 9676547 Nov 4 17:49 miniprot_output.gff
-rw-r--r-- 1 kcd88651 tcglab 0 Nov 4 17:49 miniprot.done
-rw-r--r-- 1 kcd88651 tcglab 0 Nov 4 17:49 hmmsearch.done
drwxr-xr-x 2 kcd88651 tcglab 4096 Nov 4 17:49 hmmer_output
-rw-r--r-- 1 kcd88651 tcglab 0 Nov 6 11:48 translated_protein.fasta
This is my error output:
Traceback (most recent call last):
File "/home/kcd88651/.conda/envs/compleasm/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Target_species'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/kcd88651/.conda/envs/compleasm/bin/compleasm", line 10, in
Hi @katiecdillon
Thanks for providing the script. Could you specify a different output folder name for each input assembly, instead of using "$D2" for all the assemblies?
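A minimal sketch of the suggested fix, assuming the assemblies are `*.fasta` files in one directory (the directory names `assemblies` and `compleasm_out`, the lineage, and the thread count are hypothetical placeholders, not taken from the attached script). It derives a separate output folder from each assembly's file name, using only the `compleasm run -a/-l/-o/-t` options shown elsewhere in this thread, and prints the commands so they can be checked before running.

```python
import shlex
from pathlib import Path


def build_commands(assemblies, out_root, lineage, threads):
    """Return one compleasm command (as an argv list) per assembly,
    each writing to its own output folder under out_root."""
    commands = []
    seen = set()
    for asm in map(Path, assemblies):
        # genomes/CBS1922.fasta -> <out_root>/CBS1922
        name = asm.stem
        if name in seen:
            # Two assemblies with the same file stem would share a folder again.
            raise ValueError(f"duplicate assembly name: {name}")
        seen.add(name)
        out_dir = Path(out_root) / name
        commands.append(["compleasm", "run", "-a", str(asm), "-l", lineage,
                         "-o", str(out_dir), "-t", str(threads)])
    return commands


if __name__ == "__main__":
    # Hypothetical layout: assemblies/*.fasta, results under compleasm_out/.
    for cmd in build_commands(sorted(Path("assemblies").glob("*.fasta")),
                              "compleasm_out", "arthropoda", 8):
        print(shlex.join(cmd))  # swap for subprocess.run(cmd, check=True) to execute
```

Refusing duplicate stems matters because `x.fasta` in two different directories would otherwise write to the same output folder, which is the same collision as reusing one folder for every assembly.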
Hi @larsmoret @katiecdillon,
I have added some checks to the code to help understand what went wrong. The KeyError "Target_species" occurs when there are no candidate alignment hits that satisfy the BUSCO threshold. Could you clone the source code and re-run the failed case in the existing compleasm environment?
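This failure mode can be reproduced with pandas alone. A minimal sketch, assuming the parsed hits are collected as a list of dicts before being turned into a DataFrame (the helper `unique_target_species` is hypothetical, not compleasm's actual code): when no hit passes the cutoff, the list is empty, the resulting DataFrame has no columns, and selecting `"Target_species"` raises exactly `KeyError: 'Target_species'`. The guard returns an empty result instead.

```python
import pandas as pd


def unique_target_species(records):
    """Hypothetical helper: unique Target_species values among hits that
    passed the cutoff, or [] when there are none."""
    records_df = pd.DataFrame(records)
    # With zero records the DataFrame has no columns at all, so
    # records_df["Target_species"] would raise KeyError: 'Target_species'.
    if records_df.empty or "Target_species" not in records_df.columns:
        return []
    return list(records_df["Target_species"].unique())


if __name__ == "__main__":
    print(unique_target_species([]))  # no hits passed: empty list, no KeyError
    print(unique_target_species([{"Target_species": "sp1"}, {"Target_species": "sp1"}]))
```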
For example:
git clone https://github.com/huangnengCSU/compleasm.git
python compleasm.py run -a $input_asm -l $lineage -o $output_folder -t $threads
Thanks!
Hi @huangnengCSU
I've tried it, and now it loads fungi_odb10, but it cannot build the index.
Thanks in advance,
(checker) 2 lmoret@ubuntudesktopc:~/data/volume_2/compleasm$ compleasm run -a ~/finalassemblies/CBS1922.fasta -l fungi -o ~/compleasmoutput/ -t 14
Searching for miniprot in the path where compleasm.py is located
Searching for miniprot in the current execution path
Searching for miniprot in $PATH
Searching for hmmsearch in the path where compleasm.py is located
Searching for hmmsearch in the current execution path
Searching for hmmsearch in $PATH
miniprot execute command:
/home/lmoret/miniconda3/envs/checker/bin/miniprot
Success download from https://busco-data.ezlab.org/v5/data/file_versions.tsv
Success download from https://busco-data.ezlab.org/v5/data/placement_files/list_of_reference_markers.eukaryota_odb10.2019-12-16.txt.tar.gz
Placement file extraction path: mb_downloads/placement_files/list_of_reference_markers.eukaryota_odb10.2019-12-16.txt
Success download from https://busco-data.ezlab.org/v5/data/placement_files/mapping_taxid-lineage.eukaryota_odb10.2019-12-16.txt.tar.gz
Placement file extraction path: mb_downloads/placement_files/mapping_taxid-lineage.eukaryota_odb10.2019-12-16.txt
Success download from https://busco-data.ezlab.org/v5/data/placement_files/mapping_taxids-busco_dataset_name.eukaryota_odb10.2019-12-16.txt.tar.gz
Placement file extraction path: mb_downloads/placement_files/mapping_taxids-busco_dataset_name.eukaryota_odb10.2019-12-16.txt
Success download from https://busco-data.ezlab.org/v5/data/placement_files/supermatrix.aln.eukaryota_odb10.2019-12-16.faa.tar.gz
Placement file extraction path: mb_downloads/placement_files/supermatrix.aln.eukaryota_odb10.2019-12-16.faa
Success download from https://busco-data.ezlab.org/v5/data/placement_files/tree.eukaryota_odb10.2019-12-16.nwk.tar.gz
Placement file extraction path: mb_downloads/placement_files/tree.eukaryota_odb10.2019-12-16.nwk
Success download from https://busco-data.ezlab.org/v5/data/placement_files/tree_metadata.eukaryota_odb10.2019-12-16.txt.tar.gz
Placement file extraction path: mb_downloads/placement_files/tree_metadata.eukaryota_odb10.2019-12-16.txt
Success download from https://busco-data.ezlab.org/v5/data/lineages/eukaryota_odb10.2020-09-10.tar.gz
Lineage file extraction path: mb_downloads/eukaryota_odb10
Success download from https://busco-data.ezlab.org/v5/data/lineages/fungi_odb10.2021-06-28.tar.gz
Lineage file extraction path: mb_downloads/fungi_odb10
lineage: fungi_odb10
[ERROR] failed to open/build the index
Traceback (most recent call last):
File "/home/lmoret/miniconda3/envs/checker/bin/compleasm", line 10, in
To @larsmoret
The error "failed to open/build the index" is reported by miniprot. You can test the alignment manually with "miniprot --trans -u -I --outs=0.95 -t 20 --gff ~/finalassemblies/CBS1922.fasta mb_downloads/fungi_odb10/refseq_db.faa.gz > out.gff". I suspect the problem occurs while building the genome index.
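One possible cause of "failed to open/build the index" (an assumption, not a diagnosis from this log) is that miniprot cannot read the genome file at all. Note that the failing run passed `~/finalassemblies/CBS1922.fasta`, which resolves under /home/lmoret, while the earlier run used `finalassemblies/CBS1922.fasta` from /data/volume_2, so it is worth confirming the file exists at the new path. A minimal pre-flight sketch (the function `check_assembly` is hypothetical) that checks the path exists, is non-empty, and starts with a FASTA `>` header, looking inside gzip-compressed files:

```python
import gzip
import os


def check_assembly(path):
    """Hypothetical pre-flight check: return a problem description, or None
    if the file exists, is non-empty, and starts with a FASTA '>' header."""
    path = os.path.expanduser(path)  # "~/..." -> the user's home directory
    if not os.path.isfile(path):
        return f"{path}: file not found"
    if os.path.getsize(path) == 0:
        return f"{path}: file is empty"
    with open(path, "rb") as fh:
        magic = fh.read(2)
    # Inspect the decompressed content of gzip files, not the compressed bytes.
    opener = gzip.open if magic == b"\x1f\x8b" else open
    with opener(path, "rb") as fh:
        first = fh.read(1)
    if first != b">":
        return f"{path}: does not start with '>', may not be FASTA"
    return None


if __name__ == "__main__":
    print(check_assembly("~/finalassemblies/CBS1922.fasta"))
```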
Hello @huangnengCSU, it looks like the output directory was in fact the issue. Thank you!
Hi @huangnengCSU, I've tried it again and manually downloaded the dependencies again, but I'm still facing difficulties. The most interesting part of the log is below. Could it have to do with the quality of the assembly?
Kind regards, Lars Moret
[M::main] CMD: /data/volume_2/compleasm_kit/miniprot --trans -u -I --outs=0.95 -t 14 --gff finalassemblies/CBS.fasta mb_downloads/eukaryota_odb10/refseq_db.faa.gz
[M::main] Real time: 72.284 sec; CPU: 957.367 sec; Peak RSS: 0.219 GB
hmmsearch execute command:
/data/volume_2/compleasm_kit/hmmsearch
Warning: no reliable mappings found. All candidates do not pass the cutoff of BUSCO gene.
Warning: No reliable hits found! Check the lineage file: eukaryota_odb10, alignment file: compleasmoutput/CBS/eukaryota_odb10/miniprot_output.gff, hmmsearch output folder: compleasmoutput/CBS/eukaryota_odb10/hmmer_output.
S:0.00%, 0
D:0.00%, 0
F:0.00%, 0
I:0.00%, 0
M:100.00%, 255
N:255
Hi @larsmoret,
All BUSCO genes are missing because no gene could be aligned to the assembly in a way that passes BUSCO's threshold, which means the genes differ substantially from the assembly. This may be caused by the quality of the assembly or by choosing the wrong lineage file. Also, if the assembly is highly divergent, miniprot may not align well. Did you try BUSCO, and what was its assessment?
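Before blaming the assembly or the lineage, it can help to check whether miniprot produced any alignments at all and how similar they were, i.e. to tell "nothing aligned" apart from "alignments exist but all fall below the cutoff". A minimal sketch, assuming miniprot's GFF3 output writes one `mRNA` feature per alignment with an `Identity=` attribute in column 9 (verify this against your own `miniprot_output.gff`); the function name is hypothetical, and this does not reproduce compleasm's actual BUSCO filtering.

```python
import statistics


def summarize_miniprot_gff(lines):
    """Count mRNA features (assumed one per alignment) and collect their
    Identity= values from column 9 of a miniprot GFF3 file."""
    n_mrna = 0
    identities = []
    for line in lines:
        if line.startswith("#"):  # skip header and comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "mRNA":
            continue
        n_mrna += 1
        for attr in fields[8].split(";"):
            key, _, value = attr.partition("=")
            if key == "Identity":
                identities.append(float(value))
    return n_mrna, identities


if __name__ == "__main__":
    # Path taken from the log above; adjust to your own output folder.
    gff = "compleasmoutput/CBS/eukaryota_odb10/miniprot_output.gff"
    try:
        with open(gff) as fh:
            n, ids = summarize_miniprot_gff(fh)
        print(f"{n} alignments")
        if ids:
            print(f"median Identity: {statistics.median(ids):.3f}")
    except FileNotFoundError:
        print(f"{gff} not found")
```

Many alignments with low identity point toward divergence or a mismatched lineage; almost no alignments point toward an input or assembly problem.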
Dear all, I must say I am quite intrigued by compleasm and how it compares to BUSCO.
However, I came across an error while trying to run it, and I have no idea where to look. Compleasm suddenly stops and displays KeyError: 'Target_species'
Has anyone had the same issue or any idea where the problem might be?
Thanks in advance, Lars Moret
P.S. This is my entire log. Please note that I installed Compleasm using conda.
(checker) lmoret@ubuntudesktopc:/data/volume_2$ compleasm run -a finalassemblies/CBS1922.fasta -o compleasmoutput/CBS1922 -l fungi -t 14
Searching for miniprot in the path where compleasm.py is located
Searching for miniprot in the current execution path
Searching for hmmsearch in the path where compleasm.py is located
Searching for hmmsearch in the current execution path
miniprot execute command:
/data/volume_2/compleasm_kit/miniprot
lineage: fungi_odb10
hmmsearch execute command:
/data/volume_2/compleasm_kit/hmmsearch
Traceback (most recent call last):
File "/home/lmoret/miniconda3/envs/checker/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Target_species'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/lmoret/miniconda3/envs/checker/bin/compleasm", line 10, in
sys.exit(main())
File "/home/lmoret/miniconda3/envs/checker/lib/python3.7/site-packages/compleasm.py", line 2534, in main
args.func(args)
File "/home/lmoret/miniconda3/envs/checker/lib/python3.7/site-packages/compleasm.py", line 2426, in run
mr.Run()
File "/home/lmoret/miniconda3/envs/checker/lib/python3.7/site-packages/compleasm.py", line 2142, in Run
miniprot_alignment_parser.Run()
File "/home/lmoret/miniconda3/envs/checker/lib/python3.7/site-packages/compleasm.py", line 1158, in Run
self.Run_busco_mode()
File "/home/lmoret/miniconda3/envs/checker/lib/python3.7/site-packages/compleasm.py", line 1234, in Run_busco_mode
filtered_species = records_df["Target_species"].unique()
File "/home/lmoret/miniconda3/envs/checker/lib/python3.7/site-packages/pandas/core/frame.py", line 3458, in getitem
indexer = self.columns.get_loc(key)
File "/home/lmoret/miniconda3/envs/checker/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
raise KeyError(key) from err
KeyError: 'Target_species'
(checker) 1 lmoret@ubuntudesktopc:/data/volume_2$