huangnengCSU / compleasm

A genome completeness evaluation tool based on miniprot
Apache License 2.0

Different stats for protein mode #18

Open arslan9732 opened 8 months ago

arslan9732 commented 8 months ago

Hi, I ran compleasm and BUSCO in protein mode on my annotated genome, and compleasm's results look worse than BUSCO's. Is there a difference in how hmmsearch is run, or something else? Here are the results:

Compleasm:

S:74.25%,1727
D:20.42%, 475
F:1.98%, 46
M:3.35%, 78
N: 2326

BUSCO:

 C:95.3%[S:84.3%,D:11.0%],F:1.2%,M:3.5%,n:2326
        2216    Complete BUSCOs (C)
        1961    Complete and single-copy BUSCOs (S)
        255     Complete and duplicated BUSCOs (D)
        29      Fragmented BUSCOs (F)
        81      Missing BUSCOs (M)
        2326    Total BUSCO groups searched
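
To put the two reports on the same footing, the counts above can be reduced to percentages plus a combined "complete" figure (C = S + D), which compleasm does not print on its own line. This is only a rough sketch for checking the arithmetic; the dictionary and function names are made up for illustration and are not part of either tool.

```python
# Counts copied from the two reports above (2326 BUSCO groups in each).
compleasm = {"S": 1727, "D": 475, "F": 46, "M": 78}
busco = {"S": 1961, "D": 255, "F": 29, "M": 81}


def summarize(counts):
    """Return the total and per-category percentages, plus C = S + D."""
    total = sum(counts.values())
    pct = {k: round(100 * v / total, 2) for k, v in counts.items()}
    pct["C"] = round(100 * (counts["S"] + counts["D"]) / total, 2)
    return total, pct


for name, counts in (("compleasm", compleasm), ("BUSCO", busco)):
    total, pct = summarize(counts)
    print(name, total, pct)
```

Seen this way, the complete totals are close (2202 vs 2216 groups), and most of the gap is in how complete hits are split between single-copy and duplicated (475 vs 255 duplicated).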

huangnengCSU commented 8 months ago

Hi @arslan9732, that's interesting. Could you share the protein file and tell me which lineage you used, so I can figure out what happened? Thanks!

arslan9732 commented 8 months ago

Hi @huangnengCSU, sorry, I can't share my file here. But I also tried a public data set, and it showed the same behavior. I used the Arabidopsis thaliana protein file https://ftp.ebi.ac.uk/ensemblgenomes/pub/release-57/plants/fasta/arabidopsis_thaliana/pep/Arabidopsis_thaliana.TAIR10.pep.all.fa.gz The results are:

Compleasm:

./compleasm.py protein -p ATH.faa -l eudicots -t 50 -o ATH-comp -L /mnt/bin/minibusco/mb_downloads

S: 51.42%,1196
D: 47.98%,1116
F: 0.34%,8
M: 0.26%,6
N: 2326

BUSCO:

busco -i ATH.faa -l eudicots_odb10 --download_path /mnt/data/arslan/tool/busco_download/ -o ATH-busco -m protein -c 50 -f
        --------------------------------------------------
        |Results from dataset eudicots_odb10              |
        --------------------------------------------------
        |C:99.8%[S:59.5%,D:40.3%],F:0.0%,M:0.2%,n:2326    |
        |2320   Complete BUSCOs (C)                       |
        |1383   Complete and single-copy BUSCOs (S)       |
        |937    Complete and duplicated BUSCOs (D)        |
        |0      Fragmented BUSCOs (F)                     |
        |6      Missing BUSCOs (M)                        |
        |2326   Total BUSCO groups searched               |
        --------------------------------------------------
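
For anyone who wants to diff the two reports programmatically, here is a minimal sketch that parses the text formats exactly as pasted in this thread. The regular expressions and function names are assumptions based only on the output shown here, not on either tool's documented output format, so they may need adjusting for other versions.

```python
import re


def parse_busco_summary(line):
    """Parse a one-line summary such as
    'C:99.8%[S:59.5%,D:40.3%],F:0.0%,M:0.2%,n:2326' into percentages and n."""
    pcts = {k: float(v) for k, v in re.findall(r"([CSDFM]):\s*([\d.]+)%", line)}
    n = int(re.search(r"n:\s*(\d+)", line).group(1))
    return pcts, n


def parse_compleasm_lines(text):
    """Parse lines such as 'S: 51.42%,1196' and 'N: 2326' into integer counts."""
    counts = {}
    for line in text.splitlines():
        m = re.match(r"\s*([SDFMN]):\s*(?:[\d.]+%,\s*)?(\d+)\s*$", line)
        if m:
            counts[m.group(1)] = int(m.group(2))
    return counts


compleasm_text = """S: 51.42%,1196
D: 47.98%,1116
F: 0.34%,8
M: 0.26%,6
N: 2326"""
busco_line = "|C:99.8%[S:59.5%,D:40.3%],F:0.0%,M:0.2%,n:2326    |"

counts = parse_compleasm_lines(compleasm_text)
pcts, n = parse_busco_summary(busco_line)

# Compare the duplicated fraction reported by each tool.
compleasm_dup = 100 * counts["D"] / counts["N"]
print(f"duplicated: compleasm {compleasm_dup:.2f}% vs BUSCO {pcts['D']:.1f}%")
```

On the Arabidopsis run, both tools report 6 missing groups, so the difference again comes down mainly to the single-copy versus duplicated split.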