lh3 / miniprot

Align proteins to genomes with splicing and frameshift
https://lh3.github.io/miniprot/
MIT License

An error when looking for 200 proteins in a genome #46

Open gilloe opened 1 year ago

gilloe commented 1 year ago

Hi, I get the following error when running miniprot to look for 200 proteins in a 73 MB genome:

[M::mp_ntseq_read@0.375*1.01] read 68163509 bases in 69053 contigs
[M::mp_idx_build@0.379*1.01] 608218 blocks
[M::mp_idx_build@0.987*2.83] collected syncmers
[M::mp_idx_build@13.918*1.13] 23643989 kmer-block pairs
[M::mp_idx_print_stat] 1694743 distinct k-mers; mean occ of infrequent k-mers: 13.95; 0 frequent k-mers accounting for 0 occurrences
##gff-version 3
BUG! 1929 == 1929? 621 == 622? 40M1I57M3D83M2D98M3I8M1I59M1D26M15U117M1D46M2D30M5I27M1I5M63U12M
miniprot_: align.c:195: mp_extra_cal: Assertion `al == r->qe - r->qs' failed.
Aborted

Running it with 20 proteins against the same genome worked fine, and so did running the 200 proteins against a smaller genome, so I don't think it is a file format issue. Any suggestions?

Thanks, Gil
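
For readers trying to interpret the BUG line above, here is a minimal Python sketch that recomputes the two lengths from the CIGAR string. The per-operation rules are assumptions inferred from the numbers in that line, not taken from miniprot's source: M consumes 3 nt and 1 aa, I consumes 1 aa, D consumes 3 nt, and the intron operations N/U/V consume their length in nt, with U and V (an intron splitting a codon) also consuming 1 aa. Frameshift operations are left out because nothing in this thread shows how they are counted. The function name `cigar_lengths` is hypothetical.

```python
import re

# Assumed (nt, aa) consumed per unit of length; inferred from the BUG line,
# not copied from miniprot's align.c.
PER_UNIT = {"M": (3, 1), "I": (0, 1), "D": (3, 0)}
# Assumed amino acids consumed by one intron operation of each phase.
INTRON_AA = {"N": 0, "U": 1, "V": 1}


def cigar_lengths(cigar: str) -> tuple[int, int]:
    """Return (nucleotides, amino acids) consumed by a protein-to-genome CIGAR."""
    if not re.fullmatch(r"(\d+[A-Z])+", cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    nt = aa = 0
    for length, op in re.findall(r"(\d+)([A-Z])", cigar):
        n = int(length)
        if op in PER_UNIT:
            d_nt, d_aa = PER_UNIT[op]
            nt += d_nt * n
            aa += d_aa * n
        elif op in INTRON_AA:
            nt += n                # intron length is given in nucleotides
            aa += INTRON_AA[op]    # a split codon still yields one residue
        else:
            raise ValueError(f"operation {op!r} is not handled in this sketch")
    return nt, aa


cigar = ("40M1I57M3D83M2D98M3I8M1I59M1D26M15U117M1D46M2D"
         "30M5I27M1I5M63U12M")
print(cigar_lengths(cigar))  # (1929, 621)
```

Under these assumptions the CIGAR accounts for 1929 nt and 621 aa, which matches the left-hand numbers in the BUG line. The assertion compares the amino-acid count with `r->qe - r->qs`, so the reported query span (622) appears to be one residue longer than what the CIGAR covers.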

lh3 commented 1 year ago

Could you share the proteins and the reference genome with me?

gilloe commented 1 year ago

Yes, thanks. The genome can be downloaded from: https://www.ncbi.nlm.nih.gov/nuccore/LDNA00000000.1

The query file is attached: Hydra_uniprot (2).zip

lh3 commented 1 year ago

I downloaded the genome and could get the results:

[M::mp_ntseq_read@0.385*1.00] read 68163509 bases in 69053 contigs
[M::mp_idx_build@0.386*1.00] 608218 blocks
[M::mp_idx_build@0.794*2.52] collected syncmers
[M::mp_idx_build@1.099*2.09] 23643989 kmer-block pairs
[M::mp_idx_print_stat] 1694743 distinct k-mers; mean occ of infrequent k-mers: 13.95; 0 frequent k-mers accounting for 0 occurrences
[M::worker_pipeline::1.299*2.27] mapped 200 sequences
[M::main] Version: 0.11-r234
[M::main] CMD: ./miniprot --gff GCA_001455295.2_ASM145529v2_genomic.fna.gz Hydra_uniprot.fasta

Based on the total number of bases, we are using the same reference.
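
The same comparison can be made without running miniprot. The following is a rough sketch, assuming a plain or gzip-compressed FASTA file, that counts contigs and bases so the totals can be checked against the `read ... bases in ... contigs` log line; the helper name `count_fasta` is hypothetical.

```python
import gzip


def count_fasta(path):
    """Return (contigs, bases) for a FASTA file, optionally gzip-compressed."""
    opener = gzip.open if str(path).endswith(".gz") else open
    contigs = bases = 0
    with opener(path, "rt") as fh:
        for line in fh:
            if line.startswith(">"):
                contigs += 1               # each header line starts a new contig
            else:
                bases += len(line.strip())  # sequence line, without the newline
    return contigs, bases
```

For example, `count_fasta("GCA_001455295.2_ASM145529v2_genomic.fna.gz")` should return the contig and base counts that miniprot logs for the same file, provided miniprot counts every sequence character as a base.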

What version are you using?

gilloe commented 1 year ago

How do I know which version I have?

lh3 commented 1 year ago

Run miniprot --version. You could also just try the latest version.

gilloe commented 1 year ago

0.11-r235-dirty. I downloaded it very recently.