lh3 / miniprot

Align proteins to genomes with splicing and frameshift
https://lh3.github.io/miniprot/
MIT License

Filter by alignment quality #42

stubrown closed this issue 4 months ago

stubrown commented 1 year ago

A protein-to-DNA alignment with translated BLAST can be filtered by E-value, so the output contains only matches at or below a specified E-value threshold. miniprot has an alignment score (sort of similar to Smith-Waterman) and a mapping score (sort of similar to Bowtie). Could you add a command-line parameter to filter the output by a minimum quality, so the GFF can be used directly, for example as a JBrowse track?

lh3 commented 1 year ago

It is difficult to filter when you look at a single protein. Miniprot is intended for aligning similar proteins (say, identity >50%), so the E-value is usually small. The alignment score is roughly proportional to the protein length, so we can't easily set a single cutoff on that score. Mapping quality evaluates whether an alignment is unique. It is not that useful for protein alignment, as we expect multi-gene families.
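
Since the raw alignment score scales with protein length, one possible workaround is to post-filter the GFF on a length-independent value instead. The sketch below is not a miniprot feature. It assumes the GFF3 output has `mRNA` records carrying `ID` and `Identity` attributes, that child features (such as `CDS`) point to them via `Parent`, and that each `mRNA` line precedes its children. Check these assumptions against the output of your miniprot version. The names `parse_attrs`, `filter_gff`, and `min_identity` are made up for this example.

```python
def parse_attrs(field):
    """Parse a GFF3 attribute column ("key=value;key=value") into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def filter_gff(lines, min_identity=0.5):
    """Yield GFF3 lines belonging to alignments with Identity >= min_identity.

    Assumption: every mRNA line appears before its child features.
    Comment and directive lines (starting with '#') are passed through unchanged.
    """
    kept_ids = set()
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # skip malformed records
        attrs = parse_attrs(cols[8])
        if cols[2] == "mRNA":
            # A missing Identity attribute is treated as 0, so the record is dropped.
            if float(attrs.get("Identity", 0.0)) >= min_identity:
                if "ID" in attrs:
                    kept_ids.add(attrs["ID"])
                yield line
        elif attrs.get("Parent") in kept_ids:
            # Keep child features only when their parent alignment was kept.
            yield line
```

To use it, open the GFF file, pass the file handle to `filter_gff` with a threshold such as `0.7`, and write the yielded lines to a new file for the JBrowse track. The threshold is a judgment call, and because multi-gene families are expected, several alignments of the same protein may pass it.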