lh3 / miniprot

Align proteins to genomes with splicing and frameshift
https://lh3.github.io/miniprot/
MIT License
Mapq #7

Open jelber2 opened 1 year ago

jelber2 commented 1 year ago

For proteins mapping to multiple contigs/chromosomes, how might one deduce the equivalent of mapping quality with miniprot? My guess is that one could use the AS and as scores listed in the manpage (although I am seeing ms rather than as in the resulting PAF files).

+----+------+---------------------------------------------------+
|Tag | Type |                    Description                    |
+----+------+---------------------------------------------------+
| AS |  i   | Alignment score from dynamic programming          |
| as |  i   | Alignment score excluding introns                 |
| np |  i   | Number of amino acid matches with positive scores |
| da |  i   | Distance to the nearest start codon               |
| do |  i   | Distance to the nearest stop codon                |
| cg |  Z   | Protein CIGAR                                     |
| cs |  Z   | Difference string                                 |
+----+------+---------------------------------------------------+

lh3 commented 1 year ago

I will add mapping quality in the future. Miniprot doesn't have it now because mapping quality is not very important for cross-species alignment.

The "as" tag in the manpage has been renamed to "ms". It is roughly equivalent to the ms tag reported by minimap2. Please use this tag, rather than AS, to estimate mapping uniqueness: AS sometimes favors pseudogenes.
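
One way to act on this suggestion is to compare the best and second-best ms values for each protein. The sketch below is not part of miniprot: the function names and the ratio-based uniqueness measure are hypothetical, and the code only assumes tab-separated PAF lines with the protein name in column 1 and optional tags such as ms:i:<int> after column 12.

```python
from collections import defaultdict


def parse_ms(line):
    """Return (protein_name, ms) for one PAF line, or None if no ms tag is found."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:  # skip headers, comments and malformed lines
        return None
    for tag in fields[12:]:
        if tag.startswith("ms:i:"):
            return fields[0], int(tag[5:])
    return None


def ms_uniqueness(paf_lines):
    """Map each protein to (best_ms, second_best_ms, uniqueness).

    uniqueness = 1 - second_best / best is an illustrative measure, not a
    miniprot output: 1.0 means a single hit, 0.0 means the top two hits tie.
    """
    scores = defaultdict(list)
    for line in paf_lines:
        parsed = parse_ms(line)
        if parsed is not None:
            name, ms = parsed
            scores[name].append(ms)

    result = {}
    for name, values in scores.items():
        values.sort(reverse=True)
        best = values[0]
        # Treat a missing or negative runner-up as zero.
        second = max(values[1], 0) if len(values) > 1 else 0
        uniqueness = 1.0 - second / best if best > 0 else 0.0
        result[name] = (best, second, uniqueness)
    return result
```

Calling ms_uniqueness(open("aln.paf")) on a miniprot PAF file (the file name here is only a placeholder) would give one tuple per protein; a low uniqueness value flags proteins whose best locus is hard to tell apart from the runner-up.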

jelber2 commented 1 year ago

Thank you!

lh3 commented 1 year ago

I will keep this issue open as a reminder to myself. BTW, I have just updated the manpage to replace "as" with "ms".

conchoecia commented 1 year ago

Just wanted to join in to say MAPQ would be a very nice addition. For example, I am working with sponges and have ~50 sponge transcriptomes that I am mapping to a new species that I am trying to annotate. For each locus in the genome, it would be nice to be able to filter out poor matches based on MAPQ in the PAF line. Thanks for writing this nice piece of software, @lh3. I had been using a tblastn pipeline to perform a similar function before this.

lh3 commented 1 year ago

MAPQ won't be very useful for filtering poor matches. You should look at the alignment score, identity and positives instead.
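
As a rough illustration of filtering on these values rather than on MAPQ, the sketch below derives three per-hit metrics from a PAF line. Everything in it is an assumption rather than miniprot behavior: it takes column 10 as the number of matching residues and column 11 as the alignment length excluding introns, uses the ms tag as the score, divides the np tag by the aligned protein span to get a positive fraction, and the thresholds and function names are arbitrary placeholders.

```python
def hit_metrics(line):
    """Return score, identity and positive fraction for one PAF line, or None."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        return None

    # Collect optional tags of the form NAME:TYPE:VALUE.
    tags = {}
    for tag in fields[12:]:
        parts = tag.split(":", 2)
        if len(parts) == 3:
            tags[parts[0]] = parts[2]
    if "ms" not in tags or "np" not in tags:
        return None

    q_start, q_end = int(fields[2]), int(fields[3])
    n_match, aln_len = int(fields[9]), int(fields[10])  # assumed column meanings
    aligned_aa = q_end - q_start
    if aln_len <= 0 or aligned_aa <= 0:
        return None

    return {
        "protein": fields[0],
        "contig": fields[5],
        "score": int(tags["ms"]),
        "identity": n_match / aln_len,
        "positive": int(tags["np"]) / aligned_aa,  # assumed definition
    }


def keep_hit(metrics, min_score=100, min_identity=0.5, min_positive=0.6):
    """Apply placeholder thresholds; tune them for the species pair at hand."""
    return (
        metrics is not None
        and metrics["score"] >= min_score
        and metrics["identity"] >= min_identity
        and metrics["positive"] >= min_positive
    )
```

Suitable cutoffs depend on how divergent the source proteins are from the target genome, so the defaults above are best treated as starting points to adjust after inspecting the score distribution.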