lh3 / miniprot

Align proteins to genomes with splicing and frameshift
https://lh3.github.io/miniprot/
MIT License

Use information from conserved introns #60

Open mparker2 opened 6 months ago

mparker2 commented 6 months ago

Dear @lh3 ,

Thanks for developing miniprot. I have been trying it out and it is extremely useful.

I had an idea for a possible enhancement: it would be very interesting to be able to provide known intron positions (within the query protein sequences) and to give a bonus score to alignments that include them. Many introns are very well conserved across species in terms of position and phase.

I'm not sure how this information would best be provided to miniprot. Perhaps as a BED or GFF file with protein coordinates and phase information, showing how each query protein sequence is subdivided into exons.
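To make the idea concrete, here is a minimal sketch of what such an input could look like and how it might be read. The four-column, tab-separated layout (protein ID, exon start, exon end, phase of the following intron) is purely hypothetical, as is the function name `parse_intron_bed`; miniprot does not accept anything like this today. Coordinates are 0-based, half-open, in amino acids, and `.` marks the last exon.

```python
from collections import defaultdict


def parse_intron_bed(text):
    """Read a hypothetical BED-like table of exons in protein coordinates.

    Columns (tab-separated, all assumed for illustration):
      protein ID, exon start (aa), exon end (aa), phase of the intron
      that follows the exon (0, 1 or 2; '.' for the last exon).
    Returns {protein ID: [(complete codons before intron, phase), ...]}.
    """
    introns = defaultdict(list)
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        prot, _start, end, phase = line.split("\t")[:4]
        if phase == ".":
            continue  # last exon: no intron follows it
        introns[prot].append((int(end), int(phase)))
    return dict(introns)


example = "protA\t0\t40\t1\nprotA\t40\t95\t0\nprotA\t95\t130\t.\n"
known_introns = parse_intron_bed(example)
```

In this toy example `protA` has two introns: one after 40 complete codons in phase 1, and one after 95 complete codons in phase 0.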

Best wishes, Matt

lh3 commented 6 months ago

Thanks. GeMoMa is doing something similar. However, it is difficult to use position-specific scoring along the protein sequence (easier along the genome sequence), and it is also difficult for users to extract the information.

mparker2 commented 6 months ago

OK, thanks - I was not aware of GeMoMa. Perhaps a script for postprocessing of miniprot alignments might achieve a similar result. I might try this.
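A rough sketch of what such a postprocessing script might do, under strong simplifying assumptions: each alignment is already reduced to a score plus the lengths (in nucleotides) of its CDS segments, the alignment has no indels or frameshifts so that cumulative CDS length maps directly onto protein coordinates, and known introns are given as `(complete codons before intron, phase)` pairs. The dictionary keys, function names and the bonus value are all hypothetical, and no miniprot output parsing is shown.

```python
from itertools import accumulate


def introns_from_cds_lengths(cds_lengths, query_start=0):
    """Convert CDS segment lengths (nt) into (codons, phase) intron positions.

    Assumes no indels or frameshifts, so c aligned nucleotides correspond
    to c // 3 complete codons plus c % 3 bases of a split codon.
    """
    boundaries = list(accumulate(cds_lengths))[:-1]  # no intron after last CDS
    return [(query_start + c // 3, c % 3) for c in boundaries]


def conserved_intron_bonus(aligned, known, bonus=1.0):
    """Bonus proportional to the introns matching in both position and phase."""
    return bonus * len(set(aligned) & set(known))


def rerank(alignments, known, bonus=1.0):
    """Sort alignments by original score plus the conserved-intron bonus."""
    def adjusted(aln):
        found = introns_from_cds_lengths(aln["cds_lengths"], aln.get("query_start", 0))
        return aln["score"] + conserved_intron_bonus(found, known, bonus)
    return sorted(alignments, key=adjusted, reverse=True)


known = [(40, 1), (95, 0)]  # hypothetical known introns for one protein
hits = [
    {"name": "locus1", "score": 105, "cds_lengths": [120, 270]},
    {"name": "locus2", "score": 100, "cds_lengths": [121, 164, 105]},
]
ranked = rerank(hits, known, bonus=5.0)
```

Here `locus2` recovers both known introns and moves ahead of `locus1` despite its lower raw score. A real script would also need to handle gaps and frameshifts, and probably allow a small positional tolerance.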