lh3 / miniprot

Align proteins to genomes with splicing and frameshift
https://lh3.github.io/miniprot/
MIT License

Using miniprot to identify homologous transposable elements? #52

Open BitaoQiu opened 7 months ago

BitaoQiu commented 7 months ago

It just occurred to me that miniprot might be usable for identifying homologous TEs, starting from the protein sequences of evolutionarily distant TEs, because many TEs have conserved domains (and structures) even though they evolve much faster than typical protein-coding genes.

Because inactive (ancient) TEs can contain frameshifts, internal stop codons, and insertions (from other TEs) that resemble introns, I set -j to 0 to ignore splice sites. However, active (more recent) TEs usually have no introns, and miniprot penalises single-exon alignments to avoid pseudogenes.
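To make the setup above concrete, here is a minimal Python sketch. It assumes the usual `miniprot [options] <genome> <proteins>` invocation with `--gff` output; only `-j 0` comes from the description above. The file names `genome.fa` and `te_proteins.faa` and all helper names are hypothetical. The sketch builds the command line and then separates single-exon from multi-exon hits by counting CDS features per `Parent` in GFF3 text, which may help when checking how single-exon TE hits are being treated. It does not change the single-exon penalty itself.

```python
import shlex
from collections import Counter


def miniprot_command(genome, proteins, splice_model=0):
    """Build a miniprot command line (hypothetical helper).

    -j 0 turns off the splice-site model, as described in the post.
    --gff requests GFF3 output; the file names are placeholders.
    """
    return ["miniprot", "-j", str(splice_model), "--gff", genome, proteins]


def cds_counts(gff_text):
    """Count CDS features per parent alignment in GFF3 text."""
    counts = Counter()
    for line in gff_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        fields = line.split("\t")
        if len(fields) < 9 or fields[2] != "CDS":
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(kv.split("=", 1) for kv in fields[8].split(";") if "=" in kv)
        parent = attrs.get("Parent")
        if parent:
            counts[parent] += 1
    return counts


def split_by_exon_count(gff_text):
    """Return (single_exon_ids, multi_exon_ids), each sorted."""
    counts = cds_counts(gff_text)
    single = sorted(p for p, n in counts.items() if n == 1)
    multi = sorted(p for p, n in counts.items() if n > 1)
    return single, multi


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch works
    # even where miniprot is not installed.
    print(shlex.join(miniprot_command("genome.fa", "te_proteins.faa")))
```

The output of the printed command could be redirected to a file and passed to `split_by_exon_count` as text. Hits with a single CDS would be the candidates for intact, intron-free TE copies.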

Which settings should I use to avoid the single-exon penalty? And do you have any other suggestions for this task with miniprot?