lh3 / miniprot

Align proteins to genomes with splicing and frameshift
https://lh3.github.io/miniprot/
MIT License

miniprot does not include stop_codon in GTF file? #43

Closed: xiekunwhy closed this issue 1 year ago

xiekunwhy commented 1 year ago

Hi,

I found that only a few transcripts include a final stop_codon in the GTF output when using the --gtf option.

grep -w transcript par.mp.raw.gtf|wc -l
127776

## keep transcripts with a start codon and a final stop codon but no in-frame stop codon
gffread -g par.fa -J -T par.mp.raw.gtf -o par.mp.raw.J.gtf
grep -w transcript par.mp.raw.J.gtf|wc -l
146

## keep transcripts without an in-frame stop codon
gffread -g par.fa -V -T par.mp.raw.gtf -o par.mp.raw.V.gtf
grep -w transcript par.mp.raw.V.gtf|wc -l
104905

As we can see, many transcripts do not include a final stop_codon.
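The gffread filters above apply several criteria at once, so a direct count of stop_codon records in the GTF can help confirm the observation. The sketch below is a hypothetical helper (the function name `count_stop_codons` is illustrative, not part of miniprot or gffread). It assumes a standard tab-separated GTF with the feature type in column 3 and a `transcript_id "..."` attribute in column 9, and it counts each transcript once even if its stop codon is split across two records.

```shell
# Hypothetical helper: report how many transcripts a GTF contains and how many
# of them have at least one stop_codon record.
# Assumes: tab-separated GTF, feature type in column 3, transcript_id in column 9.
count_stop_codons() {
  awk -F'\t' '
    $3 == "transcript" { total++ }
    $3 == "stop_codon" && match($9, /transcript_id "[^"]*"/) {
      id = substr($9, RSTART, RLENGTH)
      # a stop codon split by an intron yields two records; count the transcript once
      if (!(id in seen)) { seen[id] = 1; with_stop++ }
    }
    END { printf "transcripts: %d\twith stop_codon: %d\n", total + 0, with_stop + 0 }
  ' "$1"
}
```

Usage: `count_stop_codons par.mp.raw.gtf`. Comparing the two numbers shows whether the stop_codon records are missing from the GTF itself or are only being dropped by a downstream filter.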

Best, Kun

xiekunwhy commented 1 year ago

I solved this problem by writing a script to convert GFF to GTF.
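The script itself was not posted. As a rough illustration only, a minimal GFF3-to-GTF conversion could look like the hypothetical `gff3_to_gtf` function below. It assumes that transcript records have feature type `mRNA` with an `ID=` attribute, that child records (CDS, stop_codon, and so on) carry `Parent=`, and that there are no separate gene records, so `gene_id` is set to the transcript ID. Coordinates are passed through unchanged; GTF 2.2 expects CDS features to exclude the stop codon, so a strict downstream tool may need further adjustment.

```shell
# Hypothetical sketch: convert mRNA-parented GFF3 records to GTF.
# Assumes mRNA lines carry ID=, child lines carry Parent=, and no gene records exist.
gff3_to_gtf() {
  awk -F'\t' -v OFS='\t' '
    /^#/ || NF < 9 { next }              # skip comment/header lines and malformed rows
    {
      id = ""; parent = ""
      n = split($9, kv, ";")
      for (i = 1; i <= n; i++) {
        if (kv[i] ~ /^ID=/)     id = substr(kv[i], 4)
        if (kv[i] ~ /^Parent=/) parent = substr(kv[i], 8)
      }
      if ($3 == "mRNA") { $3 = "transcript"; tid = id }
      else if (parent != "") tid = parent
      else next                          # record not attached to a transcript
      # replace the GFF3 attributes with GTF-style key "value"; pairs
      $9 = "gene_id \"" tid "\"; transcript_id \"" tid "\";"
      print
    }
  ' "$1"
}
```

Usage: `gff3_to_gtf par.mp.gff > par.mp.gtf` (file names are placeholders). Keeping stop_codon records as ordinary child features means they survive the conversion as long as they appear in the GFF3 input.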