lh3 / miniprot

Align proteins to genomes with splicing and frameshift
https://lh3.github.io/miniprot/
MIT License

How to get exon and UTR coordinates like GenomeThreader? #27

Closed arslan9732 closed 1 year ago

arslan9732 commented 1 year ago

Hi, is it possible to get GenomeThreader-like output from miniprot? miniprot only reports the mRNA, CDS, and stop codon coordinates in its output, whereas GenomeThreader reports gene, mRNA, exon, CDS, three_prime_cis_splice_site, and five_prime_cis_splice_site coordinates.

GenomeThreader demo output:

NC_003070.9     gth     gene    7318    8666    .       -       .       ID=gene1
NC_003070.9     gth     mRNA    7318    8666    .       -       .       ID=mRNA1;Parent=gene1;Target=NP_001030923.1 1 191 +
NC_003070.9     gth     exon    7318    7450    1       -       .       Parent=mRNA1
NC_003070.9     gth     CDS     7318    7450    .       -       1       ID=CDS1;Parent=mRNA1
NC_003070.9     gth     three_prime_cis_splice_site     7450    7451    0.05    -       .       Parent=mRNA1
NC_003070.9     gth     five_prime_cis_splice_site      7563    7564    0.05    -       .       Parent=mRNA1
NC_003070.9     gth     exon    7564    7649    1       -       .       Parent=mRNA1
NC_003070.9     gth     CDS     7564    7649    .       -       0       ID=CDS1;Parent=mRNA1
NC_003070.9     gth     three_prime_cis_splice_site     7649    7650    0.05    -       .       Parent=mRNA1
NC_003070.9     gth     five_prime_cis_splice_site      7761    7762    0.05    -       .       Parent=mRNA1
NC_003070.9     gth     exon    7762    7835    1       -       .       Parent=mRNA1
NC_003070.9     gth     CDS     7762    7835    .       -       2       ID=CDS1;Parent=mRNA1
NC_003070.9     gth     three_prime_cis_splice_site     7835    7836    0.05    -       .       Parent=mRNA1
NC_003070.9     gth     five_prime_cis_splice_site      7941    7942    0.99    -       .       Parent=mRNA1
NC_003070.9     gth     exon    7942    7987    1       -       .       Parent=mRNA1
NC_003070.9     gth     CDS     7942    7987    .       -       0       ID=CDS1;Parent=mRNA1
NC_003070.9     gth     three_prime_cis_splice_site     7987    7988    0.05    -       .       Parent=mRNA1
NC_003070.9     gth     five_prime_cis_splice_site      8235    8236    0.05    -       .       Parent=mRNA1
NC_003070.9     gth     exon    8236    8325    1       -       .       Parent=mRNA1
NC_003070.9     gth     CDS     8236    8325    .       -       0       ID=CDS1;Parent=mRNA1
NC_003070.9     gth     three_prime_cis_splice_site     8325    8326    0.05    -       .       Parent=mRNA1
NC_003070.9     gth     five_prime_cis_splice_site      8416    8417    0.05    -       .       Parent=mRNA1
NC_003070.9     gth     exon    8417    8464    1       -       .       Parent=mRNA1
NC_003070.9     gth     CDS     8417    8464    .       -       0       ID=CDS1;Parent=mRNA1
NC_003070.9     gth     three_prime_cis_splice_site     8464    8465    0.05    -       .       Parent=mRNA1
NC_003070.9     gth     five_prime_cis_splice_site      8570    8571    0.05    -       .       Parent=mRNA1
NC_003070.9     gth     exon    8571    8666    1       -       .       Parent=mRNA1
NC_003070.9     gth     CDS     8571    8666    .       -       0       ID=CDS1;Parent=mRNA1
###
###
NC_003070.9     gth     gene    33995   37061   .       -       .       ID=gene2
NC_003070.9     gth     mRNA    33995   37061   .       -       .       ID=mRNA2;Parent=gene2;Target=NP_001030924.1 1 645 +
NC_003070.9     gth     exon    33995   34327   1       -       .       Parent=mRNA2
NC_003070.9     gth     CDS     33995   34327   .       -       0       ID=CDS2;Parent=mRNA2
NC_003070.9     gth     three_prime_cis_splice_site     34327   34328   0.05    -       .       Parent=mRNA2
NC_003070.9     gth     five_prime_cis_splice_site      34400   34401   0.05    -       .       Parent=mRNA2
NC_003070.9     gth     exon    34401   35474   1       -       .       Parent=mRNA2
NC_003070.9     gth     CDS     34401   35474   .       -       0       ID=CDS2;Parent=mRNA2
NC_003070.9     gth     three_prime_cis_splice_site     35474   35475   0.05    -       .       Parent=mRNA2
NC_003070.9     gth     five_prime_cis_splice_site      35566   35567   0.05    -       .       Parent=mRNA2
NC_003070.9     gth     exon    35567   35647   1       -       .       Parent=mRNA2
NC_003070.9     gth     CDS     35567   35647   .       -       0       ID=CDS2;Parent=mRNA2
NC_003070.9     gth     three_prime_cis_splice_site     35647   35648   0.05    -       .       Parent=mRNA2
NC_003070.9     gth     five_prime_cis_splice_site      35729   35730   0.05    -       .       Parent=mRNA2
NC_003070.9     gth     exon    35730   35963   1       -       .       Parent=mRNA2
NC_003070.9     gth     CDS     35730   35963   .       -       0       ID=CDS2;Parent=mRNA2
NC_003070.9     gth     three_prime_cis_splice_site     35963   35964   0.05    -       .       Parent=mRNA2
NC_003070.9     gth     five_prime_cis_splice_site      36623   36624   0.05    -       .       Parent=mRNA2
NC_003070.9     gth     exon    36624   36685   1       -       .       Parent=mRNA2
NC_003070.9     gth     CDS     36624   36685   .       -       2       ID=CDS2;Parent=mRNA2
NC_003070.9     gth     three_prime_cis_splice_site     36685   36686   0.05    -       .       Parent=mRNA2
NC_003070.9     gth     five_prime_cis_splice_site      36809   36810   0.05    -       .       Parent=mRNA2
NC_003070.9     gth     exon    36810   36921   1       -       .       Parent=mRNA2
NC_003070.9     gth     CDS     36810   36921   .       -       0       ID=CDS2;Parent=mRNA2
NC_003070.9     gth     three_prime_cis_splice_site     36921   36922   0.05    -       .       Parent=mRNA2
NC_003070.9     gth     five_prime_cis_splice_site      37022   37023   0.05    -       .       Parent=mRNA2
NC_003070.9     gth     exon    37023   37061   1       -       .       Parent=mRNA2
NC_003070.9     gth     CDS     37023   37061   .       -       0       ID=CDS2;Parent=mRNA2
lh3 commented 1 year ago

In your example, GenomeThreader does not seem to report UTRs, either.

arslan9732 commented 1 year ago

Yes, you are right. I have changed the title of the question.

lh3 commented 1 year ago

Exons are the same as CDS. You can write a script to add them yourself.
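
A minimal sketch of such a script, not something from the miniprot repository: it assumes tab-separated GFF3 input with nine columns in which each CDS line carries a `Parent=` attribute, and it emits an exon line with the same coordinates ahead of every CDS line. The function name `add_exon_features` is hypothetical, and the sample input reuses the mRNA and CDS lines from the GenomeThreader demo above.

```python
def add_exon_features(lines):
    """Yield GFF3 lines, emitting a matching exon line before each CDS line."""
    for line in lines:
        line = line.rstrip("\n")
        fields = line.split("\t")
        is_cds = (not line.startswith("#")
                  and len(fields) == 9
                  and fields[2] == "CDS")
        if is_cds:
            # Assumption: the CDS line has a Parent= attribute. Only that
            # attribute is copied, so the exon attaches to the same mRNA
            # without duplicating any ID.
            parent = [a for a in fields[8].split(";") if a.startswith("Parent=")]
            exon = fields[:8] + [";".join(parent) or "."]
            exon[2] = "exon"
            exon[7] = "."  # phase is only meaningful for CDS features
            yield "\t".join(exon)
        yield line


# Sample input: mRNA and CDS lines taken from the demo output in this thread.
sample = [
    "NC_003070.9\tgth\tmRNA\t7318\t8666\t.\t-\t.\tID=mRNA1;Parent=gene1;Target=NP_001030923.1 1 191 +",
    "NC_003070.9\tgth\tCDS\t7318\t7450\t.\t-\t1\tID=CDS1;Parent=mRNA1",
    "NC_003070.9\tgth\tCDS\t7564\t7649\t.\t-\t0\tID=CDS1;Parent=mRNA1",
]

for out in add_exon_features(sample):
    print(out)
```

To run it on a real file, the same generator can be fed an open file handle or `sys.stdin` instead of `sample`. The exon lines inherit the score column of the CDS they are copied from; splice-site and gene features are not generated by this sketch.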