Closed yuzhenpeng closed 1 year ago
Miniprot outputs GFF3. What does EVM require?
Hello,
I have the same question. EvidenceModeler expects protein alignments in GFF3. Example files can be found at https://evidencemodeler.github.io/
It would be great if miniprot could output both protein alignments and full-length gene structures with correct splice sites (gene, mRNA, exon, CDS, three_prime_cis_splice_site, five_prime_cis_splice_site) in two separate GFF3 files. GenomeThreader can generate such files, but it needs to be run twice with different settings (-intermedia/-skipalignments) and is slow. Protein alignments can be integrated with other evidence by EvidenceModeler, and gene structures can be used as a training set for predictors like AUGUSTUS.
Many thanks!
Sincerely,
Cong
The EVM example doesn't have the three_prime_cis_splice_site or five_prime_cis_splice_site features. The only difference from the miniprot GFF seems to be the gene and exon features. Could you write a script to add these two features? If you can confirm EVM requires those two features, I can add them.
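For readers who want to try the suggestion above, here is a minimal Python sketch that wraps each miniprot mRNA in a gene feature and mirrors each CDS as an exon. It assumes miniprot's `--gff` output carries `mRNA` lines with an `ID` attribute and `CDS` lines with a `Parent` attribute; the function name `add_gene_and_exon` and the `.gene`/`.exonN` ID suffixes are made up for illustration, and whether EVM accepts the result unchanged has not been verified here.

```python
def add_gene_and_exon(lines):
    """Return GFF3 lines with gene and exon features added (a sketch).

    Assumes mRNA lines have an ID attribute and CDS lines have a Parent
    attribute. Comment lines (including ##PAF lines) are dropped, and any
    other feature type is passed through unchanged.
    """
    out = []
    exon_count = {}  # number of exons emitted so far, per parent mRNA
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        # Parse the attribute column into a dict (values may contain spaces).
        attrs = dict(
            item.split("=", 1) for item in cols[8].split(";") if "=" in item
        )
        if cols[2] == "mRNA" and "ID" in attrs:
            gene_id = attrs["ID"] + ".gene"  # hypothetical naming scheme
            gene = cols[:2] + ["gene"] + cols[3:5] + [".", cols[6], ".",
                                                      "ID=" + gene_id]
            out.append("\t".join(gene))
            cols[8] = cols[8] + ";Parent=" + gene_id
            out.append("\t".join(cols))
        elif cols[2] == "CDS" and "Parent" in attrs:
            parent = attrs["Parent"]
            exon_count[parent] = exon_count.get(parent, 0) + 1
            exon_attrs = "ID=%s.exon%d;Parent=%s" % (
                parent, exon_count[parent], parent)
            exon = cols[:2] + ["exon"] + cols[3:7] + [".", exon_attrs]
            out.append("\t".join(exon))
            out.append("\t".join(cols))
        else:
            out.append("\t".join(cols))
    return out
```

Each exon simply copies the coordinates of its CDS, so untranslated regions are not represented; that seems reasonable for protein-derived models but is a simplification.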
Hello,
The protein alignment GFF only has a "match" feature, and EVM requires ID and Target in the 9th field. It looks like this:
Contig1 nap-nr_minus_rice.fasta nucleotide_to_protein_match 8392 8470 50.00 - . ID=match.nap.nr_minus_rice.fasta.37;Target=RF|YP_440341.1|83716234|NC_007650 196 222
Contig1 nap-nr_minus_rice.fasta nucleotide_to_protein_match 7650 7786 26.09 - . ID=match.nap.nr_minus_rice.fasta.37;Target=RF|YP_440341.1|83716234|NC_007650 222 268
Contig1 nap-nr_minus_rice.fasta nucleotide_to_protein_match 8386 8509 26.83 - . ID=match.nap.nr_minus_rice.fasta.38;Target=RF|YP_099363.1|53713371|NC_006347 1 42
Contig1 nap-nr_minus_rice.fasta nucleotide_to_protein_match 7635 7786 24.00 - . ID=match.nap.nr_minus_rice.fasta.38;Target=RF|YP_099363.1|53713371|NC_006347 42 92
This is the second EVM example, the one for alignments, rather than the first one, which is a gene prediction GFF with gene/exon features.
So it would be great if miniprot could output two GFF files: one containing alignments that looks like what I pasted above, and another containing gene structures with gene, mRNA, exon, and CDS features.
Many thanks!
Sincerely,
Cong
You can write a script to generate the second GFF. This is highly EVM-specific and seems redundant. I don't add features for just one tool.
Hello, @lh3
Thank you for developing and maintaining miniprot.
I created miniprot projections for one of my newly sequenced assemblies, which I wish to use in EVM for genome annotation. Would you be willing to share a script that converts miniprot output into a GFF3 file that EVM accepts? I thought I would ask in case you already have a procedure for it.
Thanks in advance, Zhenpeng
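The conversion discussed in this thread could be sketched roughly as follows in Python. This is not an official miniprot or EVM script. It assumes that each `CDS` line in miniprot's `--gff` output carries `Parent`, `Identity` and `Target` attributes, and it writes one `nucleotide_to_protein_match` line per CDS, reusing the parent mRNA ID as the match ID so that all segments of one alignment share it, as in the EVM example pasted above. The function name `miniprot_to_evm` and the choice of percent identity as the score are assumptions.

```python
import sys


def parse_attributes(field):
    """Split a GFF3 attribute column into a dict of key -> value."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def miniprot_to_evm(lines):
    """Yield EVM-style protein alignment lines from miniprot GFF3 lines.

    Only CDS features with both Parent and Target attributes are used;
    everything else (comments, ##PAF lines, mRNA, stop_codon) is skipped.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = parse_attributes(cols[8])
        if "Parent" not in attrs or "Target" not in attrs:
            continue
        # Assumption: report percent identity as the score when available,
        # otherwise keep whatever score miniprot wrote in column 6.
        score = cols[5]
        if "Identity" in attrs:
            score = "%.2f" % (float(attrs["Identity"]) * 100)
        new_attrs = "ID=%s;Target=%s" % (attrs["Parent"], attrs["Target"])
        yield "\t".join([cols[0], "miniprot", "nucleotide_to_protein_match",
                         cols[3], cols[4], score, cols[6], ".", new_attrs])


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (file names are placeholders):
    #   python miniprot2evm.py miniprot.gff > evm_protein.gff3
    with open(sys.argv[1]) as handle:
        for record in miniprot_to_evm(handle):
            print(record)
```

The source column is set to `miniprot` here; if the EVM weights file refers to evidence by that column, the same label would need to be used there.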