nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

How to check the full-length genes from the final annotation. #1034

Open Nitin123-4 opened 2 months ago

Nitin123-4 commented 2 months ago

Hi,

I would like to filter the full-length genes from the final annotation.

Can I treat any mRNA whose CDS-transcript sequence starts with ATG (M in the translation) and ends with one of the stop codons as a full-length gene?

example:

GeneID TranscriptID Feature Contig Start Stop Strand Name Product Alias/Synonyms EC_number BUSCO PFAM InterPro EggNog COG GO Terms Secreted Membrane Protease CAZyme Notes gDNA mRNA CDS-transcript Translation

FUN_000001 FUN_000001-T1 mRNA chr_1 1300 2219 - hypothetical protein
ATGGCGGCCGCCGCGCCTCACTGCCCGGAAGTGACGGTTCTCCGCCGGAAGCCGTTTAGGCGACCCGCGGTGGCGTCAATGGAGTGCGCCGGGAGGAACGCGGCCCGGCCTGGTAGCGGGGATTGGAAGGCTGCTGCGCCTGCGGTCTGCGCGGCGCTCGTGAGACCCGCCCTCGCAGCCTCTCCCGTGTTTCCCCGCGTTTCCTTCGGCTTGTTGGGGGGGGGGGGGGGCGGGGGTGCGTCGTCCGACAGCTCCCGGAGGACCGGGCCTTCCCGGACCGCTATAGACGATCCACGCGTCGGGTTGGCGCTTGGTGGCCGCCGCTAAGTCTGTCTGATCGCGGCTCTCCCGTGTCGCGAATCCCCTCGGAGACTGCGCCCCGTGGTCCGTCTGGGGCAGCGCAGCCCCCCACCTTCCCCCCCCAAATCCTGAACGAGGGGCGACGAATCCCCGGGTTGCCAGCAGGCCCCAGAACCCCGAAGCTGGTCCTGCCAGCTAGGTTCCTCATCCCGGCCTCCGGGCAGCCTCCTCCCGCGGAAGACAGGAGGGGGGTGGAGGGCGGCGAGCACCCCCGAGATCAGCCTCACGTGGAGAGGCCGGAGGAGGAGAGCAGAAGGCAGTGCGGCCCGCTCCGGGGCCTCCGCCCAAGGCTGTGCCCAGCTGGCGGGTATGGTGGTGCTGAAGAGCGAGGGGGTCGAGGAGGAGGGCGCTGGCCACCATCCCACCAGCAGGCAGGCACAAGCAGCGCTGTTTCAGTTGCAGGCCTGAAGCTGCCCGAAAGGGGTCCGGATAATCTACTTCATCCGGGATCCTTTCAAGCTGCTGTGCAACCACTTTCTCAACATCCTTCTCTAAAAAGAGAAGGCCGGGTACAATACGAAAATCATCTAGTATGGTTGGGTCTACAGATAGTTCCTTGA TCAAGGAACTATCTGTAGACCCAACCATACTAGATGATTTTCGTATTGTACCCGGCCTTCTCTTTTTAGAGAAGGATGTTGAGAAAGTGGTTGCACAGCAGCTTGAAAGGATCCCGGATGAAGTAGATTATCCGGACCCCTTTCGGGCAGCTTCAGGCCTGCAACTGAAACAGCGCTGCTTGTGCCTGCCTGCTGGTGGGATGGTGGCCAGCGCCCTCCTCCTCGACCCCCTCGCTCTTCAGCACCACCATACCCGCCAGCTGGGCACAGCCTTGGGCGGAGGCCCCGGAGCGGGCCGCACTGCCTTCTGCTCTCCTCCTCCGGCCTCTCCACGTGAGGCTGATCTCGGGGGTGCTCGCCGCCCTCCACCCCCCTCCTGTCTTCCGCGGGAGGAGGCTGCCCGGAGGCCGGGATGAGGAACCTAGCTGGCAGGACCAGCTTCGGGGTTCTGGGGCCTGCTGGCAACCCGGGGATTCGTCGCCCCTCGTTCAGGATTTGGGGGGGGAAGGTGGGGGGCTGCGCTGCCCCAGACGGACCACGGGGCGCAGTGAGCGCCGCGCAGACCGCAGGCGCAGCAGCCTTCCAATCCCCGCTACCAGGCCGGGCCGCGTTCCTCCCGGCGCACTCCATTGACGCCACCGCGGGTCGCCTAAACGGCTTCCGGCGGAGAACCGTCACTTCCGGGCAGTGAGGCGCGGCGGCCGCCAT 
TCAAGGAACTATCTGTAGACCCAACCATACTAGATGATTTTCGTATTGTACCCGGCCTTCTCTTTTTAGAGAAGGATGTTGAGAAAGTGGTTGCACAGCAGCTTGAAAGGATCCCGGATGAAGTAGATTATCCGGACCCCTTTCGGGCAGCTTCAGGCCTGCAACTGAAACAGCGCTGCTTGTGCCTGCCTGCTGGTGGGATGGTGGCCAGCGCCCTCCTCCTCGACCCCCTCGCTCTTCAGCACCACCATACCCGCCAGCTGGGCACAGCCTTGGGCGGAGGCCCCGGAGCGGGCCGCACTGCCTTCTGCTCTCCTCCTCCGGCCTCTCCACGTGAGGCTGATCTCGGGGGTGCTCGCCGCCCTCCACCCCCCTCCTGTCTTCCGCGGGAGGAGGCTGCCCGGAGGCCGGGATGAGGAACCTAGCTGGCAGGACCAGCTTCGGGGTTCTGGGGCCTGCTGGCAACCCGGGGATTCGTCGCCCCTCGTTCAGGATTTGGGGGGGGAAGGTGGGGGGCTGCGCTGCCCCAGACGGACCACGGGGCGCAGTGAGCGCCGCGCAGACCGCAGGCGCAGCAGCCTTCCAATCCCCGCTACCAGGCCGGGCCGCGTTCCTCCCGGCGCACTCCATTGACGCCACCGCGGGTCGCCTAAACGGCTTCCGGCGGAGAACCGTCACTTCCGGGCAGTGAGGCGCGGCGGCCGCCAT MAAAAPHCPEVTVLRRKPFRRPAVASMECAGRNAARPGSGDWKAAAPAVCAALTAPRGPSGAAQPPTFPPQILNEGRRIPGLPAGPRTPKLVLPARFLIPASGQPPPAEDRRGVEGGEHPRDQPHVERPEEESRRQCGPLRGLRPRLCPAGGYGGAEERGGRGGGRWPPSHQQAGTSSAVSVAGLKLPERGPDNLLHPGSFQAAVQPLSQHPSLKREGRVQYENHLVWLGLQIVP
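
The check I have in mind could be sketched roughly as below. This is only a minimal sketch, not anything funannotate provides. It assumes the annotation table is tab-delimited with the `TranscriptID` and `CDS-transcript` columns shown in the header above, that the standard stop codons (TAA, TAG, TGA) apply, and that a full-length CDS should also be a multiple of three in length with no in-frame stop before the last codon. The function names `is_full_length` and `filter_full_length` are hypothetical.

```python
import csv

# Assumption: standard genetic code stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_full_length(cds: str) -> bool:
    """Return True if the CDS starts with ATG, ends with a stop codon,
    has a length divisible by three, and has no internal in-frame stop."""
    seq = cds.strip().upper()
    # Need at least a start and a stop codon, and a complete reading frame.
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] != "ATG":
        return False
    if codons[-1] not in STOP_CODONS:
        return False
    # Reject sequences with a stop codon before the final codon.
    return not any(codon in STOP_CODONS for codon in codons[1:-1])


def filter_full_length(handle, column="CDS-transcript"):
    """Yield TranscriptID values whose CDS passes is_full_length.

    `handle` is an open text file (or any iterable of lines) holding the
    tab-delimited annotation table with a header row.
    """
    # Sequence columns can be longer than the csv module's default field limit.
    csv.field_size_limit(10**9)
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        if is_full_length(row.get(column) or ""):
            yield row["TranscriptID"]
```

With a real file this would be called as `list(filter_full_length(open("annotations.txt")))`, where the file name is a placeholder. In the example row above, the third sequence (which appears to be the CDS-transcript column) begins with TCAAGG and ends with GCCGCCAT, while the first sequence (apparently gDNA) begins with ATG and ends with TGA, so this check would reject that row as pasted.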