Open Nitin123-4 opened 2 months ago
Hi,
I would like to filter full-length genes from the final annotation.
Can I treat any mRNA whose CDS-transcript sequence starts with ATG (M in the translation) and ends with one of the stop codons as a full-length gene, and filter on that?
example:
GeneID TranscriptID Feature Contig Start Stop Strand Name Product Alias/Synonyms EC_number BUSCO PFAM InterPro EggNog COG GO Terms Secreted Membrane Protease CAZyme Notes gDNA mRNA CDS-transcript Translation
FUN_000001 FUN_000001-T1 mRNA chr_1 1300 2219 - hypothetical protein ATGGCGGCCGCCGCGCCTCACTGCCCGGAAGTGACGGTTCTCCGCCGGAAGCCGTTTAGGCGACCCGCGGTGGCGTCAATGGAGTGCGCCGGGAGGAACGCGGCCCGGCCTGGTAGCGGGGATTGGAAGGCTGCTGCGCCTGCGGTCTGCGCGGCGCTCGTGAGACCCGCCCTCGCAGCCTCTCCCGTGTTTCCCCGCGTTTCCTTCGGCTTGTTGGGGGGGGGGGGGGGCGGGGGTGCGTCGTCCGACAGCTCCCGGAGGACCGGGCCTTCCCGGACCGCTATAGACGATCCACGCGTCGGGTTGGCGCTTGGTGGCCGCCGCTAAGTCTGTCTGATCGCGGCTCTCCCGTGTCGCGAATCCCCTCGGAGACTGCGCCCCGTGGTCCGTCTGGGGCAGCGCAGCCCCCCACCTTCCCCCCCCAAATCCTGAACGAGGGGCGACGAATCCCCGGGTTGCCAGCAGGCCCCAGAACCCCGAAGCTGGTCCTGCCAGCTAGGTTCCTCATCCCGGCCTCCGGGCAGCCTCCTCCCGCGGAAGACAGGAGGGGGGTGGAGGGCGGCGAGCACCCCCGAGATCAGCCTCACGTGGAGAGGCCGGAGGAGGAGAGCAGAAGGCAGTGCGGCCCGCTCCGGGGCCTCCGCCCAAGGCTGTGCCCAGCTGGCGGGTATGGTGGTGCTGAAGAGCGAGGGGGTCGAGGAGGAGGGCGCTGGCCACCATCCCACCAGCAGGCAGGCACAAGCAGCGCTGTTTCAGTTGCAGGCCTGAAGCTGCCCGAAAGGGGTCCGGATAATCTACTTCATCCGGGATCCTTTCAAGCTGCTGTGCAACCACTTTCTCAACATCCTTCTCTAAAAAGAGAAGGCCGGGTACAATACGAAAATCATCTAGTATGGTTGGGTCTACAGATAGTTCCTTGA TCAAGGAACTATCTGTAGACCCAACCATACTAGATGATTTTCGTATTGTACCCGGCCTTCTCTTTTTAGAGAAGGATGTTGAGAAAGTGGTTGCACAGCAGCTTGAAAGGATCCCGGATGAAGTAGATTATCCGGACCCCTTTCGGGCAGCTTCAGGCCTGCAACTGAAACAGCGCTGCTTGTGCCTGCCTGCTGGTGGGATGGTGGCCAGCGCCCTCCTCCTCGACCCCCTCGCTCTTCAGCACCACCATACCCGCCAGCTGGGCACAGCCTTGGGCGGAGGCCCCGGAGCGGGCCGCACTGCCTTCTGCTCTCCTCCTCCGGCCTCTCCACGTGAGGCTGATCTCGGGGGTGCTCGCCGCCCTCCACCCCCCTCCTGTCTTCCGCGGGAGGAGGCTGCCCGGAGGCCGGGATGAGGAACCTAGCTGGCAGGACCAGCTTCGGGGTTCTGGGGCCTGCTGGCAACCCGGGGATTCGTCGCCCCTCGTTCAGGATTTGGGGGGGGAAGGTGGGGGGCTGCGCTGCCCCAGACGGACCACGGGGCGCAGTGAGCGCCGCGCAGACCGCAGGCGCAGCAGCCTTCCAATCCCCGCTACCAGGCCGGGCCGCGTTCCTCCCGGCGCACTCCATTGACGCCACCGCGGGTCGCCTAAACGGCTTCCGGCGGAGAACCGTCACTTCCGGGCAGTGAGGCGCGGCGGCCGCCAT 
TCAAGGAACTATCTGTAGACCCAACCATACTAGATGATTTTCGTATTGTACCCGGCCTTCTCTTTTTAGAGAAGGATGTTGAGAAAGTGGTTGCACAGCAGCTTGAAAGGATCCCGGATGAAGTAGATTATCCGGACCCCTTTCGGGCAGCTTCAGGCCTGCAACTGAAACAGCGCTGCTTGTGCCTGCCTGCTGGTGGGATGGTGGCCAGCGCCCTCCTCCTCGACCCCCTCGCTCTTCAGCACCACCATACCCGCCAGCTGGGCACAGCCTTGGGCGGAGGCCCCGGAGCGGGCCGCACTGCCTTCTGCTCTCCTCCTCCGGCCTCTCCACGTGAGGCTGATCTCGGGGGTGCTCGCCGCCCTCCACCCCCCTCCTGTCTTCCGCGGGAGGAGGCTGCCCGGAGGCCGGGATGAGGAACCTAGCTGGCAGGACCAGCTTCGGGGTTCTGGGGCCTGCTGGCAACCCGGGGATTCGTCGCCCCTCGTTCAGGATTTGGGGGGGGAAGGTGGGGGGCTGCGCTGCCCCAGACGGACCACGGGGCGCAGTGAGCGCCGCGCAGACCGCAGGCGCAGCAGCCTTCCAATCCCCGCTACCAGGCCGGGCCGCGTTCCTCCCGGCGCACTCCATTGACGCCACCGCGGGTCGCCTAAACGGCTTCCGGCGGAGAACCGTCACTTCCGGGCAGTGAGGCGCGGCGGCCGCCAT MAAAAPHCPEVTVLRRKPFRRPAVASMECAGRNAARPGSGDWKAAAPAVCAALTAPRGPSGAAQPPTFPPQILNEGRRIPGLPAGPRTPKLVLPARFLIPASGQPPPAEDRRGVEGGEHPRDQPHVERPEEESRRQCGPLRGLRPRLCPAGGYGGAEERGGRGGGRWPPSHQQAGTSSAVSVAGLKLPERGPDNLLHPGSFQAAVQPLSQHPSLKREGRVQYENHLVWLGLQIVP
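To make the proposed criterion concrete, here is a minimal Python sketch of the check described above: keep a row only if its CDS-transcript sequence begins with ATG and ends with TAA, TAG, or TGA. The helper names (`is_full_length`, `filter_full_length`, `read_annotation_table`) are hypothetical, and reading the table as tab-delimited text with a `CDS-transcript` column is an assumption based on the header shown in the example. In the example row, the third sequence appears to start with TCA and end with CAT, which looks like the reverse complement of an ATG...TGA sequence on the minus strand, so the sketch has an optional `allow_revcomp` switch for that case; whether that reading is right is also an assumption. The check does not test reading frame or internal stop codons.

```python
import csv

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def _has_start_and_stop(seq):
    # At least one start codon plus one stop codon (6 nt),
    # beginning with ATG and ending with a stop codon.
    return len(seq) >= 6 and seq[:3] == "ATG" and seq[-3:] in STOP_CODONS


def is_full_length(cds, allow_revcomp=False):
    """True if the CDS starts with ATG and ends with a stop codon.

    allow_revcomp (assumption): also accept a sequence whose reverse
    complement passes, for rows where the column holds the opposite strand.
    """
    seq = cds.strip().upper()
    if _has_start_and_stop(seq):
        return True
    return allow_revcomp and _has_start_and_stop(reverse_complement(seq))


def filter_full_length(rows, column="CDS-transcript", allow_revcomp=False):
    """Keep only rows (dicts) whose CDS column passes is_full_length."""
    return [
        row for row in rows
        if is_full_length(row.get(column) or "", allow_revcomp)
    ]


def read_annotation_table(path):
    """Read a tab-delimited annotation table into a list of dicts.

    Assumes the first line is a header containing a 'CDS-transcript' column.
    """
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```

Usage would be along the lines of `filter_full_length(read_annotation_table("annotations.txt"))`, where the file name is a placeholder for the annotation table.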