katholt / srst2

Short Read Sequence Typing for Bacterial Pathogens

Redundancy (CARB-2 and CARB-8 genes) in ARGannot.fasta #43

Closed wanyuac closed 8 years ago

wanyuac commented 9 years ago

Hi, the sequences of CARB-2 and CARB-8 in this database are identical.

[Evidence] In our database,

137__CARB_BlaCARB-2559 yes;yes;CARB-2;Bla;HQ157204;747-1613;867 ATGAAGTTTTTATTGGCATTTTCGCTTTTAATACCATCCGTGGTTTTTGCAAGTAGTTCA AAGTTTCAGCAAGTTGAACAAGACGTTAAGGCAATTGAAGTTTCTCTTTCTGCTCGTATA GGTGTTTCCGTTCTTGATACTCAAAATGGAGAATATTGGGATTACAATGGCAATCAGCGC TTCCCGTTAACAAGTACTTTTAAAACAATAGCTTGCGCTAAATTACTATATGATGCTGAG CAAGGAAAAGTTAATCCCAATAGTACAGTCGAGATTAAGAAAGCAGATCTTGTGACCTAT TCCCCTGTAATAGAAAAGCAAGTAGGGCAGGCAATCACACTCGATGATGCGTGCTTCGCA ACTATGACTACAAGTGATAATACTGCGGCAAATATCATCCTAAGTGCTGTAGGTGGCCCC AAAGGCGTTACTGATTTTTTAAGACAAATTGGGGACAAAGAGACTCGTCTAGACCGTATT GAGCCTGATTTAAATGAAGGTAAGCTCGGTGATTTGAGGGATACGACAACTCCTAAGGCA ATAGCCAGTACTTTGAATAAATTTTTATTTGGTTCCGCGCTATCTGAAATGAACCAGAAA AAATTAGAGTCTTGGATGGTGAACAATCAAGTCACTGGTAATTTACTACGTTCAGTATTG CCGGCGGGATGGAACATTGCGGATCGCTCAGGTGCTGGCGGATTTGGTGCTCGGAGTATT ACAGCAGTTGTGTGGAGTGAGCATCAAGCCCCAATTATTGTGAGCATCTATCTAGCTCAA ACACAGGCTTCAATGGCAGAGCGAAATGATGCGATTGTTAAAATTGGTCATTCAATTTTT GACGTTTATACATCACAGTCGCGCTGA

137__CARB_BlaCARB-8564 yes;yes;CARB-8;Bla;GQ866976;1345-2211;867 ATGAAGTTTTTATTGGCATTTTCGCTTTTAATACCATCCGTGGTTTTTGCAAGTAGTTCA AAGTTTCAGCAAGTTGAACAAGACGTTAAGGCAATTGAAGTTTCTCTTTCTGCTCGTATA GGTGTTTCCGTTCTTGATACTCAAAATGGAGAATATTGGGATTACAATGGCAATCAGCGC TTCCCGTTAACAAGTACTTTTAAAACAATAGCTTGCGCTAAATTACTATATGATGCTGAG CAAGGAAAAGTTAATCCCAATAGTACAGTCGAGATTAAGAAAGCAGATCTTGTGACCTAT TCCCCTGTAATAGAAAAGCAAGTAGGGCAGGCAATCACACTCGATGATGCGTGCTTCGCA ACTATGACTACAAGTGATAATACTGCGGCAAATATCATCCTAAGTGCTGTAGGTGGCCCC AAAGGCGTTACTGATTTTTTAAGACAAATTGGGGACAAAGAGACTCGTCTAGACCGTATT GAGCCTGATTTAAATGAAGGTAAGCTCGGTGATTTGAGGGATACGACAACTCCTAAGGCA ATAGCCAGTACTTTGAATAAATTTTTATTTGGTTCCGCGCTATCTGAAATGAACCAGAAA AAATTAGAGTCTTGGATGGTGAACAATCAAGTCACTGGTAATTTACTACGTTCAGTATTG CCGGCGGGATGGAACATTGCGGATCGCTCAGGTGCTGGCGGATTTGGTGCTCGGAGTATT ACAGCAGTTGTGTGGAGTGAGCATCAAGCCCCAATTATTGTGAGCATCTATCTAGCTCAA ACACAGGCTTCAATGGCAGAGCGAAATGATGCGATTGTTAAAATTGGTCATTCAATTTTT GACGTTTATACATCACAGTCGCGCTGA

They are identical, as revealed by megaBLAST. This redundancy already exists in the original ARG-ANNOT database as well as in GenBank:

gb|HQ157204.1|:747-1613|CARB-2 ATGAAGTTTTTATTGGCATTTTCGCTTTTAATACCATCCGTGGTTTTTGCAAGTAGTTCAAAGTTTCAGCAAGTTGAACAAGACGTTAAGGCAATTGAAGTTTCTCTTTCTGCTCGTATAGGTGTTTCCGTTCTTGATACTCAAAATGGAGAATATTGGGATTACAATGGCAATCAGCGCTTCCCGTTAACAAGTACTTTTAAAACAATAGCTTGCGCTAAATTACTATATGATGCTGAGCAAGGAAAAGTTAATCCCAATAGTACAGTCGAGATTAAGAAAGCAGATCTTGTGACCTATTCCCCTGTAATAGAAAAGCAAGTAGGGCAGGCAATCACACTCGATGATGCGTGCTTCGCAACTATGACTACAAGTGATAATACTGCGGCAAATATCATCCTAAGTGCTGTAGGTGGCCCCAAAGGCGTTACTGATTTTTTAAGACAAATTGGGGACAAAGAGACTCGTCTAGACCGTATTGAGCCTGATTTAAATGAAGGTAAGCTCGGTGATTTGAGGGATACGACAACTCCTAAGGCAATAGCCAGTACTTTGAATAAATTTTTATTTGGTTCCGCGCTATCTGAAATGAACCAGAAAAAATTAGAGTCTTGGATGGTGAACAATCAAGTCACTGGTAATTTACTACGTTCAGTATTGCCGGCGGGATGGAACATTGCGGATCGCTCAGGTGCTGGCGGATTTGGTGCTCGGAGTATTACAGCAGTTGTGTGGAGTGAGCATCAAGCCCCAATTATTGTGAGCATCTATCTAGCTCAAACACAGGCTTCAATGGCAGAGCGAAATGATGCGATTGTTAAAATTGGTCATTCAATTTTTGACGTTTATACATCACAGTCGCGCTGA

gb|GQ866976.1|:1345-2211|CARB-8 ATGAAGTTTTTATTGGCATTTTCGCTTTTAATACCATCCGTGGTTTTTGCAAGTAGTTCAAAGTTTCAGCAAGTTGAACAAGACGTTAAGGCAATTGAAGTTTCTCTTTCTGCTCGTATAGGTGTTTCCGTTCTTGATACTCAAAATGGAGAATATTGGGATTACAATGGCAATCAGCGCTTCCCGTTAACAAGTACTTTTAAAACAATAGCTTGCGCTAAATTACTATATGATGCTGAGCAAGGAAAAGTTAATCCCAATAGTACAGTCGAGATTAAGAAAGCAGATCTTGTGACCTATTCCCCTGTAATAGAAAAGCAAGTAGGGCAGGCAATCACACTCGATGATGCGTGCTTCGCAACTATGACTACAAGTGATAATACTGCGGCAAATATCATCCTAAGTGCTGTAGGTGGCCCCAAAGGCGTTACTGATTTTTTAAGACAAATTGGGGACAAAGAGACTCGTCTAGACCGTATTGAGCCTGATTTAAATGAAGGTAAGCTCGGTGATTTGAGGGATACGACAACTCCTAAGGCAATAGCCAGTACTTTGAATAAATTTTTATTTGGTTCCGCGCTATCTGAAATGAACCAGAAAAAATTAGAGTCTTGGATGGTGAACAATCAAGTCACTGGTAATTTACTACGTTCAGTATTGCCGGCGGGATGGAACATTGCGGATCGCTCAGGTGCTGGCGGATTTGGTGCTCGGAGTATTACAGCAGTTGTGTGGAGTGAGCATCAAGCCCCAATTATTGTGAGCATCTATCTAGCTCAAACACAGGCTTCAATGGCAGAGCGAAATGATGCGATTGTTAAAATTGGTCATTCAATTTTTGACGTTTATACATCACAGTCGCGCTGA

All of the sequences listed above are identical to each other. Therefore, we could consider removing CARB-8 from the ARGannot.fasta database.
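For anyone who wants to look for other exact duplicates of this kind, here is a minimal Python sketch. It is not part of srst2 and it does not reproduce the megaBLAST comparison above; it only groups FASTA records whose sequences are exactly the same string. The function names (`parse_fasta`, `find_identical_sequences`) and the toy records are made up for illustration.

```python
from collections import defaultdict


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may be wrapped, so collect them piece by piece.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_identical_sequences(text):
    """Return lists of headers that share exactly the same sequence."""
    groups = defaultdict(list)
    for header, seq in parse_fasta(text):
        # Compare case-insensitively; no reverse-complement check is done.
        groups[seq.upper()].append(header)
    return [headers for headers in groups.values() if len(headers) > 1]


if __name__ == "__main__":
    # Toy records (hypothetical), not taken from ARGannot.fasta.
    toy = """>geneA
ATGAAGTTT
TTATTGGCA
>geneB
ATGAAGTTTTTATTGGCA
>geneC
ATGCCC
"""
    for group in find_identical_sequences(toy):
        print(" == ".join(group))
```

To check a real file, read its contents (for example with `open(path).read()`) and pass the text to `find_identical_sequences`. This only catches exact matches over the full length, so near-identical alleles or entries where one sequence is a substring of another would still need an alignment-based check such as megaBLAST.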

katholt commented 8 years ago

Fixed in ARGannot.r1.fasta, thanks Wan