Closed by jbloom 9 years ago
@hlyen This does not appear worrisome. I have added a new section "How closely do we expect the reference sequences to match other sequences?" to the notebook at https://github.com/jbloom/InfluenzaTransmissionHuiLingCollaboration/blob/master/SequenceAnalysisAndSynonymousBarcoding.ipynb
The results show that 50% of PB1 cDNAs, 49% of PA cDNAs, and 19% of NS1 cDNAs don't have an exact nucleotide match to any other sequence. So the genetic diversity of these sequences is fairly high, and it's not surprising that you cloned ones without exact matches. Given that your viruses grew fine in prior work and the protein sequences DO have exact matches, I don't think you need to worry about this. In fact, it would be more unusual if all of your sequences did have exact nucleotide matches in Genbank.
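For a rough sense of the kind of count behind those percentages, here is a minimal sketch. It is not the code from the notebook: the function name `fraction_without_exact_match` and the toy sequences are made up for illustration, and it assumes "exact match" means an identical full-length nucleotide string after upper-casing.

```python
from collections import Counter


def fraction_without_exact_match(seqs):
    """Fraction of sequences that no other sequence in `seqs` matches exactly.

    Hypothetical helper for illustration only; assumes each entry is a
    full-length nucleotide string and compares them case-insensitively.
    """
    normalized = [s.strip().upper() for s in seqs]
    if not normalized:
        return 0.0
    counts = Counter(normalized)
    # A sequence has no exact match if it occurs exactly once in the set.
    singletons = sum(1 for s in normalized if counts[s] == 1)
    return singletons / len(normalized)


# Toy data (not real influenza sequences): two identical, two unique.
toy = ["ATGGCA", "ATGGCA", "ATGGCG", "ATGACA"]
print(fraction_without_exact_match(toy))
```

On real data the input would be the set of cDNA sequences for one gene segment, and details such as how partial-length sequences or ambiguous nucleotides are handled would change the numbers.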
Understand now. Thanks so much, Jesse.
@hlyen asks whether it is worrisome that some of the plasmid sequences (PB1, PA, and NS1) have no other exact nucleotide matches in the Influenza Virus Resource.