nhoffman / ya16sdb

A curated subset of 16S rRNA sequences from NCBI

Publication status of NCBI records is not current #38

Open marykstewart opened 4 years ago

marykstewart commented 4 years ago

The publication status of records submitted in association with PMID:24509479 (https://www.ncbi.nlm.nih.gov/bioproject/PRJEB2397) and PMID:25388376 (https://www.ncbi.nlm.nih.gov/bioproject/229402) was not updated when the papers were accepted. These came to my attention through our whole-genome ratification project for Streptococcus pneumoniae records. An A/C polymorphism at 16S position 203 is said to distinguish S. pneumoniae (always C) from other S. mitis group species (always A). It turns out that some S. pneumoniae strains have an A at that position, but most of them appear to be RefSeq records (direct submissions of reasonable quality). Until I found these publications, we did not know whether the submitters to NCBI had done much phenotypic characterization to ensure they were sequencing S. pneumoniae and not S. pseudopneumoniae or S. mitis. We frequently use publication status to evaluate the trustworthiness of a record.
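For anyone who wants to check the polymorphism on their own sequences, here is a minimal sketch. It assumes that "position 203" can be treated as a 1-based index into the sequence string you pass in; the coordinate system behind that number is not stated in this issue, so real sequences would probably need to be aligned to whatever reference defines it first. The function names and the toy sequence are hypothetical and not part of ya16sdb.

```python
def base_at(seq: str, position: int) -> str:
    """Return the base at a 1-based position (assumed coordinate convention)."""
    if not 1 <= position <= len(seq):
        raise ValueError(f"position {position} is outside a sequence of length {len(seq)}")
    return seq[position - 1].upper()


def polymorphism_call(seq: str, position: int = 203) -> str:
    """Label a sequence by the reported A/C rule; exceptions are known to exist."""
    base = base_at(seq, position)
    if base == "C":
        return "C (reported S. pneumoniae state)"
    if base == "A":
        return "A (reported state of other S. mitis group species)"
    return f"{base} (not covered by the reported rule)"


# Synthetic toy sequence, not real 16S data: an 'A' placed at position 203.
toy = "G" * 202 + "A" + "G" * 50
print(polymorphism_call(toy))
```

Because some S. pneumoniae strains carry an A at this position, the label is a flag for follow-up and not a species call.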

S. pneumoniae records with an 'A' from PMID:24509479 (our analysis suggests GCF_001113365.1 is likely misclassified, but the others are S. pneumoniae):
GCF_001344435.1,SMRU2068,NZ_CHVE01000029_6665_8216
GCF_001113365.1,SMRU2014,NZ_CKYA01000001_562_2114
GCF_001130445.1,SMRU2069,NZ_CLES01000030_2995_4546
GCF_001147945.1,SMRU2652,NZ_CLLB01000007_318_1869
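
The lines above can be split into their parts with a short script. This is a sketch that assumes each line is `assembly,strain,seqname` and that the seqname has the form `accession_start_stop` with 1-based inclusive coordinates. That reading is an inference from the full record quoted in this issue (start 70, stop 1621, length 1552), not something documented here. `SeqName`, `parse_seqname`, and `parse_line` are hypothetical helper names.

```python
from typing import NamedTuple


class SeqName(NamedTuple):
    accession: str
    start: int
    stop: int

    @property
    def length(self) -> int:
        # Assumes 1-based inclusive coordinates.
        return self.stop - self.start + 1


def parse_seqname(seqname: str) -> SeqName:
    """Split 'accession_start_stop' on the last two underscores."""
    accession, start, stop = seqname.rsplit("_", 2)
    return SeqName(accession, int(start), int(stop))


def parse_line(line: str):
    """Split one 'assembly,strain,seqname' line (assumed three-field layout)."""
    assembly, strain, seqname = line.strip().split(",")
    return assembly, strain, parse_seqname(seqname)


LINES = [
    "GCF_001344435.1,SMRU2068,NZ_CHVE01000029_6665_8216",
    "GCF_001113365.1,SMRU2014,NZ_CKYA01000001_562_2114",
    "GCF_001130445.1,SMRU2069,NZ_CLES01000030_2995_4546",
    "GCF_001147945.1,SMRU2652,NZ_CLLB01000007_318_1869",
]

for line in LINES:
    assembly, strain, name = parse_line(line)
    print(assembly, strain, name.accession, name.start, name.stop, name.length)
```

Splitting from the right keeps the underscore inside accessions such as `NZ_CHVE01000029` intact.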

An S. pneumoniae record with an 'A' from PMID:25388376:
NZ_JFJF01000123_70_1621,NZ_JFJF01000123.1,NZ_JFJF01000123,NZ_JFJF01000123,"Streptococcus pneumoniae strain SC_0381 contig_81, whole genome shotgun sequence",1313,2019-09-30,2019-10-31,1,Streptococcus pneumoniae,WGS;RefSeq,Streptococcus pneumoniae,1552,0,SC_0381,genomic DNA,,conjunctivitis,70,1621,False,True,False,refseq,b27646b8b8d34ff1cb6d87a56ee8f2f6e022dd49

@crosenth indicated that he would contact NCBI and report back.