Hello,
I have two questions. Not sure if this is a generic NCBI issue, or related to the datasets API. Happy to forward the query elsewhere.
I came across this problem recently for the genome of Hevea brasiliensis - taxid 3981 - reference genome assembly GCF_030052815.1.
I thought that having locus_tags was a requirement for genomes to be deposited / queried in the NCBI. However, it seems like the genes in the nuclear genome of this assembly do not have locus_tags: https://ncbi.nlm.nih.gov/datasets/gene/GCF_030052815.1/?search=rubber
Question 1: is it to be expected that locus_tags are missing, or is it an issue with this assembly in particular?
I went to the RefSeq (https://www.ncbi.nlm.nih.gov/nuccore/NC_079493.1/) and GenBank (https://www.ncbi.nlm.nih.gov/nuccore/CM057502.1?report=genbank&log$=seqview) records. Below is an example of the same CDS in both records:
NC_079493
CM057502
Here the sequence is identical in both records, but only one has a locus_tag. There are also features that exist in one record but not the other.
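In case it helps to reproduce the comparison, here is a minimal sketch of how the check could be scripted. It assumes the GenBank flat-file text of each record is already available as a string and that the feature table uses the usual column layout (feature keys indented by 5 spaces, qualifiers by 21). The function name `count_missing`, the helper functions, and the two tiny sample records are made up for illustration; they are not real data from NC_079493 or CM057502. A wrapped qualifier value whose continuation line happens to start with `/` would be miscounted, so this is only a rough check.

```python
import re

# Assumed flat-file layout: feature keys start after 5 spaces,
# qualifiers start after 21 spaces.
FEATURE_START = re.compile(r"^ {5}(\S+)\s+\S")
QUALIFIER = re.compile(r"^ {21}/([A-Za-z_]+)")


def count_missing(flatfile_text, key="CDS", qualifier="locus_tag"):
    """Return (number of `key` features, number of those lacking `qualifier`)."""
    features = []  # list of (feature key, set of qualifier names)
    in_table = False
    for line in flatfile_text.splitlines():
        if line.startswith("FEATURES"):
            in_table = True
            continue
        # Any non-indented line (ORIGIN, CONTIG, //) ends the feature table.
        if in_table and line and not line.startswith(" "):
            in_table = False
        if not in_table:
            continue
        start = FEATURE_START.match(line)
        if start:
            features.append((start.group(1), set()))
            continue
        qual = QUALIFIER.match(line)
        if qual and features:
            features[-1][1].add(qual.group(1))
    selected = [quals for k, quals in features if k == key]
    missing = sum(1 for quals in selected if qualifier not in quals)
    return len(selected), missing


# Hypothetical helpers that build correctly aligned sample lines.
def feature_line(key, location):
    return " " * 5 + key.ljust(16) + location


def qualifier_line(text):
    return " " * 21 + text


# Made-up records: record_a has no locus_tag, record_b has one.
record_a = "\n".join([
    "FEATURES             Location/Qualifiers",
    feature_line("gene", "1..90"),
    qualifier_line('/gene="EXAMPLE1"'),
    feature_line("CDS", "1..90"),
    qualifier_line('/gene="EXAMPLE1"'),
    qualifier_line('/product="hypothetical example protein"'),
    "ORIGIN",
    "//",
])
record_b = "\n".join([
    "FEATURES             Location/Qualifiers",
    feature_line("gene", "1..90"),
    qualifier_line('/locus_tag="EX_000001"'),
    feature_line("CDS", "1..90"),
    qualifier_line('/locus_tag="EX_000001"'),
    qualifier_line('/product="hypothetical example protein"'),
    "ORIGIN",
    "//",
])

print(count_missing(record_a))
print(count_missing(record_b))
```

Running the same function with `key="gene"` would give the equivalent counts for gene features.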
Question 2: is it common that the annotations in GenBank and RefSeq records differ?
Thank you so much for your help!
Best, Manu