Closed alec-djinn closed 9 years ago
These are general differences between RefSeq and Ensembl. They contain slightly different transcripts and thus the gene boundaries can differ.
CICP10 on Ensembl: Chromosome 2: 242,119,856-242,120,053 reverse strand.
CICP10 on RefSeq: 242119877..242120142, complement
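To make the size of the discrepancy concrete, here is a minimal sketch that compares the two CICP10 intervals quoted above. It assumes both databases report 1-based, inclusive coordinates on the same assembly; the `Interval` type and `compare` helper are hypothetical names used only for illustration and are not part of pyensembl.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A 1-based, inclusive genomic interval (assumed convention)."""
    start: int
    end: int

    @property
    def length(self) -> int:
        # Inclusive coordinates, so both endpoints count.
        return self.end - self.start + 1


def compare(a: Interval, b: Interval) -> dict:
    """Report how interval b is shifted relative to a, and their overlap in bp."""
    overlap_start = max(a.start, b.start)
    overlap_end = min(a.end, b.end)
    overlap = max(0, overlap_end - overlap_start + 1)
    return {
        "start_offset": b.start - a.start,
        "end_offset": b.end - a.end,
        "overlap_bp": overlap,
    }


# Coordinates copied from the two lines above (chromosome 2, reverse strand).
ensembl_cicp10 = Interval(242119856, 242120053)
refseq_cicp10 = Interval(242119877, 242120142)

cicp10_diff = compare(ensembl_cicp10, refseq_cicp10)
print(cicp10_diff)
```

The two annotations overlap for most of the gene and differ only at the ends, which is what you would expect when each database builds the gene from a slightly different set of transcripts.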
You can read more about the differences in "A comprehensive evaluation of Ensembl, RefSeq, and UCSC annotations in the context of RNA-seq read mapping and gene quantification".
I noticed that the coordinates of some genes returned by pyensembl differ from those published on the NCBI website.
Examples (EnsemblRelease 79 vs GRCh38.p2): I used the following code to get the coordinates:
Gene BOK-AS1: pyensembl places it on chr2 at 241544403-241558977, but NCBI lists it at 241544384..241559143.
Gene CICP10: pyensembl places it on chr2 at 242119856-242120053, but NCBI lists it at 242119877..242120142.
I could continue with many other examples. Most of the time the coordinates match perfectly, but not always. Why is that? Is it a bug, or am I missing something?