pircurators / tracking

For tracking curation issues and requests

HBB2_DANRE #19

Open nataled opened 6 years ago

nataled commented 6 years ago

Warning: this one could be complicated.

I was looking at why Q90485 (HBB2_DANRE; Hemoglobin subunit beta-2) has only one gene cross-reference when its GN line indicates that it is encoded by two genes (ba2 and ba2l). The entry is cross-referenced to GeneID:30217 but not to ZFIN or Ensembl. It has a RefSeq cross-reference (NP_001005403.1) which, interestingly, indicates that it doesn't map to the reference genome. I navigated to GeneID:30217, which names ZFIN:ZDB-GENE-990415-19 as the authoritative source for the information. However, and here is where things start to get tricky, that accession actually takes you to ZFIN:ZDB-GENE-990415-18 (19 vs 18), and the latter is a cross-reference within Q90486 (HBB1_DANRE; Hemoglobin subunit beta-1). I compiled the relevant information for HBB1_DANRE:

Name:    ba1                         ba1l
GeneID:  30216                       504174
Zfin:    ZDB-GENE-990415-18          ZDB-GENE-040901-4
Ensembl: ENSDARG00000089087          ENSDARG00000097238
RefSeq:  NP_571095.1                 NP_001013045.1
GRCz11:  (Chr3) 55103445-55104310    (Chr3) 55098457-55099272
GRCz10:  (Chr3) 54848843-54849708    (Chr3) 54843855-54844670

GRCz11 and GRCz10 are the reference genome assemblies on which the listed coordinates are based. This information is important because each resource uses a different reference assembly, so attempting to compare locations gets a bit messy.
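As a quick sanity check on the coordinates in the table above, here is a minimal Python sketch that computes the GRCz11-minus-GRCz10 offset for each gene. The dictionary layout and the function name `assembly_offsets` are my own choices for illustration; only the numbers come from the table.

```python
# Chr3 coordinates copied from the table above, as (start, end) pairs.
COORDS = {
    "ba1":  {"GRCz11": (55103445, 55104310), "GRCz10": (54848843, 54849708)},
    "ba1l": {"GRCz11": (55098457, 55099272), "GRCz10": (54843855, 54844670)},
}


def assembly_offsets(coords):
    """Return {gene: (start_offset, end_offset)} computed as GRCz11 minus GRCz10."""
    offsets = {}
    for gene, by_assembly in coords.items():
        new_start, new_end = by_assembly["GRCz11"]
        old_start, old_end = by_assembly["GRCz10"]
        offsets[gene] = (new_start - old_start, new_end - old_end)
    return offsets


offsets = assembly_offsets(COORDS)
for gene, (d_start, d_end) in offsets.items():
    print(gene, d_start, d_end)
# prints:
# ba1 254602 254602
# ba1l 254602 254602
```

Both genes shift by the same amount at both ends, which suggests the two coordinate sets describe the same loci and differ only by assembly. Locations should therefore only be compared within a single assembly.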

From what I can tell, HBB1_DANRE is consistent across resources. The only strange thing is that ZDB-GENE-990415-18 indicates that it maps to yet a different UniProtKB entry, Q1RM32 (a TrEMBL entry). I would thus expect Q90485, Q90486, and Q1RM32 to all have the same sequence, but they don't. Q90485 does have the same sequence as yet another TrEMBL entry, B3DG37, and that entry has the same RefSeq and GeneID cross-references as Q90486! Yes, it's confusing. I think what's happening is that all four of the entries are encoded by the same two genes, but the entries come from different genome builds. In fact, if you look at the history for ZDB-GENE-990415-18 you'll see that the 'previous names' include the ones indicated for Q90485, lending support to the idea that this entry is based on an older assembly. I believe Q90485 should be made obsolete. By the way, the current names for the two genes are hbba1 (replacing ba1, ba2) and ba1l (no change).

So where does that leave Hemoglobin subunit beta-2? If you search Zfin for hbba2 you find the following:

Name:    hbba2
GeneID:  445037
Zfin:    ZDB-GENE-040801-164
Ensembl: ENSDARG00000006934
RefSeq:  NP_001003431.1
GRCz11:  (Chr3) 55091230-55092051
GRCz10:  (Chr3) 54836628-54837449

Searching UniProt with each of the above gives these:

hbba2:                  Q6ZM12     Q6DGK4
445037:                 Q6ZM12     Q6DGK4
ZDB-GENE-040801-164:               Q6DGK4
ENSDARG00000006934:     Q6ZM12
NP_001003431.1:                    Q6DGK4
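To make it easier to see which identifiers support which entry, here is a small Python sketch that inverts the table above. The `HITS` dictionary is simply the table transcribed by hand, and the function name `invert` is hypothetical; nothing here queries UniProt.

```python
# Search results transcribed from the table above: query -> accessions returned.
HITS = {
    "hbba2": ["Q6ZM12", "Q6DGK4"],
    "445037": ["Q6ZM12", "Q6DGK4"],
    "ZDB-GENE-040801-164": ["Q6DGK4"],
    "ENSDARG00000006934": ["Q6ZM12"],
    "NP_001003431.1": ["Q6DGK4"],
}


def invert(hits):
    """Return {accession: [queries that returned it]}, preserving table order."""
    by_accession = {}
    for query, accessions in hits.items():
        for accession in accessions:
            by_accession.setdefault(accession, []).append(query)
    return by_accession


by_accession = invert(HITS)
for accession, queries in by_accession.items():
    print(accession, "<-", ", ".join(queries))
# prints:
# Q6ZM12 <- hbba2, 445037, ENSDARG00000006934
# Q6DGK4 <- hbba2, 445037, ZDB-GENE-040801-164, NP_001003431.1
```

Only the gene name and GeneID return both entries. The Ensembl identifier returns only Q6ZM12, while the ZFIN and RefSeq identifiers return only Q6DGK4.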

A few of these also map to Q4V9B0, but that's clearly a fragment. Alignment between the two full-length entries indicates that they are not identical. Based on the information in GeneID, Q6ZM12 is the latest version, and this is supported by the Ensembl mapping. I suggest that Q6ZM12 be promoted to Swiss-Prot, and get the HBB2_DANRE identifier.
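For anyone wanting to repeat the comparison of the two full-length entries, a rough Python sketch of a position-by-position check follows. It is not a true alignment (it introduces no internal gaps), the function name `differing_positions` is hypothetical, and the two strings are made-up toy sequences, not the actual Q6ZM12 or Q6DGK4 sequences, which would need to be pasted in from UniProt.

```python
from itertools import zip_longest


def differing_positions(seq_a, seq_b):
    """Return (1-based position, residue_a, residue_b) wherever the sequences differ.

    A shorter sequence is padded with '-' so that a length difference is reported too.
    """
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip_longest(seq_a, seq_b, fillvalue="-"), start=1)
        if a != b
    ]


# Toy placeholder strings for illustration only (NOT real UniProt sequences).
toy_a = "MVEWTDAER"
toy_b = "MVEWTDQERS"

diffs = differing_positions(toy_a, toy_b)
print(diffs)
# prints: [(7, 'A', 'Q'), (10, '-', 'S')]
```

An empty list would mean the two sequences are identical. If the sequences differ by an internal insertion or deletion, a proper alignment tool would be needed instead of this check.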