uniprot / enzymeportal

The EBI Enzyme Portal
http://www.ebi.ac.uk/enzymeportal/
Apache License 2.0

Ortholog grouping #141

Open rafael-alcantara opened 11 years ago

rafael-alcantara commented 11 years ago

This is an old issue (#114) that still needs a neat solution. Orthologs are grouped in search results by the prefix of their UniProt name (ID), which usually works but is not foolproof and sometimes fails.

The UniProt manual is clear about this: it states that the prefix is an abbreviation of the protein/gene name, which does not necessarily correspond to the recommended protein name or to the gene name, and also that "Whenever possible, we assign the same mnemonic code for orthologous proteins (even if the gene name is not the same)".
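As a rough illustration of prefix-based grouping (not the portal's actual code), the sketch below groups entry names by the text before the underscore. The function names are hypothetical, and the entry names are only sample input.

```python
from collections import defaultdict


def mnemonic_prefix(entry_name):
    """Return the part of a UniProt entry name before the underscore."""
    prefix, _sep, _species = entry_name.partition("_")
    return prefix


def group_by_prefix(entry_names):
    """Group entry names by their mnemonic prefix (hypothetical helper)."""
    groups = defaultdict(list)
    for name in entry_names:
        groups[mnemonic_prefix(name)].append(name)
    return dict(groups)


# Sample input: the same species code under two different prefixes.
sample_names = ["CDGT_THETU", "AMY_THETU"]
sample_groups = group_by_prefix(sample_names)
```

The grouping is only as good as the prefix: two orthologs with different mnemonics are split, and nothing in the name itself says whether two entries sharing a prefix are truly orthologous.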

We must investigate other options to group orthologs, perhaps using different resources: Pfam, InterPro...?

rafael-alcantara commented 10 years ago

Another example:

  1. Search for "alanine racemase 1".
  2. Find that name among the results (for me, it is the first one).
  3. Expand the list of species for that summary. You will see two occurrences of Fission yeast with exactly the same scientific name Schizosaccharomyces pombe (strain 972 / ATCC 24843).

It looks like the same enzyme: same name, same species. However, the UniProt accessions are different. If you go to UniProt and look at the history of both entries, O59828 and Q9P5N3, you will see that the prefix ALR1 used to group the orthologs of that summary appears in both, although the second one is currently ALR2. That is why the UniProt web service returns both when queried for ALR1.
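One possible mitigation, sketched here under assumptions, is to keep only the entries whose current name carries the queried prefix and to set aside those that matched through their history. The record layout (`accession`, `entry_name`) and the function name are hypothetical, and the `_SCHPO` entry names are assumed for illustration, not taken from a real response.

```python
def split_by_current_prefix(entries, prefix):
    """Separate entries whose current name has the prefix from the rest.

    Each entry is a dict with 'accession' and 'entry_name' keys
    (an assumed layout, not the real web service format).
    """
    current, historical_only = [], []
    for entry in entries:
        entry_prefix = entry["entry_name"].partition("_")[0]
        if entry_prefix == prefix:
            current.append(entry)
        else:
            historical_only.append(entry)
    return current, historical_only


# Assumed current names for the two accessions discussed above.
returned_for_alr1 = [
    {"accession": "O59828", "entry_name": "ALR1_SCHPO"},
    {"accession": "Q9P5N3", "entry_name": "ALR2_SCHPO"},
]
kept, dropped = split_by_current_prefix(returned_for_alr1, "ALR1")
```

With this input, only O59828 stays in the ALR1 group. This removes the duplicate species but does not address the underlying point that a shared prefix is no guarantee of orthology.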

rafael-alcantara commented 10 years ago

Another example, probably from a roadshow user (help request to the mailing list on 2014-02-05; report any fix for this back to them :e-mail:):

I searched for amylase, and found 396 results.
filtered for Bacillus licheniformis and Bacillus subtilis.

filtered to 8 results.
i selected these two enzymes for comparison
Cyclomaltodextrin glucanotransferase [Bacillus licheniformis] 
and
Cyclomaltodextrin glucanotransferase [Bacillus subtilis (strain 168)] 

when i compare this two enzyme, the result displayed different enzymes

Cyclomaltodextrin glucanotransferase
(Bacillus licheniformis) 

Alpha-amylase
(Bacillus subtilis (strain 168)) 

The first summary (B. licheniformis) basically corresponds to the UniProt ID prefix CDGT (cyclomaltodextrin glucanotransferase), while the second one (B. subtilis) corresponds to AMY (alpha-amylase). However, searching UniProt for the latter returns some entries that have it in their history, such as P26827 (CDGT_THETU, but once AMY_THETU). These intruders seem to "contaminate" the summary, setting the enzyme name (summary title) to cyclomaltodextrin glucanotransferase when it is actually alpha-amylase.