rafael-alcantara opened this issue 11 years ago
Another example:
It looks like the same enzyme, with the same name and the same species. However, the UniProt accessions differ. If you go to UniProt and look at the history of both entries, O59828 and Q9P5N3, you will see that the prefix ALR1, used to group the orthologs in that summary, appears in both, although the second one is currently ALR2. That is why the UniProt web service returns both when queried for ALR1.
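To make the mechanism concrete, here is a minimal offline sketch (it does not call the UniProt web service) of matching a prefix against entry names with and without their history. The `Entry` record, the `matching_entries` helper and the species code `XXXXX` are hypothetical placeholders; only the accessions and the ALR1/ALR2 prefixes come from the example above, and O59828 is assumed to still be named ALR1.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    accession: str
    current_id: str  # current UniProt entry name (hypothetical species code XXXXX)
    past_ids: list[str] = field(default_factory=list)  # entry names found in the history


def id_prefix(entry_name: str) -> str:
    """Return the mnemonic prefix of an entry name (the text before '_')."""
    return entry_name.split("_", 1)[0]


def matching_entries(prefix: str, entries: list[Entry], include_history: bool) -> list[str]:
    """Return accessions whose entry name prefix equals `prefix`.

    With include_history=True, past entry names are checked too, which
    reproduces the behaviour described in this issue.
    """
    hits = []
    for e in entries:
        names = [e.current_id] + (e.past_ids if include_history else [])
        if any(id_prefix(n) == prefix for n in names):
            hits.append(e.accession)
    return hits


# Assumption: O59828 is currently ALR1; Q9P5N3 is currently ALR2 but was once ALR1.
entries = [
    Entry("O59828", "ALR1_XXXXX"),
    Entry("Q9P5N3", "ALR2_XXXXX", ["ALR1_XXXXX"]),
]

print(matching_entries("ALR1", entries, include_history=True))   # ['O59828', 'Q9P5N3']
print(matching_entries("ALR1", entries, include_history=False))  # ['O59828']
```

Restricting the match to the current entry name drops Q9P5N3, at the cost of missing entries whose name changed for legitimate reasons.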
Another example, probably from a road show user (help request to the mailing list on 2014-02-05; notify them of any fix to this):
I searched for amylase and found 396 results.
I filtered for Bacillus licheniformis and Bacillus subtilis,
which narrowed them down to 8 results.
I selected these two enzymes for comparison:
Cyclomaltodextrin glucanotransferase [Bacillus licheniformis]
and
Cyclomaltodextrin glucanotransferase [Bacillus subtilis (strain 168)]
When I compared these two enzymes, the result displayed different enzymes:
Cyclomaltodextrin glucanotransferase
(Bacillus licheniformis)
Alpha-amylase
(Bacillus subtilis (strain 168))
The first summary (B. licheniformis) basically corresponds to the UniProt ID prefix CDGT (cyclomaltodextrin glucanotransferase), while the second one (B. subtilis) corresponds to AMY (alpha-amylase). However, searching UniProt for the latter returns some entries that have AMY in their history, such as P26827 (currently CDGT_THETU, formerly AMY_THETU). It seems as if these intruders "contaminate" the summary, setting the enzyme name (the summary title) to cyclomaltodextrin glucanotransferase when it is actually alpha-amylase.
This is an old issue (#114) that still requires a neat solution. Orthologs are grouped in search results by the prefix of their UniProt name (ID); this usually works but sometimes fails, as it is not foolproof.
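A minimal sketch of grouping by ID prefix, assuming the grouping key is simply the part of the current entry name before the underscore. `group_by_prefix` is a hypothetical helper, not the application's actual code; CDGT_THETU and AMY_THETU (P26827) come from the example above, while AMY_BACSU and CDGT_BACLI are illustrative names only.

```python
from collections import defaultdict


def id_prefix(entry_name: str) -> str:
    """Return the mnemonic prefix of an entry name (the text before '_')."""
    return entry_name.split("_", 1)[0]


def group_by_prefix(entry_names: list[str]) -> dict[str, list[str]]:
    """Group entry names into putative ortholog summaries keyed by prefix."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in entry_names:
        groups[id_prefix(name)].append(name)
    return dict(groups)


# Illustrative entry names (AMY_BACSU and CDGT_BACLI are hypothetical here).
print(group_by_prefix(["CDGT_THETU", "AMY_BACSU", "CDGT_BACLI"]))
# {'CDGT': ['CDGT_THETU', 'CDGT_BACLI'], 'AMY': ['AMY_BACSU']}

# The same protein (P26827) under its former name lands in a different group.
print(group_by_prefix(["AMY_THETU"]))
# {'AMY': ['AMY_THETU']}
```

The key is purely name-based, so a renamed entry silently changes groups, and orthologs only share a key when UniProt happened to assign them the same mnemonic.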
The UniProt manual is clear about this: it states that the prefix is an abbreviation of the protein or gene name, which does not necessarily correspond to the recommended protein name or to the gene name, and also that "Whenever possible, we assign the same mnemonic code for orthologous proteins (even if the gene name is not the same)."
We must investigate other ways to group orthologs, perhaps using other resources such as Pfam or InterPro.