Closed krassowski closed 7 years ago
During the last online call, phosphosite.org was mentioned as a website we need to refer to (both for users' convenience and to give credit for their data, which we build upon). Linking to pages about specific proteins might be challenging, since (at first glance) the identifiers they use do not seem to be in any of the popular formats (they are not RefSeq, Uniprot or ENSP IDs).
According to an email from Peter Hornbeck, we can use Uniprot IDs to link to the PSP website.
For pathway information:
GO terms: http://amigo.geneontology.org/amigo/term/GO:0003283
Reactome terms: http://www.reactome.org/content/detail/R-HSA-1638091
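The URL patterns above (PSP by Uniprot accession, GO and Reactome by term ID) can be collected in one place. This is only a minimal sketch: the names `LINK_TEMPLATES` and `external_link` are hypothetical and are not taken from the project code.

```python
from urllib.parse import quote

# URL patterns copied from the links quoted in this thread.
# The dictionary keys are made-up labels, not project identifiers.
LINK_TEMPLATES = {
    "psp": "http://www.phosphosite.org/uniprotAccAction?id={id}",
    "go": "http://amigo.geneontology.org/amigo/term/{id}",
    "reactome": "http://www.reactome.org/content/detail/{id}",
}


def external_link(database: str, identifier: str) -> str:
    """Build an external link for the given database label and identifier."""
    template = LINK_TEMPLATES[database]
    # Keep ':' unescaped so that GO term IDs stay readable in the URL.
    return template.format(id=quote(identifier, safe=":"))


print(external_link("psp", "Q96EF9"))
print(external_link("go", "GO:0003283"))
print(external_link("reactome", "R-HSA-1638091"))
```

Keeping the templates in a dictionary means that adding another resource later only needs one new entry.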
I added external references for proteins. If there are references for a protein, then appropriate links will be shown.
looks nice:
ZHX1-C8orf76 (NM_001204180) http://www.phosphosite.org/uniprotAccAction?id=Q96EF9
HLA-DRB5 (NM_002125) http://www.phosphosite.org/uniprotAccAction?id=Q30154
HLA-A (NM_002116) http://www.phosphosite.org/uniprotAccAction?id=P10314
GNAS (NM_080425) http://www.phosphosite.org/uniprotAccAction?id=Q5JWF2
NDUFC2-KCTD14 (NM_001203261) http://www.phosphosite.org/uniprotAccAction?id=E9PQ53
We have 34 903 tested and working mappings to PSP. There are 3 385 + 866 + 5 isoforms for which there are no links to PSP (3 385: no Uniprot accession found; 866: no external references found at all; 5: the five shown above, which exist but whose accession differs from the one on the PSP side).
I tried searching by gene name on the PSP site. For GNAS, for example, it seems to have only one isoform of this gene, i.e. P63092, which is a different isoform from Q5JWF2. On the other hand, NM_080425 is listed as the RefSeq sequence for both entries. Both entries are curated and have the maximum annotation score.
I added a snippet on the wiki with the code used to check for dead links.
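For readers without access to the wiki, a dead-link check could look roughly like the sketch below. It is not the snippet from the wiki. It assumes that PSP answers with a non-200 status for unknown accessions, which may not hold if the site returns a regular page with a "not found" message. The names `http_status` and `find_dead_links` are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List

PSP_URL = "http://www.phosphosite.org/uniprotAccAction?id={accession}"


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url, or 0 if the request fails outright."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status (e.g. 404).
        return error.code
    except OSError:
        # DNS failure, refused connection, timeout and similar problems.
        return 0


def find_dead_links(
    accessions: Iterable[str],
    get_status: Callable[[str], int] = http_status,
) -> List[str]:
    """Return the accessions whose PSP link does not answer with status 200."""
    dead = []
    for accession in accessions:
        url = PSP_URL.format(accession=accession)
        if get_status(url) != 200:
            dead.append(accession)
    return dead
```

The status function is passed in as a parameter so that the logic can be exercised without network access, for example with `find_dead_links(["P63092"], get_status=lambda url: 200)`.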
What PubMed references should we provide? I mean will users be only interested in protein citations, mRNA citations or both?
Instead of PubMed, we should link to NCBI Gene, like this:
https://www.ncbi.nlm.nih.gov/gene/3845
I think the number is an EntrezGene ID.
Thanks, Jüri
On Wed, Mar 15, 2017 at 9:36 AM krassowski notifications@github.com wrote:
What PubMed references should we provide? I mean will users be only interested in protein citations, mRNA citations or both?
I have added links to both NCBI Gene (which uses Entrez IDs) and RefSeq Gene. Additionally, the Uniprot isoform is now indicated. The new references no longer use Ensembl BioMart but the mappings provided by NCBI and Uniprot. Adding more references that are supported by Uniprot should be easy now.
Is that all right? Should I add anything more?
Maybe adding a star indicating whether a Uniprot entry is "reviewed" would be useful? E.g. right now there are three Uniprot rows in https://activedriverdb.org/network/show/NM_000546. Unfortunately, neither the mappings provided by Uniprot nor those included in RefSeq differentiate between reviewed and unreviewed entries.
I added the "reviewed" icon next to Uniprot entries (strongly resembling the icon used on uniprot.org). The reviewed entries are always shown first (on top). When the user hovers the mouse over an icon, a popup shows up saying whether the entry is reviewed or not. Please see: https://activedriverdb.org/network/show/NM_000546
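The "reviewed first" ordering can be illustrated with a stable sort. This is a sketch only: the `UniprotEntry` class and the placeholder accessions `UNREVIEWED_A` and `UNREVIEWED_B` are made up for the example and do not reflect the actual model or data.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class UniprotEntry:
    accession: str
    reviewed: bool


def order_for_display(entries: Iterable[UniprotEntry]) -> List[UniprotEntry]:
    """Put reviewed entries on top, otherwise keeping the original order."""
    # False sorts before True, so "not reviewed" == False comes first.
    # sorted() is stable, so ties keep their input order.
    return sorted(entries, key=lambda entry: not entry.reviewed)


entries = [
    UniprotEntry("UNREVIEWED_A", reviewed=False),  # placeholder accession
    UniprotEntry("P04637", reviewed=True),
    UniprotEntry("UNREVIEWED_B", reviewed=False),  # placeholder accession
]
ordered = order_for_display(entries)
print([entry.accession for entry in ordered])
```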
I am recalculating dead PSP links for the new data (should be ready in three hours or so).
The number of Uniprot entries which are not found in PSP rose from approx. 4 000 to more than 15 000 (we have many more Uniprot entries now). There are 15 434 Uniprot entries which are not in PSP, but only five of those are "reviewed". There are 406 proteins which have no external references and 139 proteins which do not have a single Uniprot entry. The not-found-but-reviewed entries are:
NDUFC2-KCTD14 NM_001203261 E9PQ53
GNAS NM_001309883 Q5JWF2
GNAS NM_080425 Q5JWF2
HLA-DRB5 NM_002125 Q30154
ZHX1-C8orf76 NM_001204180 Q96EF9
Those are almost the same entries as previously.
I am closing this issue for now, please reopen if needed.
We discussed adding more external references earlier. I propose using the mappings from Uniprot: ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/idmapping/README
Those include:
RefSeq is the native identifier for us. I think that having UniProtKB and Ensembl IDs is a must-have. What about the remaining ones on this list? Do you see any which would be useful to have?
Also, I would like to propose that we start making a list of webpages we want to (or should) link to, both as external references and as a kind of citation, in this place, if you agree. We have already agreed on references to InterPro (like http://www.ebi.ac.uk/interpro/protein/P04637).
PS. We probably don't need PDB IDs, because even if we want to link to the database, that is possible using just the Uniprot ID, like this: http://www.rcsb.org/pdb/protein/P04637.
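To make the proposal more concrete, here is a rough sketch of reading the Uniprot `idmapping.dat` file, which has three tab-separated columns: UniProtKB accession, ID type and external ID. The selection of ID types in `WANTED_TYPES` and the function name `parse_idmapping` are assumptions for illustration, and the sample rows are illustrative rather than copied from the file.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

# Assumed selection of ID types; adjust to whatever we decide to keep.
WANTED_TYPES = frozenset({"RefSeq_NT", "Ensembl_PRO"})


def parse_idmapping(
    lines: Iterable[str],
    wanted_types: frozenset = WANTED_TYPES,
) -> Dict[str, Dict[str, Set[str]]]:
    """Map UniProtKB accession -> ID type -> set of external IDs."""
    mapping: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Each row is: accession <TAB> ID type <TAB> external ID.
        accession, id_type, external_id = line.split("\t", 2)
        if id_type in wanted_types:
            mapping[accession][id_type].add(external_id)
    return mapping


# Illustrative rows in the three-column format (not copied from the file).
sample = [
    "P04637\tUniProtKB-ID\tP53_HUMAN\n",
    "P04637\tRefSeq_NT\tNM_000546.5\n",
    "P04637\tEnsembl_PRO\tENSP00000269305\n",
]
references = parse_idmapping(sample)
print(sorted(references["P04637"]))
```

With the real file, the same function could be fed an opened file handle, since it only iterates over lines.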