reimandlab / ActiveDriverDB

ActiveDriverDB
GNU Lesser General Public License v2.1
External references and citations #84

Closed krassowski closed 7 years ago

krassowski commented 7 years ago

We discussed adding more external references earlier. I propose using the mappings provided by UniProt: ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/idmapping/README

Those include:

1. UniProtKB-AC
2. UniProtKB-ID
3. GeneID (EntrezGene)
4. RefSeq
5. GI
6. PDB
7. GO
8. UniRef100
9. UniRef90
10. UniRef50
11. UniParc
12. PIR
13. NCBI-taxon
14. MIM
15. UniGene
16. PubMed
17. EMBL
18. EMBL-CDS
19. Ensembl
20. Ensembl_TRS
21. Ensembl_PRO
22. Additional PubMed

RefSeq is our native identifier. I think that having UniProtKB and Ensembl IDs is a must-have. What about the rest of this list? Do you see any that would be useful to have?
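For reference, a minimal Python sketch of reading such a mapping table. It assumes a tab-separated file (`idmapping_selected.tab` in the UniProt directory linked above) with one column per identifier type in the order listed, and with multiple values in a cell separated by `;`; the README is the authority on the actual layout. The names `parse_mappings` and `wanted` and the sample row are illustrative, not code from this repository.

```python
import csv
import io

# Column order copied from the list above (22 columns, assumed layout).
COLUMNS = [
    'UniProtKB-AC', 'UniProtKB-ID', 'GeneID (EntrezGene)', 'RefSeq', 'GI',
    'PDB', 'GO', 'UniRef100', 'UniRef90', 'UniRef50', 'UniParc', 'PIR',
    'NCBI-taxon', 'MIM', 'UniGene', 'PubMed', 'EMBL', 'EMBL-CDS',
    'Ensembl', 'Ensembl_TRS', 'Ensembl_PRO', 'Additional PubMed',
]


def parse_mappings(handle, wanted=('RefSeq', 'Ensembl_PRO', 'GeneID (EntrezGene)')):
    """Yield (accession, {column: [ids]}) for each well-formed row."""
    reader = csv.reader(handle, delimiter='\t', quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != len(COLUMNS):
            continue  # skip rows which do not match the assumed layout
        record = dict(zip(COLUMNS, row))
        yield record['UniProtKB-AC'], {
            # assumption: multiple identifiers in one cell are ';'-separated
            column: [v.strip() for v in record[column].split(';') if v.strip()]
            for column in wanted
        }


# Illustrative row only; most cells are left empty on purpose.
sample = [''] * len(COLUMNS)
sample[0] = 'P04637'
sample[3] = 'NP_000537.3; NP_001119584.1'
sample[20] = 'ENSP00000269305'
for accession, ids in parse_mappings(io.StringIO('\t'.join(sample) + '\n')):
    print(accession, ids)
```

Restricting the output to the `wanted` columns keeps memory use low if we only need RefSeq, Ensembl and Entrez identifiers.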

Also, if you agree, I would like to start a list here of the webpages we want (or should) link to, both as external references and as a kind of citation. We have already agreed on references to InterPro (like http://www.ebi.ac.uk/interpro/protein/P04637).

PS. We probably don't need to store PDB IDs: even if we want to link to that database, it is possible using just the UniProt ID, like this: http://www.rcsb.org/pdb/protein/P04637.
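A rough sketch of how links could be generated from a single UniProt accession, using only the two URL patterns mentioned in this comment. The names `URL_TEMPLATES` and `protein_links` are hypothetical and not taken from the code base.

```python
from urllib.parse import quote

# URL patterns as quoted in this thread; the dictionary itself is illustrative.
URL_TEMPLATES = {
    'InterPro': 'http://www.ebi.ac.uk/interpro/protein/{accession}',
    'RCSB PDB': 'http://www.rcsb.org/pdb/protein/{accession}',
}


def protein_links(accession):
    """Return {database name: URL} for one UniProt accession."""
    safe = quote(accession, safe='')  # guard against unexpected characters
    return {
        name: template.format(accession=safe)
        for name, template in URL_TEMPLATES.items()
    }


print(protein_links('P04637'))
```

Keeping the patterns in one dictionary means that adding another database is a one-line change.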

krassowski commented 7 years ago

During the last online call, phosphosite.org was mentioned as a website we need to refer to (both for the users' convenience and to give credit for the data we build upon). Linking to pages about specific proteins might be challenging, since (at first glance) the identifiers they use do not seem to be in any of the popular formats (not RefSeq, UniProt ID or ENSP).

reimand0 commented 7 years ago

According to an email from Peter Hornbeck, we can use UniProt IDs to link to the PSP website.

reimand0 commented 7 years ago

For pathway information:

GO terms: http://amigo.geneontology.org/amigo/term/GO:0003283

Reactome terms: http://www.reactome.org/content/detail/R-HSA-1638091

krassowski commented 7 years ago

I added external references for proteins. If there are references for a protein, then appropriate links will be shown.

reimand0 commented 7 years ago

looks nice:

  1. please share a few examples where PSP links fail
  2. clicking on externals should open new window
  3. maybe we can pack references more from left to right

krassowski commented 7 years ago

  1. Those are all I found:
    ZHX1-C8orf76 (NM_001204180) http://www.phosphosite.org/uniprotAccAction?id=Q96EF9
    HLA-DRB5 (NM_002125) http://www.phosphosite.org/uniprotAccAction?id=Q30154
    HLA-A (NM_002116) http://www.phosphosite.org/uniprotAccAction?id=P10314
    GNAS (NM_080425) http://www.phosphosite.org/uniprotAccAction?id=Q5JWF2
    NDUFC2-KCTD14 (NM_001203261) http://www.phosphosite.org/uniprotAccAction?id=E9PQ53

We have 34 903 tested and working mappings to PSP. There are 3385 + 866 + 5 isoforms without links to PSP: 3385 with no UniProt accession found, 866 with no external references at all, and 5 (the five shown above) whose accession exists but differs from the one used on the PSP side.

I tried searching by gene name on the PSP site. For GNAS, for example, it seems to have only one isoform of this gene, P63092, which is a different isoform from Q5JWF2. On the other hand, NM_080425 is listed as a RefSeq sequence for both entries. Both entries are curated and have the maximum annotation score.

I added a snippet on wiki with code used to check dead links.
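For readers without access to the wiki, a minimal sketch of what such a check could look like. This is not the wiki snippet itself. It assumes that a missing PSP entry shows up as an HTTP error or a failed connection; if PSP answers with a regular page for unknown accessions, the body would have to be inspected instead. The names `is_alive` and `find_dead_links` are hypothetical.

```python
from urllib.error import URLError
from urllib.request import urlopen

# URL pattern as used in the list of failing links above.
PSP_URL = 'http://www.phosphosite.org/uniprotAccAction?id={accession}'


def is_alive(url, timeout=10):
    """Return True if the URL answers with a non-error HTTP status."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except (URLError, OSError):  # HTTPError is a subclass of URLError
        return False


def find_dead_links(accessions, check=is_alive):
    """Return the accessions whose PSP page could not be fetched."""
    return [
        accession
        for accession in accessions
        if not check(PSP_URL.format(accession=accession))
    ]
```

The `check` argument allows swapping in a stub, so the logic can be tested without sending requests to PSP.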

  2. Done
  3. Done

krassowski commented 7 years ago

What PubMed references should we provide? I mean, will users be interested only in protein citations, only in mRNA citations, or in both?

reimand0 commented 7 years ago

Instead of PubMed, we should link to NCBI Gene, like this:

https://www.ncbi.nlm.nih.gov/gene/3845

I think the number is an EntrezGene ID.

Thanks, Jüri

On Wed, Mar 15, 2017 at 9:36 AM krassowski notifications@github.com wrote:

What PubMed references should we provide? I mean will users be only interested in protein citations, mRNA citations or both?

krassowski commented 7 years ago

I have added links to both NCBI Gene (which uses the Entrez ID) and RefSeq Gene. Additionally, the UniProt isoform is now indicated. The new references no longer use Ensembl BioMart but the mappings provided by NCBI and UniProt. Adding more references which are supported by UniProt should be easy now.

Is that all right? Should I add anything more?

krassowski commented 7 years ago

Maybe adding a star indicating whether a UniProt entry is "reviewed" would be useful? E.g. right now there are three UniProt rows in https://activedriverdb.org/network/show/NM_000546. Unfortunately, neither the mappings provided by UniProt nor those included in RefSeq distinguish between reviewed and unreviewed entries.

krassowski commented 7 years ago

I added the "reviewed" icon next to UniProt entries (strongly resembling the icon used on uniprot.org). The reviewed entries are always shown first (on top). When the user hovers the mouse over the icon, a popup shows up saying whether an entry is reviewed or not. Please see: https://activedriverdb.org/network/show/NM_000546
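The ordering can be sketched as below. This is an illustration only, not the actual implementation; the `UniprotEntry` type and the accessions and flags in the example are hypothetical.

```python
from typing import NamedTuple


class UniprotEntry(NamedTuple):
    accession: str
    reviewed: bool


def reviewed_first(entries):
    """Return entries with the reviewed ones on top.

    sorted() is stable, so the relative order inside the reviewed
    and unreviewed groups is preserved.
    """
    return sorted(entries, key=lambda entry: not entry.reviewed)


# Hypothetical rows, as they might come from the mapping for one isoform.
rows = [
    UniprotEntry('ACC_UNREVIEWED_1', False),
    UniprotEntry('ACC_REVIEWED', True),
    UniprotEntry('ACC_UNREVIEWED_2', False),
]
for entry in reviewed_first(rows):
    print(entry.accession, 'reviewed' if entry.reviewed else 'unreviewed')
```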

I am recalculating dead PSP links for the new data (should be ready in three hours or so).

krassowski commented 7 years ago

The number of UniProt entries which are not found in PSP rose from approximately 4 000 to more than 15 000 (we have many more UniProt entries now). There are 15434 UniProt entries which are not in PSP, but only five of those are "reviewed". There are 406 proteins which have no external references and 139 proteins which have no UniProt entry at all. The not-found-but-reviewed entries are:

NDUFC2-KCTD14 NM_001203261 E9PQ53
GNAS NM_001309883 Q5JWF2
GNAS NM_080425 Q5JWF2
HLA-DRB5 NM_002125 Q30154
ZHX1-C8orf76 NM_001204180 Q96EF9

Those are almost the same entries as previously.

I am closing this issue for now, please reopen if needed.