monarch-initiative / dipper

Data Ingestion Pipeline for Monarch
https://dipper.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Add Source: UniProt reference proteomes #412

Open cmungall opened 7 years ago

cmungall commented 7 years ago

The eukaryotic subset is here:

ftp://ftp.ebi.ac.uk/pub/databases/reference_proteomes/QfO/Eukaryota/

Primarily this will serve as an identifier mapping resource (see #410). We won't need to map IDs ahead of time.

Note that it won't be legitimate to use equivalentClasses axioms between gene IDs and protein IDs. However, the reference proteome should stand in a 1:1 relationship with genes. The logical relationship should be an RO relation such as translated_from, or something similar. We could also place an xref between the two. (See also https://github.com/SciGraph/golr-loader/issues/26)
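A minimal sketch of that modeling choice, for illustration only: each protein is linked to its gene with a translated_from-style predicate plus an optional xref, and no equivalence axiom is emitted. The predicate IRI below is a placeholder (the exact RO term is not pinned down in this thread), and the prefix map, the function names, and the example ID pair are assumptions, not existing dipper code.

```python
# Placeholder IRI: this thread only says "RO translated_from or similar",
# so the real RO term would have to be looked up and substituted here.
TRANSLATED_FROM = "http://example.org/placeholder/translated_from"
# oboInOwl database cross-reference annotation property.
HAS_DBXREF = "http://www.geneontology.org/formats/oboInOwl#hasDbXref"

# Illustrative prefix map (an assumption, not dipper's curie map).
PREFIXES = {
    "UniProtKB": "http://purl.uniprot.org/uniprot/",
    "NCBIGene": "http://www.ncbi.nlm.nih.gov/gene/",
}


def expand(curie):
    """Expand a CURIE such as 'UniProtKB:P04637' into a full IRI."""
    prefix, local_id = curie.split(":", 1)
    return PREFIXES[prefix] + local_id


def protein_gene_triples(protein_curie, gene_curie, add_xref=True):
    """Return N-Triples lines linking a protein to its gene.

    The protein and gene are deliberately NOT declared equivalent;
    they are related by a translated_from-style edge, optionally
    accompanied by an xref literal on the protein.
    """
    protein = expand(protein_curie)
    gene = expand(gene_curie)
    triples = [f"<{protein}> <{TRANSLATED_FROM}> <{gene}> ."]
    if add_xref:
        triples.append(f'<{protein}> <{HAS_DBXREF}> "{gene_curie}" .')
    return triples


if __name__ == "__main__":
    # Illustrative pair: human TP53 protein and gene.
    for line in protein_gene_triples("UniProtKB:P04637", "NCBIGene:7157"):
        print(line)
```

Keeping the relation directional (protein to gene) means a downstream loader can still walk from a protein annotation to its gene without the two nodes being merged.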

mellybelly commented 7 years ago

There are cases where it's not 1:1, unless there is some other ID syntax for dealing with these cases.

cmungall commented 7 years ago

The terminology is a bit confusing with UniProt and the notion of "reference". But the QfO reference proteomes are meant to be truly 1:1 with genes. There will still be exceptions, e.g. where Ensembl needs to split or merge.
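One way to see how often the 1:1 assumption breaks would be to scan the protein-to-gene pairs and report anything that maps to more than one partner. This is only a sketch: the function name is hypothetical, the input is assumed to be already parsed into (protein ID, gene ID) pairs, and the IDs in the usage example are made up.

```python
from collections import defaultdict


def find_non_one_to_one(pairs):
    """Return the mappings that violate a strict 1:1 protein/gene relationship.

    `pairs` is an iterable of (protein_id, gene_id) tuples. The result is a
    tuple of two dicts: proteins mapped to several genes, and genes mapped
    to several proteins.
    """
    genes_by_protein = defaultdict(set)
    proteins_by_gene = defaultdict(set)
    for protein_id, gene_id in pairs:
        genes_by_protein[protein_id].add(gene_id)
        proteins_by_gene[gene_id].add(protein_id)

    multi_gene = {
        p: sorted(g) for p, g in genes_by_protein.items() if len(g) > 1
    }
    multi_protein = {
        g: sorted(p) for g, p in proteins_by_gene.items() if len(p) > 1
    }
    return multi_gene, multi_protein


if __name__ == "__main__":
    # Made-up IDs: P1/G1 is clean, P2 maps to two genes, G4 has two proteins.
    example = [
        ("P1", "G1"),
        ("P2", "G2"),
        ("P2", "G3"),
        ("P3", "G4"),
        ("P4", "G4"),
    ]
    multi_gene, multi_protein = find_non_one_to_one(example)
    print("proteins with several genes:", multi_gene)
    print("genes with several proteins:", multi_protein)
```

Both directions are reported separately because a gene split and a gene merge show up on opposite sides of the mapping.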

cmungall commented 7 years ago

Also in GO, for anything provided by GOA, the main entities annotated are reference proteome IDs.