openphacts / IdentityMappingService

The Identity Mapping Service to combine BridgeDB and the Validator

Load additional ConceptWiki linksets #22

Open stain opened 8 years ago

stain commented 8 years ago

.. but which ones? Stian to provide list for further discussion.

stain commented 8 years ago

In 1.5 these are loaded: http://heater.cs.man.ac.uk:3004/QueryExpander/SourceTargetInfos?sourceCode=ConceptWiki

from http://data.openphacts.org/1.5/ims/linksets/ConceptWiki/

There's also an issue with conflicting void files, e.g. http://data.openphacts.org/1.5/ims/linksets/ConceptWiki/hack.ttl

ianwdunlop commented 8 years ago

Any chance we can avoid 'hacks'? If they are valid then they should become part of the actual void file

stain commented 8 years ago

ConceptWiki linksets that were loaded in 1.5:

So which ones should also go in?

danidi commented 8 years ago

We'll definitely need a linkset from the ConceptWiki Gene IDs to something else, otherwise the text search with gene identifiers will not work. Currently, the only linkset available maps ConceptWiki genes to UniProt proteins, which prevents us from getting any gene data back (this would need an additional mapping from UniProt proteins to a gene identifier, which @AlasdairGray mentioned the IMS would not do in one step).

Enzyme and GO would be useful to allow the text search for the classes, but the user might find concepts which are not exactly the same here (see https://github.com/openphacts/IdentityMappingService/issues/18).

AlasdairGray commented 8 years ago

Is there another set of gene identifiers in the system that ConceptWiki could map to with the justification that they are the same gene? ConceptWiki Gene to UniProt protein is a cross-boundary mapping which means that it can only be traversed in one direction and only once.
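To make the traversal rule concrete, here is a minimal sketch in Python. It is not the IMS implementation: the `Linkset` tuple, the `expand` function and all identifiers are hypothetical, and it assumes "only once" means at most one cross-boundary hop per expansion, while same-entity linksets can be followed in both directions and chained.

```python
from collections import namedtuple

# Hypothetical model, not the IMS data model: a linkset is a list of
# (source, target) identifier pairs plus a flag saying whether it links
# different kinds of entity (e.g. gene -> protein).
Linkset = namedtuple("Linkset", ["cross_boundary", "links"])


def expand(start, linksets):
    """Return every identifier reachable from `start`.

    Assumed rules: same-entity linksets are followed in both directions and
    transitively; a cross-boundary linkset is followed source -> target only,
    and at most one cross-boundary hop is allowed per path.
    """
    reached = set()
    seen = set()
    stack = [(start, False)]  # (identifier, already crossed a boundary?)
    while stack:
        ident, crossed = stack.pop()
        if (ident, crossed) in seen:
            continue
        seen.add((ident, crossed))
        reached.add(ident)
        for linkset in linksets:
            for source, target in linkset.links:
                if linkset.cross_boundary:
                    # One direction only, and only if no boundary was crossed yet.
                    if source == ident and not crossed:
                        stack.append((target, True))
                elif source == ident:
                    stack.append((target, crossed))
                elif target == ident:
                    stack.append((source, crossed))
    return reached


# Made-up identifiers for illustration only.
cw_to_uniprot = Linkset(True, [("cw:gene1", "uniprot:P1")])
ensembl_to_uniprot = Linkset(True, [("ensembl:G1", "uniprot:P1")])
cw_to_unigene = Linkset(False, [("cw:gene1", "unigene:Hs.1")])

without_gene_link = expand("cw:gene1", [cw_to_uniprot, ensembl_to_uniprot])
with_gene_link = expand(
    "cw:gene1", [cw_to_uniprot, ensembl_to_uniprot, cw_to_unigene]
)
```

Under these assumptions, `without_gene_link` contains the ConceptWiki gene and the UniProt protein but never the Ensembl gene, because reaching it would need a second, reversed cross-boundary hop. A same-entity gene-to-gene linkset, as in `with_gene_link`, is what makes another gene identifier reachable.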

danidi commented 8 years ago

Could Unigene be an option here? I wouldn't go for Ensembl, as we just have the human ones from Jonathan. The question is whether all the genes in the CW to UniProt linkset are also available there, and whether it would cause one-to-many issues.

AlasdairGray commented 8 years ago

  1. Do we already have Unigene in the system and if so does it have a link to UniProt proteins?
  2. We would only have a one-to-many issue if there is a one-to-many relationship between ConceptWiki and Unigene, do we know whether such a linkset exists?
  3. If we have unigene, can we use that for gene labels and do away with ConceptWiki for genes?
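Point 2 could be checked directly on a candidate linkset. The sketch below is only an illustration: it assumes the linkset has already been read into a list of (source, target) pairs, and the function name and identifiers are made up.

```python
from collections import defaultdict


def one_to_many(links):
    """Return the sources that map to more than one distinct target."""
    targets = defaultdict(set)
    for source, target in links:
        targets[source].add(target)
    return {s: sorted(t) for s, t in targets.items() if len(t) > 1}


# Made-up ConceptWiki -> Unigene pairs for illustration only.
sample_links = [
    ("cw:gene1", "unigene:Hs.1"),
    ("cw:gene1", "unigene:Hs.2"),
    ("cw:gene2", "unigene:Hs.3"),
]
ambiguous = one_to_many(sample_links)
```

An empty result would mean the linkset is one-to-one or many-to-one from the ConceptWiki side; swapping the pair order gives the same check in the other direction.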

danidi commented 8 years ago

We have a linkset http://ops2.few.vu.nl/QueryExpander/mappingSet/29, not sure if there is any additional data (and also labels). But looking at the gene linksets we have so far, it seems that many of them are connected to uniprot only. So we will only have gene to gene mappings for the ones @JonathanMELIUS included in the human and mouse ensembl linksets. So I'm not sure anymore that having a direct CW to unigene linkset would give us an advantage.

stain commented 8 years ago

Add the CW linksets that were part of the 1.5 default lens - to be able to do labels and text->concept before IRS2.