openphacts / GLOBAL

Global project issues [private for now. owner lee harland]

ensembl gene IDs don't work in Target Pharm (ensembl transcripts do) #149

Open ChristineChichester opened 10 years ago

ChristineChichester commented 10 years ago

From MapURL

"primaryTopic": {
  "_about": "http://www.uniprot.org/uniprot/P56817",
  "exactMatch": [
    "http://identifiers.org/ensembl/ENSG00000265969",
    "http://identifiers.org/ensembl/ENST00000428381",
    "http://identifiers.org/ensembl/ENST00000313005",
    "http://identifiers.org/ensembl/ENST00000445823",
    "http://identifiers.org/ensembl/ENST00000513780",
    "http://identifiers.org/ensembl/ENSG00000186318"
  ]
}

http://identifiers.org/ensembl/ENSG00000265969 gives 404 from Target Pharm
http://identifiers.org/ensembl/ENST00000428381 works
http://identifiers.org/ensembl/ENST00000313005 works
http://identifiers.org/ensembl/ENST00000445823 works
http://identifiers.org/ensembl/ENST00000513780 works
http://identifiers.org/ensembl/ENSG00000186318 gives 404

Ensembl Gene IDs do not seem to work in Target Pharmacology although they appear to be mapped to the protein.
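The pattern in these results becomes explicit when the URIs are grouped by the prefix of their Ensembl identifier (ENSG for genes, ENST for transcripts). The sketch below only tabulates the outcomes reported in this comment; it makes no API call, and the names `REPORTED`, `kind` and `outcomes_by_kind` are invented for illustration.

```python
from collections import defaultdict

# Outcomes exactly as reported above for Target Pharmacology (not re-queried).
REPORTED = {
    "http://identifiers.org/ensembl/ENSG00000265969": "404",
    "http://identifiers.org/ensembl/ENST00000428381": "works",
    "http://identifiers.org/ensembl/ENST00000313005": "works",
    "http://identifiers.org/ensembl/ENST00000445823": "works",
    "http://identifiers.org/ensembl/ENST00000513780": "works",
    "http://identifiers.org/ensembl/ENSG00000186318": "404",
}


def kind(uri):
    """Classify an Ensembl URI by the prefix of its local identifier."""
    local_id = uri.rsplit("/", 1)[-1]
    if local_id.startswith("ENSG"):
        return "gene"
    if local_id.startswith("ENST"):
        return "transcript"
    return "other"


def outcomes_by_kind(reported):
    """Collect the set of observed outcomes for each identifier kind."""
    grouped = defaultdict(set)
    for uri, outcome in reported.items():
        grouped[kind(uri)].add(outcome)
    return dict(grouped)


if __name__ == "__main__":
    for identifier_kind, outcomes in sorted(outcomes_by_kind(REPORTED).items()):
        print(identifier_kind, sorted(outcomes))
```

Every gene URI in the list fails and every transcript URI works, so the failure tracks the identifier type rather than any single identifier.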

Christian-B commented 10 years ago

Both http://identifiers.org/ensembl/ENSG00000265969 and http://identifiers.org/ensembl/ENSG00000186318 map to a HIGH number of UniProt URIs.

See https://github.com/openphacts/GLOBAL/issues/84

ChristineChichester commented 10 years ago

MapURL does work with the genes

danidi commented 10 years ago

http://purl.uniprot.org/uniprot/ works. See also https://github.com/openphacts/GLOBAL/issues/147

danidi commented 10 years ago

It seems that http://identifiers.org/ensembl/ENSG00000186318 is mapped to http://www.uniprot.org/uniprot/P56817 which is mapped to Chembl. But with the ensembl URI you don't get a Chembl URI. Should we see this transitive mapping with mapURI? Or are the mappings not calculated when there are too many links to uniprot?

Christian-B commented 10 years ago

The mapping between http://identifiers.org/ensembl/ENSG00000186318 and http://www.uniprot.org/uniprot/P56817 comes from the Human base ensembl mapping set with the 1 to VERY many issue.

In a (failed) attempt to solve the one-to-many issues, I appear to have given that mapping set a different justification, which prevented the transitives.

http://identifiers.org/ensembl/ENSG00000186318 is not in the small replacement mapping set we have, so at best it will soon only be available in the all lens.

danidi commented 9 years ago

http://identifiers.org/ensembl/ENSG00000186318 is still missing on develop with the new IMS.

antonisloizou commented 8 years ago

http://ops2.few.vu.nl/QueryExpander/mapUri?Uri=http%3A%2F%2Fidentifiers.org%2Fensembl%2FENSG00000186318&lensUri=http%3A%2F%2Fopenphacts.org%2Fspecs%2F%2FLens%2FDefault&targetUriPattern=http%3A%2F%2Fwww.uniprot.org&overridePredicateURI=&format=text%2Fhtml

Seems to be there now, but it's 1 to many ...
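For readers who want to see what that mapUri call actually asks for, the percent-encoded query string can be decoded with the Python standard library. This is only a sketch that unpacks the URL quoted above; the helper name `query_parameters` is made up, and nothing is sent to the server.

```python
from urllib.parse import parse_qs, urlsplit

# The mapUri URL quoted in the comment above, split over lines for readability.
URL = (
    "http://ops2.few.vu.nl/QueryExpander/mapUri"
    "?Uri=http%3A%2F%2Fidentifiers.org%2Fensembl%2FENSG00000186318"
    "&lensUri=http%3A%2F%2Fopenphacts.org%2Fspecs%2F%2FLens%2FDefault"
    "&targetUriPattern=http%3A%2F%2Fwww.uniprot.org"
    "&overridePredicateURI="
    "&format=text%2Fhtml"
)


def query_parameters(url):
    """Return the decoded query parameters of a URL as a flat dict."""
    query = urlsplit(url).query
    # keep_blank_values=True so the empty overridePredicateURI is not dropped.
    parsed = parse_qs(query, keep_blank_values=True)
    return {name: values[0] for name, values in parsed.items()}


if __name__ == "__main__":
    for name, value in query_parameters(URL).items():
        print(f"{name} = {value!r}")
```

Decoded, the request maps the gene URI under the Default lens, restricts targets to URIs starting with http://www.uniprot.org, and leaves overridePredicateURI empty.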

danidi commented 8 years ago

The mapping is there, but target pharmacology does not return any data. The corresponding protein (http://www.uniprot.org/uniprot/P56817) does.

stain commented 8 years ago

It seems the mapping from ensembl is missing chembl, which shows up if you ask for uniprot.

AlasdairGray commented 8 years ago

@danidi I see from @stain's example above that the gene in question maps to multiple proteins. I suspect that @Christian-B either put some logic into the IMS to ignore linksets with multiple targets, or did not name TrEMBL as a suitable intermediary for transitives.

Christian-B commented 8 years ago

As of when I left the project, there was no logic to ignore links based on the number of targets/mappings.

The intermediaries for transitives are lens dependent. According to the latest version listed at http://openphacts.cs.man.ac.uk, TrEMBL is an intermediary. See: http://openphacts.cs.man.ac.uk:9095/QueryExpander/Lens

Replace http://openphacts.cs.man.ac.uk:9095 with the URL of the IMS you are using.

danidi commented 8 years ago

According to http://ops2.few.vu.nl/QueryExpander/Lens, Ensembl and Uniprot are both allowed middle sources in the default lens.

danidi commented 8 years ago

Just to summarize the issue (tested with http://ops2.few.vu.nl/QueryExpander/BridgeDb): http://identifiers.org/ensembl/ENSG00000186318 finds http://www.uniprot.org/uniprot/P56817 (and several other uniprot URIs) in http://ops2.few.vu.nl/QueryExpander/mappingSet/81 (justification SIO_000985 protein coding gene, predicate INFERRED_FROM_TRANSLATION), but no mapping to ChEMBL.

http://www.uniprot.org/uniprot/P56817 finds CHEMBL_TC_3139 in http://ops2.few.vu.nl/QueryExpander/mappingSet/4 (justification SIO_010043 protein, predicate exactMatch), which in turn finds CHEMBL4822 in http://ops2.few.vu.nl/QueryExpander/mappingSet/2 (justification SIO_010043 protein, predicate exactMatch).

Going the other way round (starting with http://linkedchemistry.info/chembl/target/tCHEMBL4822), we can find URIs from Ensembl (e.g. ENST00000313005 via http://ops2.few.vu.nl/QueryExpander/mappingSet/11), but this is the old Ensembl linkset, not the one provided by @JonathanMELIUS.

Is something else necessary to allow Jonathan's linksets as middle sources? Is the predicate important for the transitives? Or is it a problem that we now have two linksets between Ensembl and UniProt at the same time?
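To make the summary above concrete, the three mapping sets can be written down as a tiny directed graph and searched hop by hop. This is a sketch built only from the identifiers, mapping set numbers, justifications and predicates listed in this comment; `Link`, `LINKS` and `find_chain` are invented names, and the search is not how the IMS computes transitives.

```python
from collections import deque
from typing import NamedTuple


class Link(NamedTuple):
    source: str
    target: str
    mapping_set: int
    justification: str
    predicate: str


# One link per mapping set mentioned in the summary (directed, for simplicity).
LINKS = [
    Link("ENSG00000186318", "P56817", 81, "SIO_000985", "INFERRED_FROM_TRANSLATION"),
    Link("P56817", "CHEMBL_TC_3139", 4, "SIO_010043", "exactMatch"),
    Link("CHEMBL_TC_3139", "CHEMBL4822", 2, "SIO_010043", "exactMatch"),
]


def find_chain(links, start, goal):
    """Breadth-first search for a chain of links from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for link in links:
            if link.source == node and link.target not in seen:
                seen.add(link.target)
                queue.append((link.target, path + [link]))
    return None  # no chain found


if __name__ == "__main__":
    chain = find_chain(LINKS, "ENSG00000186318", "CHEMBL4822")
    for link in chain:
        print(link.mapping_set, link.justification, link.predicate)
```

Hop by hop the chain exists, but it mixes two predicates and two justifications, which is exactly what the questions above are probing.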

Christian-B commented 8 years ago

Predicate could be important if they are different as the system would need to work out the new predicate.

As I left it this was done by https://github.com/bridgedb/BridgeDb/blob/master/org.bridgedb.uri.sql/src/org/bridgedb/sql/predicate/LoosePredicateMaker.java

Justification is also important. Again as I left it the combiner was: https://github.com/bridgedb/BridgeDb/blob/master/org.bridgedb.uri.sql/src/org/bridgedb/sql/justification/OpsJustificationMaker.java

Christian-B commented 8 years ago

I also notice that not all the chembl target linksets are in the default lens!

Compare
http://ops2.few.vu.nl/QueryExpander/SourceTargetInfos?sourceCode=Chembl16TargetComponent&lensUri=All
with
http://ops2.few.vu.nl/QueryExpander/SourceTargetInfos?sourceCode=Chembl16TargetComponent

Is this intentional? Remember that linkset presence in lens depends on the justifications. See: ops2.few.vu.nl/QueryExpander/Lens

danidi commented 8 years ago

Yes, this is intentional. The ChEMBL linkset was split up to have only the single protein mappings in the default lens. The others are complexes or target groups. We might actually need a dedicated lens for those.

Christian-B commented 8 years ago

If https://github.com/bridgedb/BridgeDb/blob/master/org.bridgedb.uri.sql/src/org/bridgedb/sql/predicate/LoosePredicateMaker.java is still the current predicate maker, then http://ops2.few.vu.nl/QueryExpander/mappingSet/81, which has predicate http://rdf.ebi.ac.uk/terms/ensembl/INFERRED_FROM_TRANSLATION, will not be allowed in any transitive except with other linksets that have the same predicate.

This is because the IMS has not been told what predicate to use when it finds one link with http://rdf.ebi.ac.uk/terms/ensembl/INFERRED_FROM_TRANSLATION and one with, for example, http://www.w3.org/2004/02/skos/core#exactMatch

The fix is to expand LoosePredicateMaker.java:

  1. Write rules for the new predicates that have come into use since I left the project. This could be as simple as converting them to a skos relationship if different predicates are found. A function like cleanup would work here; call it just after cleanup.
  2. For very strong predicates such as skos:exactMatch or owl:sameAs, default to using the other predicate if no better pair is found.
  3. Consider what to do if no rule is known for a predicate pair:
     a. Block the transitive (current behaviour).
     b. Use a default such as the very high-level skos:mappingRelation.
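The three rules above could be sketched roughly as follows. This is a Python illustration of the idea only, not the BridgeDb code: `combine`, `PAIR_RULES` and `STRONG` are hypothetical names, and the single entry in `PAIR_RULES` is an assumed example of an explicit pair rule, not one taken from LoosePredicateMaker.java.

```python
from typing import Optional

EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
MAPPING_RELATION = "http://www.w3.org/2004/02/skos/core#mappingRelation"
INFERRED_FROM_TRANSLATION = (
    "http://rdf.ebi.ac.uk/terms/ensembl/INFERRED_FROM_TRANSLATION"
)

# Rule 1: explicit rules for known predicate pairs (this entry is an assumption).
PAIR_RULES = {
    frozenset({EXACT_MATCH, SAME_AS}): EXACT_MATCH,
}

# Rule 2: predicates strong enough to defer to whatever they are paired with.
STRONG = {EXACT_MATCH, SAME_AS}


def combine(left: str, right: str, fallback: Optional[str] = None) -> Optional[str]:
    """Return the predicate for a transitive built from two links.

    Returns None when the transitive should be blocked (rule 3a); pass
    fallback=MAPPING_RELATION to get the rule 3b behaviour instead.
    """
    if left == right:
        return left
    explicit = PAIR_RULES.get(frozenset({left, right}))
    if explicit is not None:
        return explicit
    if left in STRONG:
        return right
    if right in STRONG:
        return left
    return fallback


if __name__ == "__main__":
    print(combine(INFERRED_FROM_TRANSLATION, EXACT_MATCH))
```

Under these assumed rules, the mappingSet/81 predicate paired with skos:exactMatch would yield INFERRED_FROM_TRANSLATION instead of blocking the transitive, while two unrelated predicates are still blocked unless a fallback is supplied.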