openphacts / GLOBAL

Global project issues [private for now. owner lee harland]

Chembl target mappings (single-proteins / protein complexes / protein families) #333

Open danidi opened 8 years ago

danidi commented 8 years ago

In 1.5 we had all mappings of ChEMBL targets to ChEMBL target components (and then to UniProt) in the default lens, regardless of whether the target is a single protein, a complex or a family. With the IMS reload, we now have the mappings in three different linksets (see below). Currently, only the single-protein linkset is available in the default lens, so we show less data than ChEMBL if you start, for example, with a UniProt protein that is part of a complex. Should we add the complex and target family mappings to the default lens as well? This would give behaviour similar to 1.5, with some implications. For example:

1st: Chembl Target to Chembl Target Component (single)
Predicate: exactMatch
Justification: http://semanticscience.org/resource/SIO_010043 (protein)
Linkset: http://data.openphacts.org/dev/ims/linksets/data/ops-chembl-linksets/chembl_20.1_singletarget_targetcmpt_ls.ttl.gz

2nd: Chembl Target to Chembl Target Component (complex)
Predicate: relatedMatch
Justification: has_part
Linkset: http://data.openphacts.org/dev/ims/linksets/data/ops-chembl-linksets/chembl_20.1_complextarget_targetcmpt_ls.ttl.gz

3rd: Chembl Target to Chembl Target Component (group)
Predicate: relatedMatch
Justification: SIO_000059 (has member)
Linkset: http://data.openphacts.org/dev/ims/linksets/data/ops-chembl-linksets/chembl_20.1_grouptarget_targetcmpt_ls.ttl.gz
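
As a rough illustration of what adding the two extra linksets to the default lens would mean, the sketch below models each linkset by its predicate and justification and selects the ones whose justification a lens allows. The `Linkset` class, the `linksets_for_lens` helper and the short justification labels are hypothetical stand-ins for this example, not IMS code, and selecting purely on justification is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Linkset:
    """Toy description of one ChEMBL target -> target component linkset."""
    name: str
    predicate: str
    justification: str


# The three linksets listed above, with shortened justification labels.
LINKSETS = [
    Linkset("single", "exactMatch", "SIO_010043"),
    Linkset("complex", "relatedMatch", "has_part"),
    Linkset("group", "relatedMatch", "SIO_000059"),
]


def linksets_for_lens(allowed_justifications):
    """Return the names of linksets whose justification the lens allows.

    Assumption: a lens is modelled only as a set of allowed justifications.
    """
    return [ls.name for ls in LINKSETS if ls.justification in allowed_justifications]


# Default lens as described above: only the single-protein linkset.
default_now = linksets_for_lens({"SIO_010043"})
# Default lens extended with the complex and group justifications.
default_extended = linksets_for_lens({"SIO_010043", "has_part", "SIO_000059"})

print(default_now)
print(default_extended)
```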

stain commented 8 years ago

To me it sounds like this can be solved by having a lens that turns on the complexes, which could be turned on in the API with a ?include_complexes=true or something.. but is it possible in the API to add some kind of conditional lens like that? Would documenting the lens parameter be easier? @antonisloizou - views?

AlasdairGray commented 8 years ago

On 26 Nov 2015, at 17:39, danidi wrote:

> In 1.5 we had all mappings of Chembl targets to Chembl target components (and then to uniprot) in the default lens, regardless if it is a single protein or a complex or a family. With the IMS reload, we now have the mappings in three different linksets (see below). Currently, we only have the single protein linkset available in the default lens, which has the consequence that we show less data than ChEMBL if you start for example with a uniprot protein that is part of a complex. Should we add the complex and target family mappings to default as well?

I think that we should as I believe that is the behaviour that the user will expect.

> This would give a similar behaviour as on 1.5, with some implications. For example:

The implication of the above is that for 2.0 we should really be clear in the result set that there are multiple proteins (perhaps this can already be done for 1.5, but I think it is a more significant API change). Thus, for 1.5 I would leave the mangled result (which is presumably the behaviour we would have had from 1.4).

Unless the implementation has changed, the predicate is not the important bit here but the justification. It should be permissible to go from gene -> protein -> protein complex based on the justifications.

Alasdair


Alasdair J G Gray Fellow of the Higher Education Academy Assistant Professor in Computer Science, School of Mathematical and Computer Sciences (Athena SWAN Bronze Award) Heriot-Watt University, Edinburgh UK.

Email: A.J.G.Gray@hw.ac.uk Web: http://www.alasdairjggray.co.uk ORCID: http://orcid.org/0000-0002-5711-4872 Office: Earl Mountbatten Building 1.39 Twitter: @gray_alasdair


We invite research leaders and ambitious early career researchers to join us in leading and driving research in key inter-disciplinary themes. Please see www.hw.ac.uk/researchleaders for further information and how to apply.

Heriot-Watt University is a Scottish charity registered under charity number SC000278.

AlasdairGray commented 8 years ago

On 2 Dec 2015, at 10:52, Stian Soiland-Reyes wrote:

> To me it sounds like this can be solved by having a lens that turns on the complexes, which could be turned on in the API with a ?include_complexes=true or something.. but is it possible in the API to add some kind of conditional lens like that? Would documenting the lens parameter be easier? @antonisloizou - views?

I think it would be appropriate to have lenses that change the default behaviour to make it more stringent in this case, i.e. only permitting cases where there is a single protein.

Alasdair


nicklynch commented 8 years ago

Just trying to summarise where we are with this and agree some next steps.

* Add to the default lens the options to return complex and target family mappings using the extra linksets - making this more like 1.5 & ChEMBL
* Check the justifications are appropriate (has_part & has_member) for the Complex and Family
* Add to response format description that multiple proteins could be returned
* Changes to Explorer if needed?
* Support a more precise query through using lens to restrict to a single protein
* Document in the Release notes with some examples of expected behaviour
* Do not change 1.5 API at this time

Since this is a 2.0 release, there is room to make a reasonable change in light of the changed linksets and improved granularity. It will be easier to make the change now if we can.

@stain @antonisloizou @agaulton Would this work?

danidi commented 8 years ago

The more precise query is already possible by setting the target type to single protein in the query parameters.

antonisloizou commented 8 years ago

So if I understood correctly, the only thing that needs to be done on the API side is to change the documentation of target information to say that more than one Uniprot entity may be returned.

Is that correct?

For example, let's take chembl_target:CHEMBL2095232 (a complex with 3 components - CHEMBL_TC_5211, CHEMBL_TC_5213 and CHEMBL_TC_5214)

Target Info (NOW) returns only the ChEMBL info for each component: http://ops2.few.vu.nl/target?uri=http%3A%2F%2Frdf.ebi.ac.uk%2Fresource%2Fchembl%2Ftarget%2FCHEMBL2095232

(An annoying side issue is that after the Virtuoso update, the SPARQL compiler now throws an error when a Literal or Blank Node is inserted in the subject position via a VALUES clause, resulting in an HTTP 500. This is now fixed by inserting the ops:no_mappings_found URI instead.)

Now, on the IMS side, since the All lens allows all 3 justifications (SIO_010043, has_part, and SIO_000059) and ChEMBL target component is an allowed middle source, I would expect that chembl_target:CHEMBL2095232 + lensUri=All would give the 3 corresponding Uniprot URIs (and return information for each one).

But it doesn't seem to: http://ops2.few.vu.nl/QueryExpander/mapUri?Uri=http%3A%2F%2Frdf.ebi.ac.uk%2Fresource%2Fchembl%2Ftarget%2FCHEMBL2095232&lensUri=http%3A%2F%2Fopenphacts.org%2Fspecs%2F%2FLens%2FAll&Pattern+Filter=&overridePredicateURI=&format=text%2Fhtml

@stain, @AlasdairGray, @Christian-B any idea why?
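
To make the mapUri request above easier to read, here is a small sketch that only decodes the query string of that link into its parameters; it makes no network call. The variable names are illustrative, and the parameter names are taken from the link as written rather than from any IMS documentation.

```python
from urllib.parse import parse_qs, urlsplit

# The mapUri link from the comment above, split across lines for readability.
MAP_URI_CALL = (
    "http://ops2.few.vu.nl/QueryExpander/mapUri"
    "?Uri=http%3A%2F%2Frdf.ebi.ac.uk%2Fresource%2Fchembl%2Ftarget%2FCHEMBL2095232"
    "&lensUri=http%3A%2F%2Fopenphacts.org%2Fspecs%2F%2FLens%2FAll"
    "&Pattern+Filter=&overridePredicateURI=&format=text%2Fhtml"
)

# keep_blank_values=True keeps the empty parameters so nothing is silently dropped.
params = parse_qs(urlsplit(MAP_URI_CALL).query, keep_blank_values=True)

target_uri = params["Uri"][0]      # the ChEMBL target being mapped
lens_uri = params["lensUri"][0]    # the lens applied to the mapping

for name, values in params.items():
    print(f"{name} = {values[0]!r}")
```

The decoded lens URI is reproduced exactly as it appears in the link, including its `//` before `Lens`.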

stain commented 8 years ago

I believe the transitives are not followed because the justification on http://ops2.few.vu.nl/QueryExpander/mappingSet/45 is has_part, which is not mentioned in any lens yet. In the merged linkset in 1.5 we had justification SIO_010043.

(both of these use skos:relatedMatch as predicate)

Perhaps if I add a has_part lens and a SIO_000059 lens?

AlasdairGray commented 8 years ago

I don't believe that we should be having a lens with a single justification in it, except for debugging purposes.

Perhaps you mean to add these justifications to the list of allowed justifications in another lens?

AlasdairGray commented 8 years ago

@antonisloizou is the API query written in a way to expect multiple proteins returned as part of the target information call? If so, then yes the documentation needs updating.

antonisloizou commented 8 years ago

@stain Default+has_part+SIO_000059, Default+has_part, and Default+SIO_000059 sound good to me...

@AlasdairGray The query was written assuming 1 Uniprot, but deals with multiple ones fine (it's just more entries in a VALUES clause). It's that assumption that carried over to the docs. It's now changed.
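
A minimal sketch of the "more entries in a VALUES clause" point, assuming the mapped URIs arrive as a plain list of strings. The `values_clause` helper, the `uniprot` variable name and the example.org URIs are hypothetical; the fallback to `ops:no_mappings_found` mirrors the fix mentioned earlier in this thread, but this is not the actual API query code.

```python
# Placeholder term used when the IMS returns no mappings (prefixed name as
# written in the thread; its prefix declaration is assumed to exist in the query).
NO_MAPPINGS = "ops:no_mappings_found"


def values_clause(variable, uris):
    """Build a SPARQL VALUES clause binding `variable` to each URI.

    One URI or many makes no difference to the shape of the clause; an empty
    list falls back to the placeholder so the clause never ends up empty.
    """
    if uris:
        terms = [f"<{uri}>" for uri in uris]
    else:
        terms = [NO_MAPPINGS]
    return f"VALUES ?{variable} {{ {' '.join(terms)} }}"


# Hypothetical URIs standing in for the mapped components of a complex.
print(values_clause("uniprot", ["http://example.org/a", "http://example.org/b"]))
print(values_clause("uniprot", []))
```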

AlasdairGray commented 8 years ago

These lenses will need proper titles and descriptions to make them of use to the scientists.

Christian-B commented 8 years ago

https://github.com/bridgedb/BridgeDb/blob/master/org.bridgedb.uri.sql/src/org/bridgedb/sql/justification/OpsJustificationMaker.java

It does not know how to handle a transitive justification between http://semanticscience.org/resource/SIO_010043 and http://www.obofoundry.org/ro/ro.owl#has_part

So NO transitive is made.

Another reason why type checking using justification was a poor decision looking back on it now!
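
The failure mode described here can be sketched with a toy lookup table. This is not the BridgeDb `OpsJustificationMaker` logic: the `combine` and `transitive_justification` functions are hypothetical, and the single known combination (protein with protein) and the order of justifications in the example chains are assumptions made for illustration.

```python
PROTEIN = "http://semanticscience.org/resource/SIO_010043"
HAS_PART = "http://www.obofoundry.org/ro/ro.owl#has_part"

# Assumed table of justification pairs that can be chained, and the
# justification the resulting transitive mapping would carry.
KNOWN_COMBINATIONS = {
    (PROTEIN, PROTEIN): PROTEIN,
}


def combine(left, right):
    """Return the combined justification, or None if the pair is unknown."""
    return KNOWN_COMBINATIONS.get((left, right))


def transitive_justification(chain):
    """Fold a chain of justifications; None means no transitive is made."""
    result = chain[0]
    for nxt in chain[1:]:
        result = combine(result, nxt)
        if result is None:
            return None
    return result


# A chain using only the protein justification can be combined...
print(transitive_justification([PROTEIN, PROTEIN]))
# ...but a chain that includes has_part cannot, whatever the lens allows.
print(transitive_justification([HAS_PART, PROTEIN]))
```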

stain commented 8 years ago

Decision from telcon 2015-12-03:

agaulton commented 8 years ago

Sorry I missed the TC - I think the above makes sense. If the API filter is just on target type, I would expect the lenses to give slightly different results, as the new linksets contain more than one target type e.g., the group/has_member linkset contains both protein families and protein complex groups (poorly defined complexes). So unless you could filter on multiple target types, you would need a lens.

Christian-B commented 8 years ago

All the lenses in the world will NOT help if the predicate maker and justification maker are unable to combine the different predicates and justifications in a transitive chain.

stain commented 8 years ago

Stian will arrange a call about this next week to find a resolution.

stain commented 8 years ago

The call will be on Wednesday 2016-01-20 12:00 GMT in a Skype chat.

Contact Skype user soiland to be added.