openphacts / FederatedPhacts

Experiments in federated querying

Let me compare MW, logP and PSA for known oxidoreductase inhibitors #3

Open egonw opened 5 years ago

egonw commented 5 years ago

Should be solvable with just the EBI platform.

egonw commented 5 years ago

From ChEBI, get all oxidoreductase inhibitors:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?subject ?label ?altTerm
FROM <http://rdf.ebi.ac.uk/dataset/chebi>
WHERE {
  ?subject rdfs:subClassOf* <http://purl.obolibrary.org/obo/CHEBI_76725> .
  ?subject rdfs:label ?label .
}

Chris-Evelo commented 5 years ago

OK, so what do I need in front of that to be sure I get that ChEBI ontology term ID from the description "oxidoreductase inhibitors"? Ask the OLS RDF? And if that fails, use the OLS API? Can we start at WikiData?

egonw commented 5 years ago

Okay, the EBI RDF platform has ChEBI, but I cannot find the has_role predicate:

SELECT * WHERE {
  {
    <http://purl.obolibrary.org/obo/CHEBI_3962> ?p <http://purl.obolibrary.org/obo/CHEBI_77484>
  }   UNION 
  {
    <http://purl.obolibrary.org/obo/CHEBI_77484> ?p <http://purl.obolibrary.org/obo/CHEBI_3962>
  }
}
Chris-Evelo commented 5 years ago

So moving down, you would ask WikiData first about all oxidoreductases (but it doesn't have ChEBI yet), then ask the OLS RDF via the EBI SPARQL endpoint, but there you run into a missing-predicate problem. Then you could use the ChEBI RDF itself, but that is not in the EBI RDF, so you need to fire up a SPARQL endpoint.

Chris-Evelo commented 5 years ago

The discussion we just had about the "expressed in tissue" question led to the conclusion that you would still start the question at WikiData, but then simply get 0 hits and combine that with whatever you find further down the line.
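
A minimal sketch of that "ask every source in order, accept 0 hits, combine the rest" idea. The source names, the `query_in_order` helper and the `id` key are all hypothetical, and the lambdas stand in for real SPARQL calls; this only illustrates the control flow, not an agreed design.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical type: a source is a name plus a function that returns result rows.
Source = Tuple[str, Callable[[], Iterable[dict]]]


def query_in_order(sources: List[Source], key: str = "id") -> Tuple[List[dict], Dict[str, int]]:
    """Ask every source in the given order and merge the rows.

    A source that returns nothing is not an error: its hit count is
    recorded as 0 and rows from later sources are still collected.
    Rows are de-duplicated on `key`, keeping the earliest source's row.
    """
    merged: Dict[str, dict] = {}
    hits: Dict[str, int] = {}
    for name, run in sources:
        rows = list(run())
        hits[name] = len(rows)
        for row in rows:
            merged.setdefault(row[key], row)
    return list(merged.values()), hits


# Stub sources with placeholder identifiers (not real data).
rows, hits = query_in_order([
    ("wikidata", lambda: []),  # starts here, finds nothing
    ("chebi", lambda: [{"id": "compound-1"}, {"id": "compound-2"}]),
    ("chembl", lambda: [{"id": "compound-2"}, {"id": "compound-3"}]),
])
print(hits)
print([row["id"] for row in rows])
```

Keeping the per-source hit counts makes it visible which endpoint actually contributed, which may be useful for the write-up.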

egonw commented 5 years ago

Working on those inhibitors in Wikidata:

image

egonw commented 5 years ago

Solution (run):

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX hasValue: <http://semanticscience.org/resource/SIO_000300>
PREFIX hasAttribute: <http://semanticscience.org/resource/SIO_000008>
PREFIX psa: <http://semanticscience.org/resource/CHEMINF_000307>
PREFIX logP: <http://semanticscience.org/resource/CHEMINF_000295>
PREFIX mw: <http://semanticscience.org/resource/CHEMINF_000216>

  SELECT DISTINCT ?mol ?molLabel ?InChIKey ?mass ?ChEMBL ?ChEMBLUrl ?mw ?PSA ?logP WITH {
    SELECT DISTINCT ?mol WHERE {
      ?mol wdt:P31/wdt:P279* wd:Q66587127 .
    } LIMIT 500
  } AS %result
  WITH {
    SELECT ?mol ?InChIKey ?mass ?ChEMBL ?ChEMBLUrl WHERE {
      INCLUDE %result
      OPTIONAL { ?mol wdt:P235 ?InChIKey }
      OPTIONAL { ?mol wdt:P2067 ?mass }
      VALUES ?ChEMBLIDdir { wdt:P592 }
      ?mol ?ChEMBLIDdir ?ChEMBL .
      OPTIONAL {
        ?ChEMBLIDpred wikibase:directClaim ?ChEMBLIDdir .
        ?ChEMBLIDpred wdt:P1921 ?ChEMBLformatterurl .
      }
      BIND(IRI(REPLACE(?ChEMBLformatterurl, '\\$1', str(?ChEMBL))) AS ?ChEMBLUrl).
    }
  } AS %nextresult
  WITH {
    SELECT ?mol ?InChIKey ?mass ?ChEMBL ?ChEMBLUrl ?mw ?PSA ?logP WHERE {
      INCLUDE %nextresult
      SERVICE <https://www.ebi.ac.uk/rdf/services/sparql> {
        ?ChEMBLUrl hasAttribute: ?prop1 .
        ?prop1 a ?prop1Type ; hasValue: ?mw .
        ?prop1Type rdfs:subClassOf mw: .
        ?ChEMBLUrl hasAttribute: ?prop2 .
        ?prop2 a ?prop2Type ; hasValue: ?PSA .
        ?prop2Type rdfs:subClassOf psa: .
        ?ChEMBLUrl hasAttribute: ?prop3 .
        ?prop3 a ?prop3Type ; hasValue: ?logP .
        ?prop3Type rdfs:subClassOf logP: .
      }
    }
  } AS %finalresult
  WHERE {
    INCLUDE %finalresult
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  }

Chris-Evelo commented 5 years ago

It turns out that in its current form this doesn't have a mapping problem, because WikiData does the mapping itself (which is in fact a nice way to solve the problem and should be part of a write-up). As soon as we add another layer (like "for this group of chemically related compounds, tell me which ones are oxidoreductase inhibitors and ..."), you do run into mapping and even lenses (what are the active stereoisomers) problems.
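
For the write-up, the mapping step in the query above (the `REPLACE` of `$1` in the formatter URL that Wikidata stores for the ChEMBL ID property) can be sketched outside SPARQL as follows. The `expand_formatter` helper is hypothetical, and the formatter string shown is an assumption about what Wikidata returns for that property, so check it against the live data.

```python
def expand_formatter(formatter: str, identifier: str) -> str:
    """Turn an external identifier into an IRI using a '$1' formatter URL.

    Mirrors BIND(IRI(REPLACE(?formatterurl, '\\$1', str(?id)))) from the query.
    """
    if "$1" not in formatter:
        raise ValueError("formatter URL has no $1 placeholder: " + formatter)
    return formatter.replace("$1", identifier)


# Assumed formatter value for the ChEMBL ID property; verify in Wikidata.
CHEMBL_FORMATTER = "http://rdf.ebi.ac.uk/resource/chembl/molecule/$1"

print(expand_formatter(CHEMBL_FORMATTER, "CHEMBL25"))
```

Because the formatter lives in Wikidata, the client never has to hard-code how a ChEMBL identifier maps to an IRI on the EBI RDF platform.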

Chris-Evelo commented 5 years ago

Also, we could use this as a lead example for the CRS development when we ask: "which of the compounds that contain these substructures are oxidoreductase inhibitors and ..."

egonw commented 5 years ago

So, if we start with EC enzyme numbers (for oxidoreductases) and then go to ChEMBL and use the experimental data there, we have something like the following, but it doesn't really scale:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>
PREFIX hasValue: <http://semanticscience.org/resource/SIO_000300>
PREFIX hasAttribute: <http://semanticscience.org/resource/SIO_000008>
PREFIX psa: <http://semanticscience.org/resource/CHEMINF_000307>
PREFIX logP: <http://semanticscience.org/resource/CHEMINF_000295>
PREFIX mw: <http://semanticscience.org/resource/CHEMINF_000216>

SELECT DISTINCT ?molecule ?mw ?PSA ?logP WITH {
  SELECT DISTINCT ?UniProtUrl WHERE {
    ?protein wdt:P591 ?ecnumber ; wdt:P702 [] ; wdt:P352 ?uniprot .
    FILTER (STRSTARTS(?ecnumber, "1."))
    VALUES ?UniProtIDdir { wdt:P352 }
    ?protein ?UniProtIDdir ?uniprot .
    ?UniProtIDpred wikibase:directClaim ?UniProtIDdir ;
                   wdt:P1921 ?UniProtformatterurl .
    BIND(IRI(REPLACE(?UniProtformatterurl, '\\$1', str(?uniprot))) AS ?UniProtUrl).
  } LIMIT 1
} AS %results
WITH {
  SELECT DISTINCT ?molecule WHERE {
    INCLUDE %results
    SERVICE <https://www.ebi.ac.uk/rdf/services/sparql> {
      ?activity a cco:Activity ;
        cco:hasMolecule ?molecule ;
        cco:hasAssay/cco:hasTarget/cco:hasTargetComponent/cco:targetCmptXref ?UniProtUrl ;
        cco:pChembl ?pchembl .
      FILTER (?pchembl > 8)
    }
  } LIMIT 50
} AS %nextresults
WHERE {
  SELECT DISTINCT ?molecule ?mw ?PSA ?logP WHERE {
    INCLUDE %nextresults
    SERVICE <https://www.ebi.ac.uk/rdf/services/sparql> {
      ?molecule hasAttribute: ?prop1 .
      ?prop1 a ?prop1Type ; hasValue: ?mw .
      ?prop1Type rdfs:subClassOf mw: .
      ?molecule hasAttribute: ?prop2 .
      ?prop2 a ?prop2Type ; hasValue: ?PSA .
      ?prop2Type rdfs:subClassOf psa: .
      ?molecule hasAttribute: ?prop3 .
      ?prop3 a ?prop3Type ; hasValue: ?logP .
      ?prop3Type rdfs:subClassOf logP: .
    }
  } LIMIT 50
}

AlasdairGray commented 5 years ago

I can now recreate the single complex query as a series of small queries. However, we hit the query limits on Wikidata, which only permit one query every 60 s. Even with a 60 s sleep in the for loop I'm hitting the timeout: 2 of my 10 requests (the 4th and 10th) were rejected. The run already took 10 minutes.

Will need to investigate if we can give a list of values to use in the query or think of an alternative. @egonw @pgroth any thoughts?
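
For reference, the loop described above could look roughly like this, with a retry and a longer back-off when a request is rejected. `run_throttled` and the `Rejected` exception are hypothetical names, the `run` callable stands in for whatever sends one small query, and the 60 s pause is just the figure mentioned above; this is a sketch, not the script that was actually run.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class Rejected(Exception):
    """Hypothetical error: raised by `run` when the endpoint refuses or times out."""


def run_throttled(
    items: Iterable[T],
    run: Callable[[T], R],
    pause: float = 60.0,
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> List[R]:
    """Run one request per item, pausing between items and retrying rejections."""
    results: List[R] = []
    for index, item in enumerate(items):
        if index > 0:
            sleep(pause)  # fixed pause between consecutive items
        for attempt in range(1, max_attempts + 1):
            try:
                results.append(run(item))
                break
            except Rejected:
                if attempt == max_attempts:
                    raise  # give up after the last attempt
                sleep(pause * 2 ** attempt)  # wait longer before retrying
    return results
```

Passing `sleep` as a parameter keeps the function testable without real waiting. It does not remove the underlying cost: ten items still take at least nine pauses, which is why batching the identifiers looks more promising.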

egonw commented 5 years ago

@AlasdairGray, this page is relevant in this context: https://www.wikidata.org/wiki/Wikidata:WikiProject_Limits_of_Wikidata

AlasdairGray commented 5 years ago

I was thinking more of whether we could pass a list in using the VALUES feature. I'll need to investigate whether that is possible both in terms of SPARQL and grlc.

Any other thoughts, or do we have to make the query block larger, i.e. take the inhibitor type as the parameter but then return the chemical information? We are then likely to run into a similar problem when pulling the data from the EBI RDF platform (limits).

Chris-Evelo commented 5 years ago

I would discuss this with Andra Waagmeester. They may have solved problems like this before.

AlasdairGray commented 5 years ago

Using the VALUES approach I can include a list of identifiers. I will now need to work out how to do this using grlc for the REST API.

I don't see an obvious way of doing this from the documentation. I'm thinking that I'll need to define a new string parameter which would then include the values that are to be passed in. My concern is that these don't get processed correctly. I'll create a new play query and try a few things out.
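
To make the concern about processing concrete, here is a rough sketch of turning one comma-separated string parameter into validated, batched `VALUES` clauses before they are spliced into a query. Everything here is an assumption for illustration: the helper names, the comma delimiter, and the `CHEMBL<digits>` identifier shape. It says nothing about how grlc itself handles parameters.

```python
import re
from typing import Iterator, List

# Assumed identifier shape; anything else is rejected rather than quoted.
CHEMBL_ID = re.compile(r"CHEMBL\d+")


def parse_ids(param: str) -> List[str]:
    """Split a comma-separated parameter and refuse anything unexpected."""
    ids = [part.strip() for part in param.split(",") if part.strip()]
    bad = [i for i in ids if not CHEMBL_ID.fullmatch(i)]
    if bad:
        raise ValueError(f"not a ChEMBL identifier: {bad}")
    return ids


def batches(ids: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` identifiers."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def values_clause(variable: str, ids: List[str]) -> str:
    """Render a SPARQL VALUES clause of plain string literals."""
    literals = " ".join(f'"{i}"' for i in ids)
    return f"VALUES ?{variable} {{ {literals} }}"


for batch in batches(parse_ids("CHEMBL25, CHEMBL521, CHEMBL112"), size=2):
    print(values_clause("ChEMBL", batch))
```

Validating against a strict pattern before quoting means a stray quote or brace in the parameter cannot change the shape of the query.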

egonw commented 5 years ago

Ah, yes, sure. Please use VALUES to query these properties for as many compounds as possible at the same time. Using VALUES to ask for the three properties is harder, as they would end up as separate rows, though Finn actually has a trick for that in Scholia too.
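
If the three properties do come back as separate rows, one fallback is to pivot them on the client. This is not the Scholia trick mentioned above, just a plain client-side alternative; the `pivot` helper, the row keys and the numbers below are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable


def pivot(rows: Iterable[dict]) -> Dict[str, Dict[str, float]]:
    """Collapse (molecule, property, value) rows into one record per molecule."""
    table: Dict[str, Dict[str, float]] = defaultdict(dict)
    for row in rows:
        table[row["molecule"]][row["property"]] = row["value"]
    return dict(table)


# Placeholder rows, as they might arrive with one property per row.
rows = [
    {"molecule": "mol-1", "property": "mw", "value": 100.0},
    {"molecule": "mol-1", "property": "PSA", "value": 20.0},
    {"molecule": "mol-1", "property": "logP", "value": 1.5},
    {"molecule": "mol-2", "property": "mw", "value": 200.0},
]
print(pivot(rows))
```

A molecule that lacks one of the properties simply ends up with a shorter record, so missing values stay visible instead of dropping the whole row.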