openphacts / GLOBAL

Global project issues [private for now. owner lee harland]

No results for GO target classifications #395

Open ianwdunlop opened 7 years ago

ianwdunlop commented 7 years ago

Reported by @jakhag. For example, http://alpha.openphacts.org:3002/target/classifications?uri=http%3A%2F%2Fpurl.uniprot.org%2Funiprot%2FP14756&app_id=f91c5b2b&tree=go
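
To retry the call for other targets, the request URL can be rebuilt from its parts. This is a minimal sketch rather than part of the API: the helper name `classifications_url` and the constant `BASE` are made up here, and the only parameters used are the ones visible in the URL above (`uri`, `app_id`, `tree`).

```python
from urllib.parse import urlencode

# Host and path copied from the example URL in this issue.
BASE = "http://alpha.openphacts.org:3002/target/classifications"


def classifications_url(uniprot_uri, app_id, tree=None):
    """Build a target/classifications URL (hypothetical helper)."""
    params = {"uri": uniprot_uri, "app_id": app_id}
    if tree is not None:
        # e.g. tree="go" restricts the call to the GO hierarchy
        params["tree"] = tree
    # urlencode percent-encodes ':' and '/' inside the uri value
    return BASE + "?" + urlencode(params)


print(classifications_url("http://purl.uniprot.org/uniprot/P14756", "f91c5b2b", tree="go"))
```

Leaving out `tree` gives the unfiltered call, which is handy for comparing the two responses.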

This looks like a query issue: the query tries to look up chembl (and also Enzyme) class labels in the GO graph. Here is a subset of the query that highlights the issue:

PREFIX chembl: <http://rdf.ebi.ac.uk/terms/chembl#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select distinct * WHERE {
  VALUES ?g {
    <http://www.geneontology.org> 
  }
  {
    VALUES ?chembl_target_uri {
      <http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL1075146>
    } GRAPH <http://www.ebi.ac.uk/chembl> {
      ?chembl_target_uri chembl:hasProteinClassification ?chembl_class ;
                         dcterms:title ?target_name ;
                         a ?target_type .
      GRAPH ?g {
        ?chembl_class rdfs:label ?chembl_label
      }
    } 
  }
}

If you remove this bit

  VALUES ?g {
    <http://www.geneontology.org> 
  }

then you get results. Whether the results are correct or not is another question. Why it tries to find chembl class labels in the GO graph I have no idea. Did this work in previous versions of the API? Is there data missing from the GO graph, i.e. a chembl class URI with a label? Should we add that data in there?
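
One way to read this behaviour: in SPARQL, a nested GRAPH ?g block switches the active graph for the patterns inside it, so the label lookup runs against whatever ?g is bound to and not against the enclosing chembl graph. The toy Python model below sketches that scoping. The store, the identifiers and the labels are all invented for illustration, and it assumes the class labels sit in the chembl graph, which has not been confirmed for the real data.

```python
CHEMBL = "http://www.ebi.ac.uk/chembl"
GO = "http://www.geneontology.org"

# Toy quad store: graph name -> set of (subject, predicate, object) triples.
# Everything in here is hypothetical data, not taken from the real triple store.
store = {
    CHEMBL: {
        ("target:T1", "chembl:hasProteinClassification", "class:C1"),
        ("class:C1", "rdfs:label", "hypothetical class label"),
    },
    GO: {
        ("go:G1", "rdfs:label", "hypothetical GO term label"),
    },
}


def class_labels(store, target, allowed_graphs=None):
    """Mimic: GRAPH <chembl> { target hasProteinClassification ?c . GRAPH ?g { ?c rdfs:label ?l } }

    allowed_graphs plays the role of VALUES ?g { ... }; None means ?g is unrestricted.
    """
    graphs = store.keys() if allowed_graphs is None else allowed_graphs
    results = []
    # Outer pattern: always matched against the chembl graph.
    for s, p, cls in store[CHEMBL]:
        if s == target and p == "chembl:hasProteinClassification":
            # Inner pattern: matched only against the graphs ?g may take.
            for g in graphs:
                for s2, p2, label in store.get(g, ()):
                    if s2 == cls and p2 == "rdfs:label":
                        results.append((cls, g, label))
    return sorted(results)


print(class_labels(store, "target:T1", [GO]))  # -> []
print(class_labels(store, "target:T1"))        # label found in the chembl graph
```

Under that assumption, pinning ?g to the GO graph empties the result, while leaving ?g free lets the label match in whichever graph actually holds it.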

danidi commented 7 years ago

A 404 usually means there is no data available, but there should be GO classifications for this protein (see http://www.ebi.ac.uk/QuickGO/GProtein?ac=P14756). However, 2.1 returns no data either. The target/classifications API call retrieves classifications from three different sources: enzyme, chembl and go. So it should search for chembl labels for the chembl classification (and you see them if you remove the tree=go filter from the call), but I don't think there should be chembl labels for GO.

While searching with a different target, I saw that the call works fine (e.g. http://alpha.openphacts.org:3002/target/classifications?uri=http%3A%2F%2Fpurl.uniprot.org%2Funiprot%2FQ9Y5Y9&app_id=f91c5b2b&tree=go). So it seems to be an issue with this specific protein. Either it is not in the RDF, or the mapping is missing (I didn't check either).

ianwdunlop commented 7 years ago

You do get results but I'm still not sure they are completely "correct". If you look at the chembl target for that protein and put it in the query above (with the GO graph) you don't get any results for it.

PREFIX chembl: <http://rdf.ebi.ac.uk/terms/chembl#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select distinct * WHERE {
  VALUES ?g {
    <http://www.geneontology.org> 
  }
  {
    VALUES ?chembl_target_uri {
      <http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL5451>
    } GRAPH <http://www.ebi.ac.uk/chembl> {
      ?chembl_target_uri chembl:hasProteinClassification ?chembl_class ;
                         dcterms:title ?target_name ;
                         a ?target_type .
      GRAPH ?g {
        ?chembl_class rdfs:label ?chembl_label
      }
    } 
  }
}

As far as I can figure out, you don't get any chembl target class, compound class etc. if you leave the GO graph enabled. It's because of the UNION clause in the original query that you get anything back for Q9Y5Y9. If you are really (and I mean really) interested, then here is the original query:

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX chembl: <http://rdf.ebi.ac.uk/terms/chembl#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX ops: <http://www.openphacts.org/api#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX uniprot: <http://purl.uniprot.org/core/>
PREFIX obo: <http://www.semantic-systems-biology.org/ontology/rdf/OBO#>
CONSTRUCT {
  <http://purl.uniprot.org/uniprot/Q9Y5Y9> skos:exactMatch ?chembl_target_uri ;
    skos:exactMatch ?uniprot_target_uri .
  ?chembl_target_uri chembl:hasProteinClassification ?chembl_class ;
    dcterms:title ?target_name ;
    a ?target_type ;
    void:inDataset <http://www.ebi.ac.uk/chembl> .
  ?uniprot_target_uri ops:hasGoComponent ?go_component ;
    ops:hasGoFunction ?go_function ;
    ops:hasGoProcess ?go_process ;
    ops:hasEnzymeClassification ?enzyme_class ;
    skos:prefLabel ?uniprot_name ;
    void:inDataset ?dataset .
  ?chembl_class skos:prefLabel ?chembl_label ;
    void:inDataset <http://www.ebi.ac.uk/chembl> .
  ?enzyme_class skos:prefLabel ?enzyme_label ;
    void:inDataset <http://purl.uniprot.org/enzyme> .
  ?go_component skos:prefLabel ?go_c_label ;
    void:inDataset <http://www.geneontology.org> .
  ?go_function skos:prefLabel ?go_f_label ;
    void:inDataset <http://www.geneontology.org> .
  ?go_process skos:prefLabel ?go_p_label ;
    void:inDataset <http://www.geneontology.org> .
  <http://purl.uniprot.org/enzyme> skos:prefLabel 'Enzyme Classification' .
  <http://www.ebi.ac.uk/chembl/target> skos:prefLabel 'ChEMBL Target Hierarchy' .
  <http://www.geneontology.org> skos:prefLabel 'GeneOntology' .
}  WHERE {
  VALUES ?g {
    <http://www.geneontology.org> 
  } {
    VALUES ?chembl_target_uri {
      <http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL5451>  
    } GRAPH <http://www.ebi.ac.uk/chembl> {
      ?chembl_target_uri chembl:hasProteinClassification ?chembl_class ;
                         dcterms:title ?target_name ;
                         a ?target_type .
      GRAPH ?g {
        ?chembl_class rdfs:label ?chembl_label
      }
    } 
  } UNION {
    VALUES ?uniprot_target_uri {
      <http://purl.uniprot.org/uniprot/Q9Y5Y9>  
    } GRAPH <http://www.openphacts.org/goa> {
      {
        GRAPH <http://purl.uniprot.org> {
          ?uniprot_target_uri uniprot:enzyme|uniprot:domain/uniprot:enzyme ?enzyme_class .
          BIND(<http://purl.uniprot.org> AS ?dataset)
        }
        GRAPH ?g {
          ?enzyme_class skos:prefLabel ?enzyme_label
        } 
      } UNION
      {
        ?uniprot_target_uri obo:C ?go_component
        BIND(<http://www.openphacts.org/goa> AS ?dataset)
        GRAPH ?g {
          ?go_component rdfs:label ?go_c_label
        } 
      } UNION
      {
        ?uniprot_target_uri obo:F ?go_function
        BIND(<http://www.openphacts.org/goa> AS ?dataset)
        GRAPH ?g {
          ?go_function rdfs:label ?go_f_label
        } 
      } UNION
      {
        ?uniprot_target_uri obo:P ?go_process
        BIND(<http://www.openphacts.org/goa> AS ?dataset)
        GRAPH ?g {
          ?go_process rdfs:label ?go_p_label
        } 
      }
      GRAPH <http://purl.uniprot.org> {
        ?uniprot_target_uri rdfs:label ?uniprot_name
      }
    }
  } 
} LIMIT 10

with a LIMIT just in case. Run that query on the sparql endpoint with and without the graph clause to see what I mean.
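
For the with/without comparison, a small script can generate both variants so that the only difference between them is the VALUES ?g clause. This is a sketch using the reduced query from earlier in this thread, not the full CONSTRUCT; the names `QUERY_TEMPLATE` and `build_query` are made up here, and sending the query to the endpoint is left out.

```python
from string import Template

# Reduced diagnostic query from this thread; $values_clause is either empty
# or a VALUES block that pins ?g to a single graph.
QUERY_TEMPLATE = Template("""\
PREFIX chembl: <http://rdf.ebi.ac.uk/terms/chembl#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT * WHERE {
$values_clause  {
    VALUES ?chembl_target_uri { <$target> }
    GRAPH <http://www.ebi.ac.uk/chembl> {
      ?chembl_target_uri chembl:hasProteinClassification ?chembl_class ;
                         dcterms:title ?target_name ;
                         a ?target_type .
      GRAPH ?g {
        ?chembl_class rdfs:label ?chembl_label
      }
    }
  }
} LIMIT 10
""")


def build_query(target, graph=None):
    """Return the query text, with ?g pinned to `graph` when one is given."""
    values_clause = f"  VALUES ?g {{ <{graph}> }}\n" if graph else ""
    return QUERY_TEMPLATE.substitute(values_clause=values_clause, target=target)


TARGET = "http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL5451"
with_go = build_query(TARGET, graph="http://www.geneontology.org")
without_go = build_query(TARGET)
print(with_go)
print(without_go)
```

Pasting each printed query into the endpoint in turn shows whether the GO restriction alone is what removes the chembl rows.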