openphacts / ops-search

Open PHACTS search service
MIT License

Which Drugbank fields to index? #15

Status: Closed (ianwdunlop closed 8 years ago)

ianwdunlop commented 9 years ago

For drugbank:Drug, see https://github.com/openphacts/OPS_LinkedDataApi/blob/1.5.0/api-config-files/01_01_compoundInfo.ttl#L142 for the fields that are returned in the API call. Based on this, here is a first version of the SPARQL query to grab the fields for Elasticsearch to index:

PREFIX chembl: <http://rdf.ebi.ac.uk/terms/chembl#>
PREFIX obohash: <http://purl.obolibrary.org/obo#>
PREFIX cheminf: <http://semanticscience.org/resource/>
PREFIX uniprot: <http://purl.uniprot.org/core/>
PREFIX drugbank: <http://bio2rdf.org/drugbank_vocabulary:>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX oboslash: <http://purl.obolibrary.org/obo/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT *
WHERE {
  GRAPH <http://www.openphacts.org/bio2rdf/drugbank> {
    ?id a drugbank:Drug .
    OPTIONAL { ?id dct:title ?label . }
    OPTIONAL { ?id drugbank:synonym/dct:title ?synonym . }
    OPTIONAL { ?id drugbank:brand/dct:title ?brandName . }
  }
}
# LIMIT added during testing
LIMIT 100
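
One thing to keep in mind with this query: synonym and brandName are matched in separate OPTIONAL patterns, so a drug with several synonyms and several brand names comes back as one row per combination. Before indexing, the rows probably need collapsing into one document per ?id. A minimal Python sketch of that step follows, assuming the rows have already been parsed into plain dicts keyed by variable name. The function name `group_rows` and the dict layout are illustrative and not part of ops-search.

```python
from collections import defaultdict

FIELDS = ("label", "synonym", "brandName")


def group_rows(rows):
    """Collapse SPARQL result rows into one document per drug id.

    `rows` is an iterable of dicts keyed by variable name. Variables
    bound by OPTIONAL patterns may be missing from a row.
    """
    docs = defaultdict(lambda: {field: set() for field in FIELDS})
    for row in rows:
        doc = docs[row["id"]]
        for field in FIELDS:
            value = row.get(field)  # unbound OPTIONAL variable -> None
            if value is not None:
                doc[field].add(value)
    # Convert the sets to sorted lists so the output is deterministic.
    return {
        drug_id: {field: sorted(values) for field, values in doc.items()}
        for drug_id, doc in docs.items()
    }
```

Using sets means the repeated label and synonym values produced by the row multiplication are stored only once per drug.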

Sample response:

| id | label | synonym | brandName |
| --- | --- | --- | --- |
| http://bio2rdf.org/drugbank:DB01112 | "Cefuroxime"@en | "(6R,7R)-3-[(Carbamoyloxy)methyl]-7-{[(2Z)-2-furan-2-yl-2-(methoxyimino)acetyl]amino}-8-oxo-5-thia-1-azabicyclo[4.2.0]oct-2-ene-2-carboxylic acid"@en | "Ceftin"@en |
| http://bio2rdf.org/drugbank:DB01117 | "Atovaquone"@en | "2-(trans-4-(P-Chlorophenyl)cyclohexyl)-3-hydroxy-1,4-naphthoquinone"@en | "Malarone Pediatric"@en |

danidi commented 9 years ago

I would allow all of those fields to be indexed (but maybe exclude the @en language tag). On the Drugbank page http://www.drugbank.ca/drugs/DB01112, the identifiers are also given in other languages. Maybe you could include those as well? There are also many more brand names available there. It would also be good if searching for the Drugbank identifier found results (not necessarily with the full URI, just DB and the number).
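
Both suggestions could be handled when the query results are prepared for indexing. Below is a rough Python sketch, assuming literals arrive in the serialized form shown in the sample response (for example `"Ceftin"@en`) and that drug URIs end in `DB` followed by digits. The helper names `strip_language_tag` and `short_drugbank_id` are made up for illustration.

```python
import re

# A quoted literal followed by a language tag such as @en or @pt-BR.
_LANG_TAGGED = re.compile(r'^"(.*)"@[A-Za-z]+(?:-[A-Za-z0-9]+)*$', re.DOTALL)
# The short identifier at the end of a drug URI, e.g. DB01112.
_DRUGBANK_ID = re.compile(r"DB\d+$")


def strip_language_tag(literal):
    """Return the text of a language-tagged literal without quotes or tag.

    Strings that are not in the tagged form are returned unchanged.
    """
    match = _LANG_TAGGED.match(literal)
    return match.group(1) if match else literal


def short_drugbank_id(uri):
    """Return the trailing DB identifier of a drug URI, or None if absent."""
    match = _DRUGBANK_ID.search(uri)
    return match.group(0) if match else None
```

Indexing the short identifier as its own field, next to the full URI, would let a search for `DB01112` match without the user having to type the URI.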

AlasdairGray commented 9 years ago

Is there some way to preserve in the Elasticsearch index the fact that a piece of text is a label, a synonym or a brand name? That information would be useful for ranking results.

ianwdunlop commented 9 years ago

@AlasdairGray You can boost the ranking of a result depending on which field matched.
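
As a sketch of what that could look like: if label, synonym and brand name are kept as separate fields in the index, an Elasticsearch `multi_match` query can weight each field with the `field^boost` syntax. The field names below simply reuse the SPARQL variable names and the weights are arbitrary placeholders, so treat both as assumptions and not as the actual ops-search mapping.

```python
def boosted_query(text, boosts=None):
    """Build an Elasticsearch multi_match query body with per-field boosts.

    `boosts` maps field name to weight. The defaults are illustrative only.
    """
    if boosts is None:
        boosts = {"label": 3, "synonym": 2, "brandName": 1}
    # Elasticsearch reads "label^3" as: search field `label`, boost 3.
    fields = [f"{name}^{weight}" for name, weight in boosts.items()]
    return {"query": {"multi_match": {"query": text, "fields": fields}}}
```

The returned dict can be serialized to JSON and sent as the body of a search request. A match on the preferred label would then score higher than the same match on a brand name.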

ianwdunlop commented 9 years ago

@danidi It needs a really deep dive into the drugbank RDF to figure out what is available. It seems to have changed a lot recently.

stain commented 8 years ago

Now indexing:

  drugbank:
    compound:
      graph: http://linkedlifedata.com/resource/drugbank
      type: drugbank:drugs
      properties:
        - drugbank:brandName
        - drugbank:genericName
        - drugbank:chemicalIupacName
        - drugbank:synonym
        - drugbank:swissprotName
    target:
      graph: http://linkedlifedata.com/resource/drugbank
      type: drugbank:targets
      properties:
        - drugbank:geneName
        - drugbank:synonym
        - drugbank:swissprotName
    enzyme:
      graph: http://linkedlifedata.com/resource/drugbank
      type: drugbank:enzymes
      properties:
        - drugbank:geneName
        - drugbank:name
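
To make the shape of this configuration concrete, here is a small Python sketch of how one entry (graph, type, properties) could be turned into a SPARQL query in the style of the one at the top of this issue. This is only an illustration and not necessarily how ops-search does it. The function name `build_query` is invented, and the namespace behind the `drugbank:` prefix is not given in this thread, so it is passed in as a parameter.

```python
def build_query(graph, rdf_type, properties, prefix_uri):
    """Build a SPARQL SELECT for one config entry.

    Each property becomes an OPTIONAL pattern whose variable is named
    after the local part of the property, e.g. drugbank:geneName -> ?geneName.
    `prefix_uri` is the namespace for the `drugbank:` prefix (caller-supplied).
    """
    lines = [
        f"PREFIX drugbank: <{prefix_uri}>",
        "SELECT *",
        "WHERE {",
        f"  GRAPH <{graph}> {{",
        f"    ?id a {rdf_type} .",
    ]
    for prop in properties:
        var = prop.split(":", 1)[1]
        lines.append(f"    OPTIONAL {{ ?id {prop} ?{var} . }}")
    lines += ["  }", "}"]
    return "\n".join(lines)


if __name__ == "__main__":
    # The prefix URI here is a placeholder, not the real namespace.
    print(build_query(
        "http://linkedlifedata.com/resource/drugbank",
        "drugbank:enzymes",
        ["drugbank:geneName", "drugbank:name"],
        "http://example.org/drugbank#",
    ))
```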

See also Drugbank properties

danidi commented 8 years ago

@stain do you have an example with actual data? I'm wondering if the swissprotName in drugbank/compound contains the name of the protein the drug is acting on. Do we want to find this?

Also, is drugbank:synonym from compound different to drugbank:synonym from target?