jackrusher / mundaneum

A Clojure wrapper around Wikidata
BSD Zero Clause License
127 stars, 16 forks

Federated query? #12

Closed: zachcp closed this issue 2 years ago

zachcp commented 4 years ago

Hi Jack,

Nice updates on Mundaneum. Any chance I could use Mundaneum to combine Wikidata queries with other SPARQL sources, such as UniProt? Below is an example of what I'd like to do: I take your recent example and try to retrieve the underlying amino acid sequences via the UniProt IDs and the UniProt SPARQL endpoint.

zach cp

;; Attempt at federated query
;
; https://www.wikidata.org/wiki/User:ProteinBoxBot/SPARQL_Examples
; https://www.wikidata.org/wiki/User:ProteinBoxBot/SPARQL_Examples#Wikidata_-%3E_Wikipathways
; https://www.wikidata.org/wiki/User:ProteinBoxBot/SPARQL_Examples#Uniprot_-%3E_Wikidata
; Uniprot Named Graph: https://sparql.uniprot.org/uniprot

(use '[mundaneum.query    :refer [describe entity label property query stringify-query *default-language*]])
(use '[backtick           :refer [template]])

; get gene name and gene product
(query
  '[:select ?drugLabel ?geneLabel ?diseaseLabel ?gene_product ?gene_productLabel
    :where [[?drug (wdt :physically-interacts-with) ?gene_product]
            [?gene_product (wdt :encoded-by) ?gene]]
    :limit 10])

;[{:gene_product "Q287958", :drugLabel "labetalol", :geneLabel "ADRB1", :gene_productLabel "Adrenoceptor beta 1"}
; {:gene_product "Q287958", :drugLabel "mirabegron", :geneLabel "ADRB1", :gene_productLabel "Adrenoceptor beta 1"}
; ....
; {:gene_product "Q287961", :drugLabel "epinephrine", :geneLabel "ADRB2", :gene_productLabel "Adrenoceptor beta 2"}]

; Get the UniProt IDs of the proteins
;
(query
  '[:select ?drugLabel ?geneLabel ?diseaseLabel ?gene_product ?gene_productLabel ?uniprot ?uniprotLabel
    :where [[?drug (wdt :physically-interacts-with) ?gene_product]
            [?gene_product (wdt :UniProt-ID) ?uniprot]
            [?gene_product (wdt :encoded-by) ?gene]]
    :limit 10])

;[{:gene_product "Q258915",
;  :uniprot "P00797",
;  :drugLabel "aliskiren",
;  :geneLabel "REN",
;  :gene_productLabel "Renin",
;  :uniprotLabel "P00797"}
; ...
; {:gene_product "Q283350",
;  :uniprot "P04637",
;  :drugLabel "Hypothetical protein CT_788",
;  :geneLabel "TP53",
;  :gene_productLabel "Tumor protein p53",
;  :uniprotLabel "P04637"}]

; now let's use UniProt to retrieve the amino acid sequences for the gene products
; I'm stuck here

(stringify-query
  '[:select ?drugLabel ?geneLabel ?diseaseLabel ?gene_product ?aa_sequence
    :where [[?drug (wdt :physically-interacts-with) ?gene_product]
            [?gene_product (wdt :encoded-by) ?gene]
            [?gene_product (wdt :UniProt-ID) ?uniprot]
            ; NOTE: Wikidata stores UniProt IDs as plain strings (e.g. "P00797"),
            ; so ?uniprot must become an IRI such as
            ; <http://purl.uniprot.org/uniprot/P00797> before it can match below.
            :service <https://sparql.uniprot.org/sparql>
            ; the UniProt core namespace ends in "/", not ":"
            [[?uniprot a <http://purl.uniprot.org/core/Protein>]
             [?uniprot <http://purl.uniprot.org/core/sequence> ?isoform]
             [?isoform rdf:value ?aa_sequence]]]
    :limit 10])
jackrusher commented 4 years ago

I get no results from either of the samples you're trying to re-encode. It would help me a great deal if you could find some working SPARQL queries (don't worry about re-writing them in the DSL) from which I could start.
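For reference, the query being attempted above can be written as plain SPARQL. The sketch below assembles it as a Clojure string rather than through the Mundaneum DSL. It assumes the Wikidata property IDs P129 (physically interacts with), P702 (encoded by) and P352 (UniProt protein ID), the UniProt core namespace `http://purl.uniprot.org/core/`, and that the Wikidata Query Service permits a `SERVICE` call to `https://sparql.uniprot.org/sparql`; it has not been run against the live endpoints. The names `uniprot-base`, `uniprot-iri` and `federated-query` are hypothetical. The main point it illustrates is that Wikidata's UniProt IDs are plain strings, so each must be converted to a `purl.uniprot.org` IRI before it can join with UniProt's triples.

```clojure
(ns sketch.federated-uniprot
  (:require [clojure.string :as str]))

;; Base IRI UniProt uses for protein entries (assumption: purl.uniprot.org form).
(def uniprot-base "http://purl.uniprot.org/uniprot/")

(defn uniprot-iri
  "Turn a plain accession string, as stored in Wikidata's P352,
  into the IRI form UniProt uses, e.g. \"P00797\" -> .../uniprot/P00797."
  [accession]
  (str uniprot-base accession))

;; Untested sketch of a Wikidata -> UniProt federated query, meant to run on the
;; Wikidata Query Service, which forwards the SERVICE block to UniProt.
;; The BIND does in SPARQL what uniprot-iri does in Clojure, so the Wikidata
;; string can join against UniProt's protein IRIs.
(def federated-query
  (str/join
   "\n"
   ["PREFIX wdt: <http://www.wikidata.org/prop/direct/>"
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>"
    "PREFIX up: <http://purl.uniprot.org/core/>"
    "SELECT ?drug ?gene_product ?uniprot ?aa_sequence WHERE {"
    "  ?drug wdt:P129 ?gene_product ."       ; physically interacts with
    "  ?gene_product wdt:P702 ?gene ."        ; encoded by
    "  ?gene_product wdt:P352 ?uniprot_id ."  ; UniProt protein ID (a string)
    (str "  BIND(IRI(CONCAT(\"" uniprot-base "\", ?uniprot_id)) AS ?uniprot)")
    "  SERVICE <https://sparql.uniprot.org/sparql> {"
    "    ?uniprot a up:Protein ;"
    "             up:sequence ?isoform ."
    "    ?isoform rdf:value ?aa_sequence ."
    "  }"
    "}"
    "LIMIT 10"]))
```

The resulting string can be pasted into the Wikidata Query Service UI or sent to `https://query.wikidata.org/sparql`. Doing the IRI conversion inside the query with `BIND` keeps the join on the server, so no intermediate results have to round-trip through the client.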

zachcp commented 4 years ago

Thanks Jack,

The best (and only) examples I can find that mix Wikidata and UniProt SPARQL are from the Sulab (see here). They query the UniProt endpoint and pull in data from Wikidata. Maybe we can "invert" the query so that it starts on the Wikidata side?

I may also play around a bit more with SPARQL and see whether I can figure out some of the connections myself. Mundaneum smooths out some of the rough edges, so it can be an inspiration as I explore these other data sources.

I think the hardest part is knowing which things are related to which other things, and the best way to explore is to build on existing examples (or to have tools that help you navigate the properties you care about). Of the portals I've found so far, neXtProt does the best job of illustrating the power of linked data through its large number of ready-to-go queries.

jackrusher commented 4 years ago

One path to this sort of functionality might be for me to make a general-purpose SPARQL wrapper (this library is almost there) with optional extensions for Wikidata. That way, one could use it against arbitrary SPARQL endpoints, including for federated queries. The only downside is that it's a bit more work than I have time for just now.
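As a rough illustration of what the transport layer of a general-purpose wrapper reduces to: the SPARQL 1.1 Protocol lets a client send a query to any endpoint as a URL-encoded `query` parameter and request JSON results. The namespace and the functions `query-uri` and `run-query` below are hypothetical, not Mundaneum's API; the sketch assumes Java 11+ for `java.net.http` and returns the raw JSON string rather than parsing it.

```clojure
(ns sketch.sparql-client
  (:import (java.net URI URLEncoder)
           (java.net.http HttpClient HttpRequest HttpResponse$BodyHandlers)
           (java.nio.charset StandardCharsets)))

(defn query-uri
  "Build a SPARQL 1.1 Protocol GET URI: <endpoint>?query=<url-encoded query>."
  ^URI [endpoint sparql]
  (URI/create (str endpoint "?query="
                   (URLEncoder/encode ^String sparql StandardCharsets/UTF_8))))

(defn run-query
  "Send a SPARQL query to any endpoint over HTTP GET and return the
  response body (SPARQL JSON results) as a string."
  [endpoint sparql]
  (let [request (-> (HttpRequest/newBuilder (query-uri endpoint sparql))
                    (.header "Accept" "application/sparql-results+json")
                    ;; Wikidata asks clients to identify themselves.
                    (.header "User-Agent" "sparql-sketch/0.1 (example)")
                    (.GET)
                    (.build))]
    (.body (.send (HttpClient/newHttpClient) request
                  (HttpResponse$BodyHandlers/ofString)))))
```

Usage, against the public Wikidata endpoint:

```clojure
(run-query "https://query.wikidata.org/sparql"
           "SELECT ?item WHERE { ?item wdt:P31 wd:Q5 } LIMIT 3")
```

Because federation is carried out server-side by the `SERVICE` clause, a client like this needs nothing special for federated queries; it only has to send the query to an endpoint that allows the remote service.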

jackrusher commented 2 years ago

@zachcp Being too busy to write a general SPARQL wrapper has worked in my favor, as someone else has done the work. I am now looking into rewriting this library on top of flint, which should give us the bits we need for federated queries. :)

zachcp commented 2 years ago

@jackrusher This seems to have been a wise strategy you've employed!

jackrusher commented 2 years ago

I rewrote it last night after dinner. New version pushed. Multilingual support and federated query examples TK.

jackrusher commented 2 years ago

Federated query using WikiPathways added to examples.clj :)