usc-isi-i2 / kgtk-notebooks

Tutorial and hands-on notebook on using the Knowledge Graph Toolkit (KGTK)
MIT License

SPARQL queries #7

Open GiorgioBarnabo opened 2 years ago

GiorgioBarnabo commented 2 years ago

Hi everybody,

thanks again for this very cool project. Is there a way to run SPARQL queries on your system? The Wikidata Query Service is very powerful, but it often times out. Here is an example Wikidata query that I would like to run with KGTK.

Thanks. Best,

Giorgio

dgarijo commented 2 years ago

Hi @GiorgioBarnabo, there is no support for SPARQL yet. We use Kypher, KGTK's query language, which supports a subset of Cypher.

szeke commented 2 years ago

@grantxie, please see if you can write an equivalent Kypher query for this SPARQL query (ignore the language part, do it only for English first)

SELECT 
?entity 
?entityName 
?entityDescription 
#?author
?authorName
(GROUP_CONCAT(DISTINCT ?countryLabel;separator=", ") AS ?countries)
(GROUP_CONCAT(DISTINCT ?themeLabel;separator=", ") AS ?themes) 
(MIN(YEAR(?publication_date)) AS ?date) 
?number_wikipedia_pages
WHERE {
  ?entity wdt:P7937/wdt:P279* wd:Q25379 .
  ?entity wikibase:sitelinks ?number_wikipedia_pages.
  FILTER(?number_wikipedia_pages > 5).
  OPTIONAL{?entity wdt:P50 ?author.}
  OPTIONAL{?entity wdt:P921 | wdt:P136 ?theme.}
  OPTIONAL{?entity wdt:P495 ?country.}
  OPTIONAL{?entity wdt:P577 ?publication_date.}
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en, fr, es, it".
                         ?entity rdfs:label ?entityName . 
                         ?entity schema:description ?entityDescription .
                         ?author rdfs:label ?authorName .
                         ?theme rdfs:label ?themeLabel .
                         ?country rdfs:label ?countryLabel .}
} GROUP BY ?entity ?entityName ?entityDescription ?author ?authorName ?number_wikipedia_pages

GrantXie commented 2 years ago

Ok I'll take a look

GiorgioBarnabo commented 2 years ago

Thank you so much guys! I really appreciate that ;)

GrantXie commented 2 years ago

It would be something like this

kgtk("""
    query -i label -i item -i link
        --match 'item:(work)-[:P279]->(:Q25379), label:(work)-[]->(work_label), link:(work)-[]->(wiki_num)'
        --opt 'item:(work)-[:P50]->(author)'
        --opt 'label:(author)-[]->(author_name)'
        --opt 'item:(work)-[:P136]->(genre)'
        --opt 'label:(genre)-[]->(genre_name)'
        --opt 'item:(work)-[:P921]->(ms)'
        --opt 'label:(ms)-[]->(ms_name)'
        --where 'wiki_num > 5'
        --return 'work as entity, work_label as entityName, author as author, author_name as author_name, genre_name as genre, ms_name as main_subject'
        --limit 3
""")

The item and label files are the usual ones:

!kgtk query -i claims.wikibase-item.tsv.gz --as item --limit 3
!kgtk query -i labels.en.tsv.gz --as label --limit 3

You also need a link file, with the Q-node as node1 and the Wikipedia page count as node2; the label doesn't matter.
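In case it helps, here is a minimal sketch of how such a link file could be written from Python, assuming you already have the page counts in a dictionary. The function name `write_link_file`, the file name `link.tsv.gz`, the edge label `sitelinks`, and the example Q-nodes and counts are all placeholders for illustration, not part of KGTK. The only thing it relies on is the KGTK edge layout of tab-separated `node1`, `label`, and `node2` columns.

```python
import gzip


def write_link_file(sitelink_counts, path, label="sitelinks"):
    """Write a gzipped, tab-separated edge file: node1=Q-node, node2=page count.

    `label` is an arbitrary placeholder; the query above matches any label.
    """
    with gzip.open(path, "wt", encoding="utf-8", newline="\n") as f:
        # Header row with the three edge columns.
        f.write("node1\tlabel\tnode2\n")
        # One edge per Q-node, sorted so the output is reproducible.
        for qnode, count in sorted(sitelink_counts.items()):
            f.write(f"{qnode}\t{label}\t{int(count)}\n")
    return path


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    example_counts = {"Q1": 12, "Q2": 7}
    write_link_file(example_counts, "link.tsv.gz")
```

The resulting file could then be registered the same way as the other two inputs, with `-i link.tsv.gz --as link`.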

GiorgioBarnabo commented 2 years ago

Dear @GrantXie,

first of all, thank you very much! This is so cool! I might need some more help, though. I tried to run the query on Colab, but I was not able to get the results I need. I have started reading the documentation, but is there perhaps a shortcut I can take? I am trying to adapt the "02-kg-profiling.ipynb" Colab notebook to run my SPARQL query (your Kypher query).

[Screenshot, 2022-03-01 at 09:41:47]

Any more advice?

Thank you again. Best,

Giorgio