Open vemonet opened 1 month ago
@vemonet, I made this script to extract the queries:
https://github.com/constraintAutomaton/sib-swiss-federated-query-extractor
I replaced the queries provided in the repo because they do not seem to work with the data model. I used this one instead:
PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX spex: <https://purl.expasy.org/sparql-examples/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?queryID ?federatedEndpoint ?comment ?query ?target WHERE {
  ?queryID sh:select ?query .
  ?queryID spex:federatesWith ?federatedEndpoint .
  ?queryID rdfs:comment ?comment .
  ?queryID <https://schema.org/target> ?target
}
At least on my side, no query had more than one <https://schema.org/target>, and the number of spex:federatesWith values seems to match the number of endpoints in the federation.
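The check described above can be sketched in Python. This is only a rough illustration, not the code from the extractor repo: the helper names `service_endpoints` and `federation_matches` are hypothetical, and it assumes that the spex:federatesWith values list the target endpoint plus every endpoint named in a SERVICE clause. It also ignores SERVICE clauses that use a variable and does not strip SPARQL comments.

```python
import re

# Matches the endpoint IRI in "SERVICE <iri>" or "SERVICE SILENT <iri>".
# SERVICE clauses that use a variable (SERVICE ?endpoint) are not matched.
SERVICE_IRI = re.compile(r"\bSERVICE\s+(?:SILENT\s+)?<([^>\s]+)>", re.IGNORECASE)


def service_endpoints(query: str) -> set[str]:
    """Return the distinct endpoint IRIs named in SERVICE clauses of a query."""
    return set(SERVICE_IRI.findall(query))


def federation_matches(query: str, target: str, federates_with: list[str]) -> bool:
    """Check that federatesWith equals the target plus every SERVICE endpoint.

    Assumption: the target endpoint is itself listed in federatesWith.
    IRIs are compared as plain strings, so a trailing-slash difference
    counts as a mismatch.
    """
    expected = service_endpoints(query) | {target}
    return set(federates_with) == expected
```

A mismatch reported by `federation_matches` would not necessarily mean the data is wrong; it may only mean the assumption about the target being listed does not hold for that query.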
Maybe I can document how I did it and provide my repo as an example after some cleanup, unless I made a mistake somewhere.
Thanks @constraintAutomaton, that's nice! A few remarks:
- metadata key
- convertToOneTurtle.sh bash script to compile all queries (https://github.com/constraintAutomaton/sib-swiss-federated-query-extractor/blob/main/init.sh): I would recommend using sparql-examples-utils.jar instead, as documented in the README.md
- federatesWith instead of federatedEndpoint, to make it more consistent with the currently used predicate

Something a bit like:
{
  "queries": [
    {
      "uri": "https://www.bgee.org/sparql/.well-known/sparql-examples/020",
      "endpoint": "https://www.bgee.org/sparql/",
      "query": "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\nPREFIX up: <http://purl.uniprot.org/core/>\nPREFIX genex: <http://purl.org/genex#>\nPREFIX obo: <http://purl.obolibrary.org/obo/>\nPREFIX orth: <http://purl.org/net/orth#>\nPREFIX dcterms: <http://purl.org/dc/terms/>\nPREFIX sio: <http://semanticscience.org/resource/>\n\nSELECT DISTINCT ?flyEnsemblGene ?orthologTaxon ?orthologEnsemblGene ?orthologOmaLink WHERE {\n\t{\n SELECT DISTINCT ?gene ?flyEnsemblGene {\n ?gene a orth:Gene ;\n genex:isExpressedIn/rdfs:label 'eye' ;\n orth:organism/obo:RO_0002162 ?taxon ;\n dcterms:identifier ?flyEnsemblGene .\n ?taxon up:commonName 'fruit fly' .\n } LIMIT 100\n }\n SERVICE <https://sparql.omabrowser.org/sparql> {\n ?protein2 a orth:Protein .\n ?protein1 a orth:Protein .\n ?clusterPrimates a orth:OrthologsCluster .\n ?cluster a orth:OrthologsCluster ;\n orth:hasHomologousMember ?node1 ;\n orth:hasHomologousMember ?node2 .\n ?node1 orth:hasHomologousMember* ?protein1 .\n ?node2 orth:hasHomologousMember* ?clusterPrimates .\n ?clusterPrimates orth:hasHomologousMember* ?protein2 .\n ?protein1 sio:SIO_010079 ?gene . # is encoded by\n ?protein2 rdfs:seeAlso ?orthologOmaLink ;\n orth:organism/obo:RO_0002162 ?orthologTaxonUri ;\n sio:SIO_010079 ?orthologGene . # is encoded by\n ?clusterPrimates orth:hasTaxonomicRange ?taxRange .\n ?taxRange orth:taxRange 'Primates' .\n FILTER ( ?node1 != ?node2 )\n }\n ?orthologTaxonUri up:commonName ?orthologTaxon .\n ?orthologGene dcterms:identifier ?orthologEnsemblGene .\n}",
      "description": "Which are the genes in Primates orthologous to a gene that is expressed in the fruit fly's eye?",
      "federatesWith": [
        "https://www.bgee.org/sparql/",
        "https://sparql.omabrowser.org/sparql"
      ]
    },
    ...
  ],
  "metadata": ...
}
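As a rough sketch of how the SELECT results could be turned into this shape: the query returns one row per query/endpoint pair, so the rows need to be grouped by query. The function names `rows_to_export` and `export_json` are hypothetical, and the rows are assumed to be plain dicts keyed by the SELECT variable names (queryID, federatedEndpoint, comment, query, target). The metadata key is left out because its content is not specified here.

```python
import json


def rows_to_export(rows):
    """Group SELECT result rows (one per query/endpoint pair) into one entry per query.

    Assumption: each row is a dict of strings keyed by the SELECT variable names.
    """
    queries = {}
    for row in rows:
        # Create the entry the first time a query URI is seen.
        entry = queries.setdefault(row["queryID"], {
            "uri": row["queryID"],
            "endpoint": row["target"],
            "query": row["query"],
            "description": row["comment"],
            "federatesWith": [],
        })
        # Collect each federated endpoint once, keeping first-seen order.
        if row["federatedEndpoint"] not in entry["federatesWith"]:
            entry["federatesWith"].append(row["federatedEndpoint"])
    return {"queries": list(queries.values())}


def export_json(rows) -> str:
    """Serialize the grouped queries as indented JSON."""
    return json.dumps(rows_to_export(rows), indent=2)
```

Grouping on the query URI relies on the earlier observation that no query had more than one <https://schema.org/target>; if that stopped holding, only the first target seen would be kept.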
Thanks @vemonet! I've made the changes.
This repository contains a lot of complex federated queries to large endpoints.
It would be interesting to provide some instructions to easily export all federated queries to constitute a benchmark that could be used by federated query systems.
A comparable benchmark is LargeRDFBench: https://github.com/dice-group/LargeRDFBench
But this benchmark would provide queries that are actually used in the real world.