Export all federated queries to create a real-world benchmark for federated queries

vemonet commented 1 month ago

This repository contains a lot of complex federated queries to large endpoints.

It would be interesting to provide some instructions to easily export all federated queries to constitute a benchmark that could be used by federated query systems.

Another comparable benchmark would be: https://github.com/dice-group/LargeRDFBench

But this benchmark would provide queries that are actually used in the real world.

constraintAutomaton commented 3 weeks ago

I made this script to extract the queries, @vemonet:

https://github.com/constraintAutomaton/sib-swiss-federated-query-extractor

I replaced the queries provided in the repo because they do not seem to work with the data model. I used this one instead:

PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX spex: <https://purl.expasy.org/sparql-examples/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?queryID ?federatedEndpoint ?comment ?query ?target WHERE {
  ?queryID sh:select ?query .
  ?queryID spex:federatesWith ?federatedEndpoint .
  ?queryID rdfs:comment ?comment .
  ?queryID <https://schema.org/target> ?target
}

At least on my side, no query had more than one <https://schema.org/target>, and the number of spex:federatesWith values seems to match the number of endpoints in the federation.

Edit: I updated the query because the previous version only returned the queries where the federation had at least 3 endpoints instead of 2.
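
Since the SELECT above returns one row per (?queryID, ?federatedEndpoint) pair, a query that federates with several endpoints appears on several rows. Below is a minimal sketch of collapsing those rows into one record per query. It assumes the results were already loaded as plain dictionaries keyed by the variable names; the `group_by_query` helper and the example.org sample rows are hypothetical and are not taken from the extractor repo.

```python
def group_by_query(rows):
    """Collapse one-row-per-endpoint results into one record per query."""
    grouped = {}
    for row in rows:
        # Create the record the first time a queryID is seen.
        record = grouped.setdefault(row["queryID"], {
            "queryID": row["queryID"],
            "target": row["target"],
            "query": row["query"],
            "comment": row["comment"],
            "federatedEndpoints": [],
        })
        # Collect each federated endpoint once, keeping first-seen order.
        endpoint = row["federatedEndpoint"]
        if endpoint not in record["federatedEndpoints"]:
            record["federatedEndpoints"].append(endpoint)
    return list(grouped.values())


# Hypothetical sample rows (placeholder URLs, not real data).
rows = [
    {
        "queryID": "https://example.org/sparql-examples/1",
        "target": "https://example.org/sparql/",
        "query": "SELECT * WHERE { ?s ?p ?o } LIMIT 1",
        "comment": "Placeholder query",
        "federatedEndpoint": "https://example.org/sparql/",
    },
    {
        "queryID": "https://example.org/sparql-examples/1",
        "target": "https://example.org/sparql/",
        "query": "SELECT * WHERE { ?s ?p ?o } LIMIT 1",
        "comment": "Placeholder query",
        "federatedEndpoint": "https://other.example.org/sparql",
    },
]

grouped = group_by_query(rows)
for record in grouped:
    print(record["queryID"], len(record["federatedEndpoints"]))
```

Counting the entries in `federatedEndpoints` per record is also a quick way to check the size of each federation.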

constraintAutomaton commented 3 weeks ago

Maybe I can document how I've done it and provide my repo as an example, after some cleanup. Unless I made a mistake somewhere.

vemonet commented 3 weeks ago

Thanks @constraintAutomaton, that's nice! A few remarks:

You forgot to add the URL of the main endpoint on which the query is expected to run.
It would be better to put all queries under a specific key, so we can iterate over them directly without having to filter out the metadata key.
It seems you are using the old convertToOneTurtle.sh bash script to compile all queries (https://github.com/constraintAutomaton/sib-swiss-federated-query-extractor/blob/main/init.sh). I would recommend using sparql-examples-utils.jar instead, as documented in the README.md.
This one is more of a detail, but maybe use federatesWith instead of federatedEndpoint, to be consistent with the predicate currently in use.

Something a bit like:


{
  "queries": [ 
    {
    "uri": "https://www.bgee.org/sparql/.well-known/sparql-examples/020",
    "endpoint": "https://www.bgee.org/sparql/",
    "query": "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\nPREFIX up: <http://purl.uniprot.org/core/>\nPREFIX genex: <http://purl.org/genex#>\nPREFIX obo: <http://purl.obolibrary.org/obo/>\nPREFIX orth: <http://purl.org/net/orth#>\nPREFIX dcterms: <http://purl.org/dc/terms/>\nPREFIX sio: <http://semanticscience.org/resource/>\n\nSELECT DISTINCT ?flyEnsemblGene ?orthologTaxon ?orthologEnsemblGene ?orthologOmaLink WHERE {\n\t{\n        SELECT DISTINCT ?gene ?flyEnsemblGene {\n        ?gene a orth:Gene ;\n            genex:isExpressedIn/rdfs:label 'eye' ;\n            orth:organism/obo:RO_0002162 ?taxon ;\n            dcterms:identifier ?flyEnsemblGene .\n        ?taxon up:commonName 'fruit fly' .\n        } LIMIT 100\n    }\n    SERVICE <https://sparql.omabrowser.org/sparql> {\n        ?protein2 a orth:Protein .\n        ?protein1 a orth:Protein .\n        ?clusterPrimates a orth:OrthologsCluster .\n        ?cluster a orth:OrthologsCluster ;\n            orth:hasHomologousMember ?node1 ;\n            orth:hasHomologousMember ?node2 .\n        ?node1 orth:hasHomologousMember* ?protein1 .\n        ?node2 orth:hasHomologousMember* ?clusterPrimates .\n        ?clusterPrimates orth:hasHomologousMember* ?protein2 .\n        ?protein1 sio:SIO_010079 ?gene . # is encoded by\n        ?protein2 rdfs:seeAlso ?orthologOmaLink ;\n            orth:organism/obo:RO_0002162 ?orthologTaxonUri ;\n            sio:SIO_010079 ?orthologGene . # is encoded by\n        ?clusterPrimates orth:hasTaxonomicRange ?taxRange .\n        ?taxRange orth:taxRange 'Primates' .\n        FILTER ( ?node1 != ?node2 )\n    }\n    ?orthologTaxonUri up:commonName ?orthologTaxon .\n    ?orthologGene dcterms:identifier ?orthologEnsemblGene .\n}",
    "description": "Which are the genes in Primates orthologous to a gene that is expressed in the fruit fly's eye?",
    "federatesWith": [
      "https://www.bgee.org/sparql/",
      "https://sparql.omabrowser.org/sparql"
    ]
    },
    ...
  ],
  "metadata": ...
}
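
A minimal sketch of writing and reading a file shaped like the example above, assuming per-query records are already available as dictionaries. The `build_export` helper, the example.org sample record, and the empty metadata placeholder are all hypothetical; the actual content of the metadata is left open here.

```python
import json


def build_export(records, metadata):
    """Wrap per-query records under a single "queries" key."""
    queries = [
        {
            "uri": r["uri"],
            "endpoint": r["endpoint"],  # main endpoint the query runs on
            "query": r["query"],
            "description": r["description"],
            # Deduplicate and sort so the output is stable between runs.
            "federatesWith": sorted(set(r["federatesWith"])),
        }
        for r in records
    ]
    return {"queries": queries, "metadata": metadata}


# Hypothetical sample record (placeholder URLs, not real data).
records = [{
    "uri": "https://example.org/sparql-examples/1",
    "endpoint": "https://example.org/sparql/",
    "query": "SELECT * WHERE { SERVICE <https://other.example.org/sparql> { ?s ?p ?o } } LIMIT 1",
    "description": "Placeholder federated query",
    "federatesWith": [
        "https://other.example.org/sparql",
        "https://example.org/sparql/",
    ],
}]

export = build_export(records, metadata={})  # metadata is an empty placeholder
text = json.dumps(export, indent=2)

# A benchmark runner can iterate over the queries directly,
# without having to skip over the metadata key.
for q in json.loads(text)["queries"]:
    print(q["endpoint"], len(q["federatesWith"]))
```

Keeping the queries in a list under one key means the file can grow other top-level keys later without breaking consumers that only read `queries`.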

constraintAutomaton commented 2 weeks ago

Thanks @vemonet! I've made the changes.

sib-swiss / sparql-examples

Export all federated queries to create a real-world benchmark for federated queries #40