Open dssib opened 2 months ago
If we go down this road we might want to adopt the "standard" used by https://grlc.io: add a _ after the ? of the templated variable, e.g. ?_specie_iri
This should not break any of the existing tests (since the templated variable will be considered as a regular variable by parsers)
But we would need to figure out some predicate that points to the query used to populate the templated variable, and an intermediary object that links the SPARQL query used for completion to the variable ID (in my example ?_specie_iri).
Example templated enumeration query with grlc: https://github.com/CLARIAH/grlc-queries/blob/master/enumerate.rq
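To illustrate the point that a ?_ variable is still an ordinary variable as far as the syntax goes, here is a minimal Python sketch that picks out the variables following that convention. The function names and the regular expression are assumptions made for this illustration; the pattern is deliberately naive (ASCII names only, and it would also match a ? inside an IRI or a string literal), so a real check should rely on a proper SPARQL parser.

```python
import re

# Simplified pattern: SPARQL variables start with '?' or '$'.
# Assumption: ASCII names only, no attempt to skip IRIs or string literals.
VARIABLE_PATTERN = re.compile(r"[?$]([A-Za-z_][A-Za-z0-9_]*)")


def find_variables(query):
    """Return all variable names (without the leading '?') found in the query text."""
    return set(VARIABLE_PATTERN.findall(query))


def find_template_variables(query):
    """Return only the variables whose name starts with '_' (the '?_' convention)."""
    return {name for name in find_variables(query) if name.startswith("_")}


query = """
SELECT ?anatEntity WHERE {
    ?condition genex:hasAnatomicalEntity ?anatEntity ;
               obo:RO_0002162 ?_specie_iri .
}
"""

print(sorted(find_template_variables(query)))
```

Because the query text is left untouched, the same string can still be handed to any SPARQL parser or endpoint as a regular query.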
Complete example of what it could look like; ex:001 would be in its own separate file:
@prefix ex: <https://www.bgee.org/sparql/.well-known/sparql-examples/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <https://schema.org/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
ex:030 a sh:SPARQLExecutable,
        sh:SPARQLSelectExecutable ;
    rdfs:comment "Anatomical entities for ?species at the young adult developmental stage"@en ;
    sh:prefixes _:sparql_examples_prefixes ;
    sh:select """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX genex: <http://purl.org/genex#>
PREFIX obo: <http://purl.obolibrary.org/obo/>
SELECT DISTINCT ?anatomicalEntity ?stageName {
    ?condition genex:hasAnatomicalEntity ?anatEntity ;
        genex:hasDevelopmentalStage ?stage ;
        obo:RO_0002162 ?species .
    ?anatEntity rdfs:label ?anatomicalEntity .
    ?stage rdfs:label ?stageName .
    FILTER ( CONTAINS(lcase(?stageName), "young adult") )
}""" ;
    schema:target <https://www.bgee.org/sparql/> .
ex:030_001 a ex:TemplatedQueryLink ;
    ex:templatedQuery ex:030 ;
    ex:variableInTemplatedQuery "?species" ;
    ex:getValueFromDatasourceQuery ex:001 ;
    ex:variableInDatasourceQuery "?species" ;
    ex:labelVariableInDatasourceQuery "?commonName" .
ex:001 a sh:SPARQLExecutable,
        sh:SPARQLSelectExecutable ;
    rdfs:comment "What are the species present in Bgee?"@en ;
    sh:prefixes _:sparql_examples_prefixes ;
    sh:select """PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?species ?commonName WHERE {
    ?species a up:Taxon ;
        up:rank up:Species ;
        up:commonName ?commonName .
}""" ;
    schema:target <https://sparql.uniprot.org/sparql/>,
        <https://www.bgee.org/sparql/>,
        <https://sparql.omabrowser.org/sparql/> .
labelVariableInDatasourceQuery would be optional (used so that we can show a human-readable label to the user in the template, while still doing the matching on URIs behind the scenes).
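As a rough illustration of the label/URI split, the Python sketch below turns the bindings returned by the datasource query into a label-to-URI mapping, then restricts the templated query to the chosen URI. It assumes results in the standard SPARQL JSON results format; the helper names build_options and bind_variable and the sample bindings are made up for this example. Appending a trailing VALUES clause is only one possible way to fill in the variable. It is used here because the SPARQL 1.1 grammar allows it at the end of a SELECT query, so the query text itself stays unchanged.

```python
from typing import Optional


def build_options(results: dict, key_var: str, label_var: Optional[str] = None) -> dict:
    """Map each label shown to the user to the URI used for matching.

    Falls back to the URI itself when no label variable is given or bound.
    """
    key = key_var.lstrip("?")
    label_key = label_var.lstrip("?") if label_var else None
    options = {}
    for binding in results["results"]["bindings"]:
        uri = binding[key]["value"]
        label = binding.get(label_key, {}).get("value", uri) if label_key else uri
        options[label] = uri
    return options


def bind_variable(template_query: str, variable: str, uri: str) -> str:
    """Restrict `variable` to `uri` by appending a trailing VALUES clause.

    Assumption: `uri` is a trusted, well-formed IRI (no escaping is done here).
    """
    name = variable.lstrip("?")
    return f"{template_query.rstrip()}\nVALUES ?{name} {{ <{uri}> }}"


# Hand-written sample in the SPARQL JSON results format (not real endpoint output).
sample_results = {
    "results": {
        "bindings": [
            {
                "species": {"type": "uri", "value": "http://purl.uniprot.org/taxonomy/9606"},
                "commonName": {"type": "literal", "value": "human"},
            },
            {
                "species": {"type": "uri", "value": "http://purl.uniprot.org/taxonomy/10090"},
                "commonName": {"type": "literal", "value": "mouse"},
            },
        ]
    }
}

options = build_options(sample_results, "?species", "?commonName")
template = "SELECT ?anatEntity { ?condition obo:RO_0002162 ?species . }"
print(bind_variable(template, "?species", options["human"]))
```

The user only ever sees the keys of options, while the query is built from the corresponding URI.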
This could still be tested with the current code; we just need to add a shape to test for the ex:TemplatedQueryLink. If we want to be thorough we could add some custom checks to see if the ex:variableInTemplatedQuery, ex:variableInDatasourceQuery and ex:labelVariableInDatasourceQuery are actually present in the targeted queries.
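A custom check of that kind could look roughly like the Python sketch below. It is not tied to the existing test code: the link and the queries are represented as plain dictionaries (a hypothetical stand-in for whatever the RDF is parsed into), and variables are found with a naive regular expression rather than a SPARQL parser.

```python
import re


def variables_in(query):
    """Return the variables of a query as '?name' strings (naive, regex-based)."""
    return {"?" + name for name in re.findall(r"[?$]([A-Za-z_][A-Za-z0-9_]*)", query)}


def check_link(link, queries):
    """Return a list of problems found in a templated-query link; empty means OK."""
    problems = []
    template_vars = variables_in(queries[link["templatedQuery"]])
    datasource_vars = variables_in(queries[link["getValueFromDatasourceQuery"]])

    if link["variableInTemplatedQuery"] not in template_vars:
        problems.append("variableInTemplatedQuery not found in the templated query")
    if link["variableInDatasourceQuery"] not in datasource_vars:
        problems.append("variableInDatasourceQuery not found in the datasource query")

    # The label variable is optional, so it is only checked when present.
    label = link.get("labelVariableInDatasourceQuery")
    if label is not None and label not in datasource_vars:
        problems.append("labelVariableInDatasourceQuery not found in the datasource query")
    return problems


# Shortened versions of the two example queries, keyed by their identifier.
queries = {
    "ex:030": "SELECT ?anatEntity { ?condition obo:RO_0002162 ?species . }",
    "ex:001": "SELECT ?species ?commonName WHERE { ?species up:commonName ?commonName . }",
}
link = {
    "templatedQuery": "ex:030",
    "variableInTemplatedQuery": "?species",
    "getValueFromDatasourceQuery": "ex:001",
    "variableInDatasourceQuery": "?species",
    "labelVariableInDatasourceQuery": "?commonName",
}

print(check_link(link, queries))
```

For the example link every variable is found, so the returned list is empty; a misspelled variable name would show up as one entry per failed check.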
Feel free to propose better options for the type/predicates :)
IMHO, the critical point is that queries must remain valid SPARQL syntax in any case. This precludes syntax extensions, e.g. $$VARIABLE. The ?_ notation is OK in this respect. However, the ?_ notation forces us to decide which variables could/must be replaced at the time of writing the query, which may not scale well when composing many queries together. Hence I would not make the ?_ notation mandatory.
rdfs:comment "Anatomical entities for ?species at the young adult developmental stage"@en ;
This is not an ordinary comment, as ?species is meant to be interpreted. I would create a new property to account for this, say
ex:commentToBeInterpretedBySIBTools rdfs:subPropertyOf rdfs:comment .
In general I am of the opinion that one should be extra careful when recycling existing vocabulary. When in doubt, new classes and properties should be created, relying on rdfs:subClassOf and rdfs:subPropertyOf or their OWL friends.
ex:030_001 a ex:TemplatedQueryLink ; ex:templatedQuery ex:030 ; ex:variableInTemplatedQuery "?species" ; ex:getValueFromDatasourceQuery ex:001 ; ex:variableInDatasourceQuery "?species" ; ex:labelVariableInDatasourceQuery "?commonName" .
mmmh
ex:030_001 a ex:TemplatedQueryLink ;
    ex:templatedQuery ex:030 ;
    ex:templatedVariable [
        ex:variableInTemplatedQuery "?species" ;
        ex:getValueFromDatasourceQuery ex:001 ;
        ex:variableInDatasourceQuery "?species" ; # to serve as a unique key, not visible in the UI
        ex:labelVariableInDatasourceQuery "?commonName" # visible to the end user, and may include some HTML
    ],
    [ ... another independent variable ] .
The case of dependent variables might also need to be considered: referring to different variables from the same SPARQL query will restrict the allowed combinations.
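To make the dependent-variable case concrete, here is a small Python sketch, assuming two template variables are fed from the same datasource query. Each result row is then one allowed combination, and choosing a value for one variable narrows the options for the other. The function name allowed_values and the rows are invented placeholders, not real data.

```python
def allowed_values(rows, target_var, chosen):
    """Return the values of target_var compatible with the variables already chosen.

    rows: result rows of the shared datasource query, one dict per row.
    chosen: mapping of variable name to the value the user already picked.
    """
    values = []
    for row in rows:
        if all(row.get(var) == value for var, value in chosen.items()):
            if row[target_var] not in values:
                values.append(row[target_var])
    return values


# Placeholder rows: each row is one (species, stage) combination that exists.
rows = [
    {"species": "ex:speciesA", "stage": "ex:stage1"},
    {"species": "ex:speciesA", "stage": "ex:stage2"},
    {"species": "ex:speciesB", "stage": "ex:stage2"},
]

print(allowed_values(rows, "stage", {"species": "ex:speciesA"}))
print(allowed_values(rows, "species", {"stage": "ex:stage2"}))
```

With independent variables (each fed by its own datasource query) no such filtering is needed, which is why the distinction matters for the UI.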
I think it would be better to distinguish such template queries from the others. Then we could define a new type ex:SPARQLExecutableTemplate subclass of sh:SPARQLExecutable. I would also vote for using "template" instead of templated (since, originally, template is not a verb but a noun).
As long as the syntax of SPARQL queries is not "extended", I don't see a good reason to distinguish template queries from "regular" queries. A regular query from today may become a template query of tomorrow, without any change.
On the other hand, I am also in favour of defining a new dedicated type, a subclass of sh:SPARQLExecutable. It can be seen as a preventive measure, so as not to interfere with later uses of SHACL in the same triplestore.
The reason is that the two are of a different nature. For example, the question (description) of a template would contain a variable ("Anatomical entities in ?species..."), which should not be the case in a traditional question-answering system. Therefore, it might require some extra preprocessing to get them right. Moreover, IMO they are different use cases (that can be complementary): pairs of questions and queries on the one hand, and query templates with question templates on the other. A clear distinction would be better for this reason too. For instance, several questions/SPARQL queries can be derived from one template query. A template query can even be "templated" in different ways. For the example above: "Anatomical entities for ?species at the young adult developmental stage" or "Anatomical entities for ?species at developmental ?stage.", the latter with two variables.
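The kind of preprocessing meant here could be as small as the following Python sketch, which derives concrete questions from a question template by replacing each variable with the label chosen by the user. The function name render_question and the labels are assumptions for illustration only; variables without a label are left as they are.

```python
import re


def render_question(question_template, labels):
    """Replace each ?variable in the question template with its label, if one is given."""
    def replace(match):
        # Keep the variable untouched when no label was supplied for it.
        return labels.get(match.group(1), match.group(0))

    return re.sub(r"\?([A-Za-z_][A-Za-z0-9_]*)", replace, question_template)


one_variable = "Anatomical entities for ?species at the young adult developmental stage"
two_variables = "Anatomical entities for ?species at developmental ?stage."

print(render_question(one_variable, {"species": "human"}))
print(render_question(two_variables, {"species": "mouse", "stage": "young adult stage"}))
```

The same template query can thus be paired with either question template, and each combination of labels yields one concrete question/query pair.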
It would be useful to be able to include example SPARQL templated queries, allowing contributors to point to queries that can populate templates (e.g. species names), similar to what we use in the BioQuery interface: https://biosoda.expasy.org/bioquery-dbgi/?search=Q2