vliz-be-opsci / py-RDF-store

module to interact with a memory or uristore

RDFSTORE.select() does not support SPARQL PREFIXES and REGEX #29

Closed cedricdcc closed 7 months ago

cedricdcc commented 7 months ago

For the following test:

def test_select_sparql(rdf_store: RDFStore):
    SPARQL = """
      SELECT DISTINCT ?s WHERE {
      [] <http://www.w3.org/ns/dcat#resource> ?s .
      }
    """

    # TODO: test if with uri encoding it works for other sparql queries

    result = rdf_store.select(SPARQL)
    print(result)

I get this print statement: <rdflib.plugins.sparql.results.xmlresults.XMLResult object at 0x0000020FEDD05490>

For the same test with this sparql:

SPARQL = """
      PREFIX dcat: <http://www.w3.org/ns/dcat#> .
      SELECT DISTINCT ?s WHERE {
      [] <http://www.w3.org/ns/dcat#resource> ?s .
      FILTER(REGEX(STR(?s), "/publication/\\d+"))
      }
    """

I get (400, 'HTTP Error 400: ', None)

This is a bad request. The problem is that the query is handled by rdflib before it reaches the endpoint.

cedricdcc commented 7 months ago

After some searching, it seems this is due to limitations of the SPARQL version used by the GraphDB read_uri endpoint. https://www.phind.com/search?cache=fog56qnjqorj6kfgj0dgt8d8

A possible solution could be to send the request to the GraphDB endpoint itself over plain HTTP, or to bump the GraphDB base image.
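For the first option, a rough sketch of what a direct HTTP call could look like (the repository URL and the use of requests are assumptions for illustration, not the current store.py behaviour):

import requests

GRAPHDB_READ_URI = "http://localhost:7200/repositories/kgap"  # hypothetical read_uri

def select_direct(sparql: str) -> dict:
    # POST the query via the SPARQL protocol, bypassing rdflib's parsing entirely
    response = requests.post(
        GRAPHDB_READ_URI,
        data={"query": sparql},
        headers={"Accept": "application/sparql-results+json"},
        timeout=30,
    )
    response.raise_for_status()  # surfaces the 400 instead of a bare tuple
    return response.json()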

cedricdcc commented 7 months ago

Another solution would be to add a check that verifies whether a SELECT query is compliant with the SPARQL standard used by GraphDB. For the prefixes, a small function could insert the declarations into the SPARQL query itself (see the sketch below).
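A minimal sketch of such a prefix-injection helper (the prefix map is illustrative, not an existing part of the codebase):

KNOWN_PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "schema": "https://schema.org/",
}

def inject_prefixes(sparql: str, prefixes: dict = KNOWN_PREFIXES) -> str:
    # prepend a PREFIX declaration for every known prefix not already declared
    declarations = [
        f"PREFIX {pfx}: <{uri}>"
        for pfx, uri in prefixes.items()
        if f"PREFIX {pfx}:" not in sparql
    ]
    return "\n".join(declarations + [sparql])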

cedricdcc commented 7 months ago

Another possible route is the initNs argument, which can be changed: https://stackoverflow.com/questions/76677670/sparql-namespace-conflict-while-querying
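Roughly, assuming the store wraps an rdflib Graph (the names here are illustrative, not the current RDFStore API), that would look like:

from rdflib import Graph
from rdflib.namespace import DCAT

graph = Graph()
result = graph.query(
    "SELECT DISTINCT ?s WHERE { [] dcat:resource ?s . }",
    initNs={"dcat": DCAT},  # resolves dcat: without a PREFIX line in the query
)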

marc-portier commented 7 months ago

With the new YASGUI component for kgap, it shows that the 415 error pops up when performing the SPARQL reads (SELECT) over the write_uri:

read_uri works (screenshot)

write_uri triggers 415 (screenshot)

marc-portier commented 7 months ago

While looking into the GraphDB logs of running tests for another issue, I suddenly found this:

org.eclipse.rdf4j.http.server.ClientHTTPException: MALFORMED QUERY: Multiple prefix declarations for prefix 'schema'
    at org.eclipse.rdf4j.http.server.repository.handler.DefaultQueryRequestHandler.getQuery(DefaultQueryRequestHandler.java:185)
    at com.ontotext.graphdb.sesame.handler.GraphDBQueryResultHandler.getQuery(GraphDBQueryResultHandler.java:75)
    at org.eclipse.rdf4j.http.server.repository.handler.AbstractQueryRequestHandler.handleQueryRequest(AbstractQueryRequestHandler.java:70)
    at org.eclipse.rdf4j.http.server.repository.AbstractRepositoryController.handleRequestInternal(AbstractRepositoryController.java:53)
...

This «Multiple prefix declarations for prefix 'schema'» suggests that somewhere in the chain something is adding the schema: prefix to the SPARQL SELECT stack by default.

possible candidates for that (unwanted / unexpected) injection:

so it might just be so that:

marc-portier commented 7 months ago

rdflib is doing this; see the docs at https://rdflib.readthedocs.io/en/stable/namespaces_and_bindings.html#namespacemanager

We should apply bind_namespaces="none" consistently across store.py to overrule the standard behaviour in rdflib.
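As a sketch of what that could look like, assuming store.py builds its rdflib Graph over a SPARQLStore (requires rdflib >= 6.2 for the bind_namespaces argument; the endpoint URL is illustrative):

from rdflib import Graph
from rdflib.plugins.stores.sparqlstore import SPARQLStore

store = SPARQLStore(query_endpoint="http://localhost:7200/repositories/kgap")
# "none" stops rdflib from pre-binding its default prefixes (schema:, dcat:, ...),
# so only the PREFIX lines we wrote ourselves end up in the query sent to GraphDB
graph = Graph(store=store, bind_namespaces="none")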

marc-portier commented 7 months ago

Closing this for the moment, as the acknowledged partial problem this raised was solved together with #32 and fixed by applying #34.