sebferre / sparklis

Sparklis is a query builder in natural language that allows people to explore and query SPARQL endpoints with all the power of SPARQL and without any knowledge of SPARQL.
Apache License 2.0

Query size/Server limitations #6

kad-rowla closed this issue 2 years ago

kad-rowla commented 3 years ago

Thanks for your ongoing support with this editor!

As a follow-on from an earlier issue (https://github.com/sebferre/sparklis/issues/3), I see that changing the configuration settings worked well in that case. We are now working on the same deployment (the previous issue was raised by a colleague of mine) but with another API, and reducing the maximum number of results to 100 does not work in this case.

You noted in the last issue that a solution might be to 'add a configuration element to fall back on a simpler query'. Is this something we can try in this case? I see that the SPARQL query being run is, indeed, very large and complex.

The following can be used as a test query: https://bit.ly/3bQD3Lq

Thanks again for your assistance!

sebferre commented 3 years ago

I added a configuration element in "Endpoint and queries" to "Avoid very long queries, which are used to better sample suggestions". When activated, the queries remain large unions of patterns but are nonetheless much simpler. If necessary, use in combination with a smaller number of results.

This seems to work in your test query.

wouterbeek commented 3 years ago

@sebferre Some of these large queries look for literals in the subject position, something like: "some-literal" a ?c

For us the literals are often WKT geo shapes. These can be very large per shape (e.g., the shapes of all Dutch municipalities).

Since literals can never appear in the subject position according to RDF 1.1, maybe such queries should not be sent at all, since they will by definition not return any results. Instead, the following SPARQL query extracts the datatype of a literal:

select (datatype(?literal) as ?class) {
  values ?literal { "some-literal" ... }
}

^ Notice that this query does not require any data from the endpoint, so it might even be possible to run it in Sparklis for better performance.
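For illustration, here is a minimal Python sketch of what computing that datatype locally could look like, following the rules of SPARQL's datatype() function (typed literal: its datatype IRI; plain literal: xsd:string; language-tagged literal: rdf:langString). The `Literal` class and `local_datatype` function are hypothetical names for this example and are not part of Sparklis.

```python
from dataclasses import dataclass
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


@dataclass(frozen=True)
class Literal:
    """Hypothetical client-side representation of an RDF literal."""
    lexical: str
    datatype: Optional[str] = None  # datatype IRI, if the literal is typed
    lang: Optional[str] = None      # language tag, if any


def local_datatype(lit: Literal) -> str:
    """Mirror SPARQL's datatype() without contacting the endpoint."""
    if lit.lang:
        # Language-tagged literals have datatype rdf:langString.
        return RDF_LANG_STRING
    # Typed literals return their datatype; plain literals are xsd:string.
    return lit.datatype or XSD_STRING


# Example: a (possibly very large) WKT shape never has to be sent anywhere.
wkt = Literal("POINT(4.9 52.4)",
              datatype="http://www.opengis.net/ont/geosparql#wktLiteral")
print(local_datatype(wkt))
```

Because the literal value itself never leaves the client, the size of a WKT shape has no effect on query length in this approach.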

sebferre commented 3 years ago

The new online version no longer puts literals in subject position. The problem came from handling all RDF terms alike when querying for class and property suggestions. Thank you for pointing it out.

However, this does not solve the problem of very large queries caused by the inclusion of very large literals. Indeed, as literals can be in object position, Sparklis still queries for properties related to them: [] ?ip "some literal". I plan to apply the same solution as for blank nodes, which cannot be injected into queries.
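As a rough illustration of treating RDF terms differently by kind, the Python sketch below generates property-suggestion patterns per term. It is not Sparklis's actual code: the function name, the `kind` labels, and the `?p` variable are assumptions made for this example; only the `[] ?ip "some literal"` pattern comes from the discussion above.

```python
from typing import List


def property_patterns(kind: str, sparql_term: str) -> List[str]:
    """Return triple patterns for property suggestions on one term.

    kind is one of "uri", "literal", "bnode" (hypothetical labels);
    sparql_term is the term already serialized in SPARQL syntax.
    """
    if kind == "bnode":
        # Blank nodes cannot be injected into queries, so emit nothing.
        return []
    # Any URI or literal may appear in object position.
    patterns = [f"[] ?ip {sparql_term}"]
    if kind == "uri":
        # Only URIs are also tried in subject position; literals never are.
        patterns.append(f"{sparql_term} ?p []")
    return patterns


print(property_patterns("literal", '"some literal"'))
```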

sebferre commented 3 years ago

The last commit filters long literals (len>100) out of the middle suggestions and the generated SPARQL queries. I hope this solves your problem satisfactorily. It is difficult to control the queries generated by Sparklis, given that no assumptions are made about the dataset. It appears that every dataset and every SPARQL engine has its own peculiarities.
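The length filter can be pictured with a short Python sketch. This is only an illustration of the idea, assuming terms are represented as hypothetical (kind, lexical form) pairs; the actual Sparklis implementation may differ.

```python
from typing import List, Tuple

MAX_LITERAL_LEN = 100  # threshold mentioned above (len>100 is filtered out)

Term = Tuple[str, str]  # (kind, lexical form); a hypothetical representation


def keep_term(term: Term, max_len: int = MAX_LITERAL_LEN) -> bool:
    """Keep every non-literal term, and literals of at most max_len characters."""
    kind, lexical = term
    return kind != "literal" or len(lexical) <= max_len


def filter_terms(terms: List[Term]) -> List[Term]:
    """Drop long literals before they reach suggestions or generated queries."""
    return [t for t in terms if keep_term(t)]


terms = [
    ("uri", "http://example.org/municipality/" + "x" * 200),  # long URI is kept
    ("literal", "Amsterdam"),                                  # short literal is kept
    ("literal", "POLYGON((" + "4.9 52.4, " * 50 + "))"),       # long WKT is dropped
]
print(len(filter_terms(terms)))
```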