weso / weso-scholia

Collaboration between Scholia and WESO
GNU General Public License v2.0

Caching of results #15

Open ExarcaFidalgo opened 3 years ago

ExarcaFidalgo commented 3 years ago

One possibility we had yet to explore is caching partial (subquery-related) or total results.

As discussed in #9, the non-deterministic nature of the subqueries causes inconsistencies when using partial querying. Ordering the subset beforehand solved that issue, but at the cost of an unacceptable increase in response times.

One suggestion was to periodically store the ordered subset, so as to get its deterministic behaviour without that downside. Below, we explore what Blazegraph offers with this goal in mind.

Unfortunately, inserting data into a named solution set does not allow an ORDER BY in the top-level clause. The closest we can get is the following:

INSERT INTO %test1
SELECT ?author WHERE {
    SELECT DISTINCT ?author WHERE {
      SERVICE <https://query.wikidata.org/sparql> {
          ?author wdt:P27 | wdt:P1416/wdt:P17 | wdt:P108/wdt:P17 wd:Q32 .
      }
    }
    ORDER BY ?author
}

This alters the order only slightly. In any case, when we retrieve the data from the stored solution set:

SELECT ?author
  WHERE {
    INCLUDE %test1 
  }

We find that it is indeed not fully ordered:

<http://www.wikidata.org/entity/Q13103414>
<http://www.wikidata.org/entity/Q13103415>
<http://www.wikidata.org/entity/Q13103417>
<http://www.wikidata.org/entity/Q120975>
<http://www.wikidata.org/entity/Q12152391>
<http://www.wikidata.org/entity/Q12155944>
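The break in ordering can be confirmed mechanically. The following is a minimal Python sketch, assuming that ORDER BY compares IRIs by their string form; the helper name `first_out_of_order` is hypothetical, and the sample is just the six rows shown above.

```python
# Sample rows copied from the solution set output above.
ROWS = [
    "<http://www.wikidata.org/entity/Q13103414>",
    "<http://www.wikidata.org/entity/Q13103415>",
    "<http://www.wikidata.org/entity/Q13103417>",
    "<http://www.wikidata.org/entity/Q120975>",
    "<http://www.wikidata.org/entity/Q12152391>",
    "<http://www.wikidata.org/entity/Q12155944>",
]


def first_out_of_order(rows):
    """Return the index of the first row that sorts before its predecessor.

    Returns None when the rows are in non-decreasing string order.
    (Hypothetical helper, not part of Scholia or Blazegraph.)
    """
    for i in range(1, len(rows)):
        if rows[i] < rows[i - 1]:
            return i
    return None


if __name__ == "__main__":
    print(first_out_of_order(ROWS))
```

For this sample the helper points at the fourth row (index 3), where Q120975 follows Q13103417.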

Nonetheless, we can slightly modify the previous insertion so that it stores a slice of the results, matching in size and position the slices we use when partial querying.

INSERT INTO %test2
SELECT ?author WHERE {
    SELECT DISTINCT ?author WHERE {
      SERVICE <https://query.wikidata.org/sparql> {
          ?author wdt:P27 | wdt:P1416/wdt:P17 | wdt:P108/wdt:P17 wd:Q32 .
      }
    }
    ORDER BY ?author
    LIMIT 1000
    OFFSET 3000
}
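To cover the whole result set, one such insertion is needed per slice. The following Python sketch generates those statements; it is only an illustration under assumptions: the `slice<N>` naming scheme, the page size of 1000, and the idea that the total row count is known in advance (for example from a separate COUNT query) are not part of the original setup, and `build_slice_inserts` is a hypothetical helper.

```python
# SPARQL text taken from the insertion above; only the solution set name,
# LIMIT and OFFSET are parameterised. Braces are doubled for str.format.
INSERT_TEMPLATE = """INSERT INTO %{name}
SELECT ?author WHERE {{
    SELECT DISTINCT ?author WHERE {{
      SERVICE <https://query.wikidata.org/sparql> {{
          ?author wdt:P27 | wdt:P1416/wdt:P17 | wdt:P108/wdt:P17 wd:Q32 .
      }}
    }}
    ORDER BY ?author
    LIMIT {limit}
    OFFSET {offset}
}}"""


def build_slice_inserts(total, page_size=1000, prefix="slice"):
    """Build one INSERT INTO statement per slice of `page_size` rows.

    `total` is the assumed number of rows in the full ordered result.
    Solution sets are named <prefix><slice index> (an assumed convention).
    """
    statements = []
    for offset in range(0, total, page_size):
        name = f"{prefix}{offset // page_size}"
        statements.append(
            INSERT_TEMPLATE.format(name=name, limit=page_size, offset=offset)
        )
    return statements


if __name__ == "__main__":
    for statement in build_slice_inserts(total=3500):
        print(statement, end="\n\n")
```

With `total=3500` this yields four statements, the last one targeting `%slice3` with `OFFSET 3000`, which corresponds to the example above. Sending the statements to the Blazegraph endpoint is left out on purpose.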

So, by storing the slices in multiple solution sets, we would guarantee the consistency of the results, since each slice is ordered beforehand. The response time is quite solid, at over 100 ms.

SELECT ?author
  WHERE {
    INCLUDE %test2
  }
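On the reading side, a partial query would then only need to map its offset to the matching solution set. A possible Python sketch follows, again assuming a hypothetical `slice<N>` naming convention and a fixed page size of 1000; `include_query` is an illustrative helper, not existing Scholia code.

```python
def include_query(offset, page_size=1000, prefix="slice"):
    """Return the SELECT query that reads the stored slice for `offset`.

    Assumes slices were stored at offsets that are multiples of
    `page_size`, in solution sets named <prefix><slice index>.
    """
    if offset < 0 or offset % page_size != 0:
        raise ValueError("offset must be a non-negative multiple of page_size")
    name = f"{prefix}{offset // page_size}"
    return (
        "SELECT ?author\n"
        "  WHERE {\n"
        f"    INCLUDE %{name}\n"
        "  }"
    )


if __name__ == "__main__":
    print(include_query(3000))
```

Rejecting offsets that do not fall on a slice boundary keeps a request from silently reading the wrong stored subset.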