kbrbe / beltrans-data-integration

Creating a FAIR Linked Data corpus for the BELTRANS research project about Belgian book translations NL-FR and FR-NL between 1970 and 2020
https://www.kbr.be/en/projects/beltrans/
MIT License

Improve performance of CSV creation by replacing large monolithic SPARQL query #257

Closed SvenLieber closed 5 months ago

SvenLieber commented 5 months ago

Currently we create the CSV version of the corpus with a single complex SPARQL query, which takes around 5 minutes to run. In the past we have already refactored the query to stay efficient as requirements and data grew, e.g. by avoiding many OPTIONAL statements or by querying the contributors separately and merging the results afterwards (#223).

However, the query remains a bottleneck. Instead, we can retrieve the different properties with separate, fast queries (avoiding OPTIONAL statements completely) and perform the grouping afterwards with a Python script.

Additionally, this would give us smaller queries and results that are easier to debug than one large monolithic query.
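
A minimal sketch of the grouping step, assuming each small query has already been run and its result loaded as a list of dictionaries that share an identifier column. The column names (`targetIdentifier`, `targetTitle`, `targetISBN`), the `;` separator, and both function names are hypothetical and only illustrate the idea; they are not taken from the existing scripts.

```python
import csv
import io
from collections import defaultdict


def group_property_rows(property_results, id_column="targetIdentifier", separator=";"):
    """Merge per-property query results into one record per identifier.

    property_results maps a (hypothetical) query name to its result rows.
    Multiple distinct values for the same column are joined with `separator`.
    """
    merged = defaultdict(lambda: defaultdict(list))
    for rows in property_results.values():
        for row in rows:
            entry = merged[row[id_column]]  # registers the identifier even if all values are empty
            for column, value in row.items():
                if column == id_column or value in (None, ""):
                    continue
                if value not in entry[column]:
                    entry[column].append(value)
    return {
        identifier: {column: separator.join(values) for column, values in columns.items()}
        for identifier, columns in merged.items()
    }


def write_corpus_csv(grouped, columns, out, id_column="targetIdentifier"):
    """Write one CSV row per identifier; properties without a value stay empty."""
    writer = csv.DictWriter(
        out, fieldnames=[id_column] + columns, restval="", extrasaction="ignore"
    )
    writer.writeheader()
    for identifier in sorted(grouped):
        writer.writerow({id_column: identifier, **grouped[identifier]})


if __name__ == "__main__":
    # Toy results standing in for two separate SPARQL queries.
    results = {
        "titles": [
            {"targetIdentifier": "t1", "targetTitle": "A"},
            {"targetIdentifier": "t2", "targetTitle": "B"},
        ],
        "isbns": [
            {"targetIdentifier": "t1", "targetISBN": "111"},
            {"targetIdentifier": "t1", "targetISBN": "222"},
        ],
    }
    buffer = io.StringIO()
    write_corpus_csv(group_property_rows(results), ["targetTitle", "targetISBN"], buffer)
    print(buffer.getvalue())
```

Because a missing property simply leaves its cell empty at merge time, none of the small queries needs an OPTIONAL clause, and each intermediate result can be inspected on its own when debugging.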