Currently we create the CSV version of the corpus with a complex SPARQL query. This query takes around 5 minutes. In the past we have already refactored the query to be more efficient as needs and data grew, e.g. by avoiding many OPTIONAL statements or by querying the contributors separately and merging afterwards (#223).
However, the query remains a bottleneck. We could instead query the different properties with separate, fast queries (avoiding OPTIONAL statements completely) and perform the grouping afterwards with a Python script.
Additionally, this would give us smaller queries and results that are easier to debug than one large monolithic query.
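The grouping step could be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the property names, subject IDs, and sample data below are hypothetical stand-ins for the results that the separate per-property SPARQL queries would return as (subject, value) pairs.

```python
import csv
from collections import defaultdict

# Hypothetical results of three small per-property SPARQL queries,
# each returning (subject_id, value) pairs instead of one monolithic query.
titles = [("play1", "Hamlet"), ("play2", "Macbeth")]
authors = [("play1", "Shakespeare"), ("play2", "Shakespeare")]
years = [("play1", "1600")]  # "play2" deliberately has no year

# Group the per-property rows by subject ID in Python.
rows = defaultdict(dict)
for column, results in [("title", titles), ("author", authors), ("year", years)]:
    for subject, value in results:
        rows[subject][column] = value

# Write the merged table; restval="" turns missing values into empty
# cells, which replaces the OPTIONAL clauses in SPARQL.
with open("corpus.csv", "w", newline="") as f:
    writer = csv.DictWriter(
        f, fieldnames=["id", "title", "author", "year"], restval=""
    )
    writer.writeheader()
    for subject, props in sorted(rows.items()):
        writer.writerow({"id": subject, **props})
```

Each of these small queries can be run and inspected independently, which should make debugging much easier than with the current monolithic query.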