kbrbe / beltrans-data-integration

Creating a FAIR Linked Data corpus for the BELTRANS research project about Belgian book translations NL-FR and FR-NL between 1970 and 2020
https://www.kbr.be/en/projects/beltrans/
MIT License

Incomplete query log for data integration queries #265

Closed SvenLieber closed 4 months ago

SvenLieber commented 4 months ago

We store the intermediary SPARQL queries that are used when integrating data with the work-set clustering implementation. These queries are generated from a config and then executed. The filename of a SPARQL query that updates a property includes the name of the data source and the name of the property.

However, the filename does not take the type of the entity into account, which can lead to an incomplete query log. For example, when we update the property schema:name for KBR persons, the query is stored in property-update-query-KBR-schema_name.sparql and executed. When a later step of the pipeline updates schema:name for KBR organizations, the query is stored under the same filename, property-update-query-KBR-schema_name.sparql, and hence overwrites the previously stored query for persons.

The integrated data should usually still be correct, because each query is executed immediately after its file is created. Only the query log is incomplete.
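
The filename collision, and one possible way to avoid it, can be sketched in a few lines of Python. This is only an illustration, not the actual implementation: the helper `query_log_filename`, its parameters, and the entity type labels `person` and `organization` are hypothetical. Only the filename pattern `property-update-query-KBR-schema_name.sparql` is taken from the description above.

```python
def query_log_filename(source, prop, entity_type=None):
    """Build the filename for a stored property update query.

    Hypothetical helper: the name, signature, and the position of the
    entity type in the filename are assumptions for illustration.
    """
    # Replace the prefix separator so that "schema:name" becomes "schema_name".
    safe_prop = prop.replace(":", "_")
    parts = ["property-update-query", source]
    if entity_type is not None:
        # Possible fix: make the entity type part of the filename.
        parts.append(entity_type)
    parts.append(safe_prop)
    return "-".join(parts) + ".sparql"


# Current behavior: the entity type is ignored, so both queries share a name
# and the second write overwrites the first.
persons = query_log_filename("KBR", "schema:name")
organizations = query_log_filename("KBR", "schema:name")
collision = persons == organizations

# With the entity type included, each query gets its own file.
persons_typed = query_log_filename("KBR", "schema:name", "person")
organizations_typed = query_log_filename("KBR", "schema:name", "organization")
```

Here `collision` is true because both calls produce `property-update-query-KBR-schema_name.sparql`, while the typed variants produce two distinct filenames, so neither query would overwrite the other in the log.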