netwerk-digitaal-erfgoed / dataset-knowledge-graph

Pipeline that generates the NDE Dataset Knowledge Graph
European Union Public License 1.2

SparqlQuerySelector should take into account duplicate quads #64

Open pmaria opened 7 months ago

pmaria commented 7 months ago

For SPARQL endpoints that do not deduplicate constructed quads, a dcat:Dataset description with multiple distributions currently causes the SparqlQuerySelector to construct one dataset object per distribution. It should instead construct a single dataset object with multiple distributions, matching the source data.

To account for this, the SparqlQuerySelector should handle duplicate quads and keep track of the datasets it has already processed.
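
A minimal sketch of the idea, not the actual SparqlQuerySelector code: duplicate quads are collapsed by a serialized key, and distributions are collected per dataset IRI so that each dataset is built only once. The `SimpleQuad` shape, the function names, and the example.org IRIs in the usage note are hypothetical; a real implementation would presumably work on RDF/JS quads.

```typescript
// Hypothetical minimal quad shape, used only for this illustration.
interface SimpleQuad {
  subject: string;
  predicate: string;
  object: string;
  graph: string;
}

const DCAT_DISTRIBUTION = 'http://www.w3.org/ns/dcat#distribution';

// Serialize a quad so that identical quads map to the same key.
function quadKey(q: SimpleQuad): string {
  return JSON.stringify([q.subject, q.predicate, q.object, q.graph]);
}

// Drop repeated quads, keeping the first occurrence of each.
function deduplicate(quads: SimpleQuad[]): SimpleQuad[] {
  const seen = new Set<string>();
  const unique: SimpleQuad[] = [];
  quads.forEach((q) => {
    const key = quadKey(q);
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(q);
    }
  });
  return unique;
}

// Collect distribution IRIs per dataset IRI. The map doubles as the record
// of datasets already processed: a dataset seen before is extended, not recreated.
function distributionsByDataset(quads: SimpleQuad[]): Map<string, string[]> {
  const datasets = new Map<string, string[]>();
  deduplicate(quads).forEach((q) => {
    if (q.predicate !== DCAT_DISTRIBUTION) {
      return;
    }
    const distributions = datasets.get(q.subject) ?? [];
    distributions.push(q.object);
    datasets.set(q.subject, distributions);
  });
  return datasets;
}
```

Given a dataset `http://example.org/dataset/1` with two distributions, where the endpoint returns one of the `dcat:distribution` quads twice, `distributionsByDataset` yields a single map entry for the dataset holding both distribution IRIs.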

See also:

ddeboer commented 2 months ago

Looks like deduplication will be solved in Comunica: https://github.com/comunica/comunica/pull/1388.