openbudgets / Code-lists

Code list in fiscal data sets

Codelists appearing twice #27

Closed larjohn closed 6 years ago

larjohn commented 7 years ago

There is this query:


PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xro: <http://purl.org/xro/ns#>

SELECT ?binding_a062e_19056 ?g WHERE {
GRAPH ?g{
    <http://data.openbudgets.eu/resource/dataset/greek-municipalities/codelist/athens-administrations/1> <http://www.w3.org/2004/02/skos/core#prefLabel> ?binding_a062e_19056 .
    }

}
LIMIT 100

It shows that the same code list has been uploaded twice. This leads to duplicate results whenever a query neither restricts itself to a single GRAPH nor uses DISTINCT.

What is the recommended way to resolve this? In my experience, DISTINCT is not optimal.

skarampatakis commented 7 years ago

The original codelist is in the graph http://data.openbudgets.eu/resource/dataset/greek-municipalities/codelist/athens-administrations.
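As a stopgap until the duplicates are cleaned up, the original query could be pinned to this graph so that each label comes back once without DISTINCT. This is only a sketch: it assumes the label triple is present in the graph named above, and the variable name ?label is illustrative.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

# Read the label only from the original codelist graph,
# so copies stored in other graphs are ignored.
SELECT ?label WHERE {
  GRAPH <http://data.openbudgets.eu/resource/dataset/greek-municipalities/codelist/athens-administrations> {
    <http://data.openbudgets.eu/resource/dataset/greek-municipalities/codelist/athens-administrations/1> skos:prefLabel ?label .
  }
}
```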

For some reason there seems to be a separate graph for every instance of this Scheme, each containing only that one instance and named with the very same instance IRI. So I think the best approach is to just delete them. We should also find out how they were created and whether this happens to other codelists or resources in general.
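One way to list such graphs might be to look for graphs that contain a skos:Concept whose IRI is identical to the graph name. This is a sketch that assumes the stray graphs also hold the concept's rdf:type triple; if they only hold labels, the pattern would need to be adjusted.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

# A graph matches when the graph name itself is typed as a
# skos:Concept inside that graph, i.e. the graph is named after
# the single instance it holds.
SELECT ?g WHERE {
  GRAPH ?g {
    ?g a skos:Concept .
  }
}
```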

skarampatakis commented 7 years ago

I found these graphs with this query:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xro: <http://purl.org/xro/ns#>

SELECT DISTINCT ?g WHERE {
    GRAPH ?g {
        ?s ?p ?label .
    }
}

So it seems that for some reason there are graphs containing just one skos:Concept. This happens for one Bonn codelist, the one mentioned earlier, and the functional classification of Thessaloniki 2016, at least. So we could just delete the graphs whose name is not the same as the ConceptScheme they refer to, and find out how they were created.
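The rule above could be checked with a query along the following lines before anything is deleted. It is a sketch under two assumptions: that concepts point to their scheme via skos:inScheme, and that a legitimate codelist graph is named with the IRI of its ConceptScheme. The variable names are illustrative.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

# List graphs that hold concepts of a scheme but are not named
# after that scheme, with the number of concepts in each.
SELECT ?g ?scheme (COUNT(?concept) AS ?concepts) WHERE {
  GRAPH ?g {
    ?concept skos:inScheme ?scheme .
  }
  FILTER (?g != ?scheme)
}
GROUP BY ?g ?scheme
ORDER BY ?scheme ?g
```

After reviewing the result, each confirmed stray graph could be removed with a SPARQL 1.1 Update `DROP GRAPH <graph-iri>` request. Other graphs that legitimately contain skos:inScheme triples would also appear in this list, so it should not be fed into a delete blindly.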