openbudgets / datasets

OpenBudgets.eu datasets

Bonn datasets violate IC-11 #84

Closed: skarampatakis closed this issue 7 years ago

skarampatakis commented 7 years ago

It seems that some observations are missing values for some dimensions, for example:

```text
Integrity Constraint 11: All Dimensions Required
------------------------------------------------

http://data.openbudgets.eu/resource/dataset/bonn-budget-2017/observation/492942 does not have values for the following dimensions:
    http://data.openbudgets.eu/ontology/dsd/dimension/fiscalYear
    http://data.openbudgets.eu/ontology/dsd/dimension/budgetPhase
http://data.openbudgets.eu/resource/dataset/bonn-budget-2017/observation/435208 does not have values for the following dimensions:
    http://data.openbudgets.eu/ontology/dsd/dimension/fiscalYear
    http://data.openbudgets.eu/ontology/dsd/dimension/budgetPhase
http://data.openbudgets.eu/resource/dataset/bonn-budget-2017/observation/21978 does not have values for the following dimensions:
    http://data.openbudgets.eu/ontology/dsd/dimension/fiscalYear
    http://data.openbudgets.eu/ontology/dsd/dimension/budgetPhase
http://data.openbudgets.eu/resource/dataset/bonn-budget-2017/observation/423480 does not have values for the following dimensions:
    http://data.openbudgets.eu/ontology/dsd/dimension/fiscalYear
    http://data.openbudgets.eu/ontology/dsd/dimension/budgetPhase
http://data.openbudgets.eu/resource/dataset/bonn-budget-2017/observation/295214 does not have values for the following dimensions:
    http://data.openbudgets.eu/ontology/dsd/dimension/fiscalYear
    http://data.openbudgets.eu/ontology/dsd/dimension/budgetPhase
```

Tested with NoSPA.
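IC-11 requires every observation to have a value for each dimension declared in its DSD. The sketch below shows a check in that spirit. It is not NoSPA's implementation. It assumes triples are plain `(subject, predicate, object)` string tuples and that the graph is already normalized, so only values stated directly on an observation count. The observation ID is hypothetical.

```python
# Minimal sketch of an IC-11 ("All Dimensions Required") check.
# Assumptions (not from NoSPA or any RDF library): triples are plain
# (subject, predicate, object) string tuples, and the graph is already
# normalized, so only values stated directly on an observation count.
from collections import defaultdict

RDF_TYPE = "rdf:type"
QB_OBSERVATION = "qb:Observation"


def missing_dimensions(triples, dimensions):
    """Map each observation to the declared dimensions it has no value for."""
    observations = {s for s, p, o in triples if p == RDF_TYPE and o == QB_OBSERVATION}
    used = defaultdict(set)  # observation -> predicates stated on it
    for s, p, _ in triples:
        if s in observations:
            used[s].add(p)
    report = {}
    for obs in sorted(observations):
        # Keep the DSD's declaration order so the report is stable.
        absent = [d for d in dimensions if d not in used[obs]]
        if absent:
            report[obs] = absent
    return report


if __name__ == "__main__":
    dims = ["obeu-dimension:organization",
            "obeu-dimension:fiscalYear",
            "obeu-dimension:budgetPhase"]
    # Hypothetical observation that only has an organization value.
    triples = {
        ("obs:example", RDF_TYPE, QB_OBSERVATION),
        ("obs:example", "obeu-dimension:organization", "org:bonn"),
    }
    for obs, absent in missing_dimensions(triples, dims).items():
        print(obs, "does not have values for the following dimensions:")
        for dim in absent:
            print("   ", dim)
```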

fathoni commented 7 years ago

According to the RDF browser, the Bonn 2017 dataset has values for both fiscalYear and budgetPhase. This makes me wonder whether the fiscalYear declaration in the DSD is correct:

```turtle
<http://data.openbudgets.eu/ontology/dsd/bonn-budget-2017_2024> a qb:DataStructureDefinition ;
  rdfs:label "Data structure definition for the approved budget of Stadt Bonn (German city)."@en ;

  qb:component [ qb:dimension obeu-dimension:organization ;
                 qb:componentAttachment qb:DataSet ],
               [ qb:dimension bonn-dimension:operationCharacter ;
                 qb:componentAttachment qb:Slice ],
               [ qb:dimension obeu-dimension:fiscalYear ],
               [ qb:dimension obeu-dimension:budgetPhase ;
                 qb:componentAttachment qb:DataSet ], ...
```
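As we understand the RDF Data Cube specification, integrity constraints are checked on a normalized graph. Normalization copies a dimension value from the `qb:DataSet` down to its observations only when the DSD declares that component with `qb:componentAttachment qb:DataSet`. In the excerpt above, `fiscalYear` has no attachment. The sketch below illustrates that rule on hypothetical data, using plain Python instead of the specification's SPARQL Update rules. This thread does not say whether NoSPA normalizes before validating.

```python
# Sketch of the normalization step that pushes dataset-level component
# values down to observations (RDF Data Cube "push down attachment levels").
# Assumptions: triples are plain string tuples; `components` maps each DSD
# component property to its qb:componentAttachment value, or None if absent.


def push_down(triples, components):
    """Return the triples plus dataset-level values copied to observations."""
    attached_to_dataset = {p for p, att in components.items() if att == "qb:DataSet"}
    obs_to_dataset = {s: o for s, p, o in triples if p == "qb:dataSet"}
    added = set()
    for s, p, o in triples:
        if p not in attached_to_dataset:
            continue  # e.g. a component declared without qb:componentAttachment
        for obs, dataset in obs_to_dataset.items():
            if dataset == s:
                added.add((obs, p, o))
    return set(triples) | added


if __name__ == "__main__":
    # Mirrors the attachments in the DSD excerpt; the values are hypothetical.
    components = {
        "obeu-dimension:organization": "qb:DataSet",
        "obeu-dimension:fiscalYear": None,
        "obeu-dimension:budgetPhase": "qb:DataSet",
    }
    triples = {
        ("ds:bonn-2017", "obeu-dimension:organization", "org:bonn"),
        ("ds:bonn-2017", "obeu-dimension:fiscalYear", "year:2017"),
        ("ds:bonn-2017", "obeu-dimension:budgetPhase", "phase:approved"),
        ("obs:1", "qb:dataSet", "ds:bonn-2017"),
    }
    normalized = push_down(triples, components)
    print(sorted(p for s, p, o in normalized if s == "obs:1"))
    # ['obeu-dimension:budgetPhase', 'obeu-dimension:organization', 'qb:dataSet']
```

After this step, `fiscalYear` is still missing on the observation, so an IC-11 check on the normalized graph would report it.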
skarampatakis commented 7 years ago

validation_result_20170622132631.md.zip

larjohn commented 7 years ago

Also have a look here:

https://github.com/openbudgets/rudolf/issues/25

fathoni commented 7 years ago

The IC-11 violation has been resolved. However, IC-12 (No Duplicate Observations) is now violated. For example, in the 2022 dataset, this observation is reported as a duplicate:

http://data.openbudgets.eu/page/dataset/bonn-budget-2022/observation/12847

When I query for observations with the same values for all the dimensions of the Bonn 2017-2024 DSD, no other observation is returned:

```sparql
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX ns0: <http://data.openbudgets.eu/ontology/dsd/measure/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX ns1: <http://data.openbudgets.eu/ontology/dsd/bonn-budget-2017_2024/dimension/>

SELECT ?s WHERE {
  ?s a qb:Observation ;
    qb:dataSet <http://data.openbudgets.eu/resource/dataset/bonn-budget-2022> ;
    ns1:accountAssignmentElement <http://data.openbudgets.eu/resource/codelist/kontierungselemente_bonn/innenauftraege/4170506> ;
    ns1:administrativeClassification <http://data.openbudgets.eu/resource/codelist/aemterhierarchie_bonn/4170> ;
    ns1:businessArea <http://data.openbudgets.eu/resource/codelist/geschaeftsbereich_bonn/4170> ;
    ns1:economicClassification <http://data.openbudgets.eu/resource/codelist/kostenartenuebersicht_bonn/432100> ;
    ns1:functionalClassification <http://data.openbudgets.eu/resource/codelist/produktuebersicht_bonn/0405> ;
    ns1:operationCharacter <http://data.openbudgets.eu/resource/codelist/bonn-operation-character/revenue> ;
    ns1:profitCenter <http://data.openbudgets.eu/resource/codelist/profitcenter_bonn/141700405> .
}
```

I am unsure whether this is a false positive on NoSPA's part or whether there is still a mistake in the DSD definition.

validation_result_20170628140632.md.tar.gz

jindrichmynarz commented 7 years ago

The reported observation comes from the Bonn 2022 dataset, while your query is based on the DSD of a different dataset. If you run a query that detects multiple observations sharing the same combination of dimension values, as defined in the 2022 dataset's DSD, you will find some duplicates:

```sparql
PREFIX bonn-dimension: <http://data.openbudgets.eu/ontology/dsd/bonn-budget-2017_2024/dimension/>
PREFIX obeu-dimension: <http://data.openbudgets.eu/ontology/dsd/dimension/>
PREFIX qb:             <http://purl.org/linked-data/cube#>

SELECT (GROUP_CONCAT(str(?observation); separator = ", ") AS ?observations)
WHERE {
  ?observation qb:dataSet [
      obeu-dimension:organization ?organization ;
      obeu-dimension:fiscalYear ?fiscalYear ;
      obeu-dimension:budgetPhase ?budgetPhase
    ] ;
    bonn-dimension:operationCharacter ?operationCharacter ;
    bonn-dimension:administrativeClassification ?administrativeClassification ;
    bonn-dimension:functionalClassification ?functionalClassification ;
    bonn-dimension:economicClassification ?economicClassification .
}
GROUP BY ?organization ?fiscalYear ?budgetPhase ?operationCharacter
         ?administrativeClassification ?functionalClassification
         ?economicClassification
HAVING (COUNT(?observation) > 1)
```
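The logic of the GROUP BY/HAVING query above can be mirrored in plain Python. Build one key per observation from the dataset-level dimension values plus the observation-level ones, then report keys shared by more than one observation. This is a sketch under stated assumptions. The dict layout, the short property names, and the two observations are made up. Each dimension is assumed to have exactly one value.

```python
# Sketch of IC-12 ("No Duplicate Observations"), mirroring the GROUP BY /
# HAVING query: observations with an identical key of dimension values are
# reported together. Assumptions: one value per dimension; data is held in
# plain dicts keyed by hypothetical short property names.
from collections import defaultdict


def duplicate_groups(observations, datasets, dataset_dims, observation_dims):
    """Return sorted lists of observation ids that share all dimension values."""
    groups = defaultdict(list)
    for obs_id, props in observations.items():
        ds = datasets[props["dataSet"]]
        try:
            key = (tuple(ds[d] for d in dataset_dims)
                   + tuple(props[d] for d in observation_dims))
        except KeyError:
            continue  # like the SPARQL triple patterns, skip incomplete observations
        groups[key].append(obs_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


if __name__ == "__main__":
    datasets = {"ds:2022": {"organization": "org:bonn", "fiscalYear": "2022",
                            "budgetPhase": "approved"}}
    shared = {"dataSet": "ds:2022", "operationCharacter": "revenue",
              "administrativeClassification": "4170",
              "functionalClassification": "0405",
              "economicClassification": "432100"}
    # Two made-up observations that differ only in profitCenter.
    observations = {
        "obs:a": {**shared, "profitCenter": "141700405"},
        "obs:b": {**shared, "profitCenter": "141700406"},
    }
    print(duplicate_groups(observations, datasets,
                           ["organization", "fiscalYear", "budgetPhase"],
                           ["operationCharacter", "administrativeClassification",
                            "functionalClassification", "economicClassification"]))
    # [['obs:a', 'obs:b']]
```

Adding `"profitCenter"` to `observation_dims` makes the two keys differ, so the group disappears. Only properties used as dimensions take part in the uniqueness check.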
fathoni commented 7 years ago

Hi Jindrich, thanks for the query example. There are a few things I do not understand.

> while your query matches the DSD of a different dataset

The DSD above is a generic DSD that is used by all the Bonn datasets from 2017 to 2024. Could you point out what is missing from the DSD so that it can also be used for the 2022 dataset?

Also, in your query, why do some other (sub-)dimensions (e.g., profitCenter, accountAssignmentElement, businessArea) not count towards the uniqueness of an observation?

jindrichmynarz commented 7 years ago

What do you mean by the DSD being a generic one? If datasets have different structures, they must have different DSDs.

In the query, I simply followed the DSD and included only the dimensions it declares. The other properties you mention are not declared in the DSD.
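One way to catch this kind of mismatch before validating is a quick consistency check. It lists the properties that observations carry but that the DSD does not declare as components, such as profitCenter or businessArea here. This is a hypothetical helper, not one of the Data Cube integrity constraints. The measure name `amount` and the dict layout are assumptions for illustration.

```python
# Sketch of a consistency check (not one of the Data Cube integrity
# constraints): find properties used on observations that the DSD does not
# declare as components. Assumptions: hypothetical dict layout in which
# "dataSet" links an observation to its dataset; "amount" is a made-up
# measure name.


def undeclared_properties(observations, declared, ignore=("dataSet",)):
    """Map each undeclared property to the sorted ids of observations using it."""
    found = {}
    for obs_id, props in observations.items():
        for prop in props:
            if prop in declared or prop in ignore:
                continue
            found.setdefault(prop, []).append(obs_id)
    return {prop: sorted(ids) for prop, ids in found.items()}


if __name__ == "__main__":
    # Components assumed to be declared in the DSD (dimensions plus the measure).
    declared = {"operationCharacter", "administrativeClassification",
                "functionalClassification", "economicClassification", "amount"}
    observations = {
        "obs:a": {"dataSet": "ds:2022", "operationCharacter": "revenue",
                  "economicClassification": "432100", "amount": 100.0,
                  "profitCenter": "141700405", "businessArea": "4170"},
    }
    print(undeclared_properties(observations, declared))
```

Each property this reports must either be declared in the dataset's DSD or be ignored by validators when they decide whether two observations are duplicates.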

fathoni commented 7 years ago

Aha, okay, will update it accordingly.

fathoni commented 7 years ago

Should be fixed by now.