Closed marek-dudas closed 8 years ago
So the actual problem is that operationCharacter and budgetPhase in FDP datasets don't adhere to the constraints on qb:DimensionProperty, i.e. available for every observation?
operationCharacter and budgetPhase are not attributes in the DCV sense, because they don't qualify measures, so I'd not cast them as instances of qb:AttributeProperty.
For these cases, we minted obeu:OptionalProperty derived from the generic qb:ComponentProperty. This property is meant to be used for component properties that are missing for some observations, but don't adhere to the semantics of qb:AttributeProperty.
One solution to the problem you raised could thus be to mint instances of obeu:OptionalProperty in the FDP-specific namespace. These can be related to obeu-dimension:operationCharacter and obeu-dimension:budgetPhase by sharing the same rdfs:range (i.e. obeu:OperationCharacter and obeu:BudgetPhase). Alternatively, we can also link them using shared qb:concepts, which is already used for some component properties (e.g., both obeu-attribute:currency and obeu-dimension:currency link to sdmx-concept:currency).
However, if you use obeu:OptionalProperty you need to ensure that the values of the remaining dimensions still uniquely identify the observations (i.e. as per IC-12 you cannot have multiple observations sharing the same dimension values). If this requirement cannot be satisfied, perhaps this is in fact an indication of errors in the FDP datasets. In such a case, I'd stick to using dimensions to make the errors clear.
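The uniqueness requirement can be checked mechanically before publishing. Below is a minimal sketch in Python, assuming the observations are available as plain dictionaries keyed by component property. The function name `find_duplicate_keys` and the sample property names are hypothetical and not part of the pipeline.

```python
from collections import defaultdict


def find_duplicate_keys(observations, dimensions):
    """Group observations by their dimension values and return the groups
    that contain more than one observation (an IC-12-style violation).

    `observations` is a list of dicts; `dimensions` lists the keys that are
    declared as dimensions. Optional properties are left out of
    `dimensions`, so they take no part in identifying an observation.
    """
    groups = defaultdict(list)
    for index, obs in enumerate(observations):
        key = tuple(obs.get(dim) for dim in dimensions)
        groups[key].append(index)
    return {key: idxs for key, idxs in groups.items() if len(idxs) > 1}


# Hypothetical sample data: budgetPhase is treated as optional here.
observations = [
    {"year": 2006, "admin": "101001", "budgetPhase": "approved", "amount": 335071800},
    {"year": 2006, "admin": "101001", "budgetPhase": "executed", "amount": 330723200},
    {"year": 2006, "admin": "104001", "amount": 10718200},
]

# Only year and admin are declared as dimensions in this check.
duplicates = find_duplicate_keys(observations, dimensions=["year", "admin"])
```

In this made-up sample the first two observations collide once budgetPhase stops being a dimension, which is the situation the caveat above warns about.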
I would rather say the actual problem is FDP considering the properties as attributes of measures and OBEU as dimensions characterizing each observation. The dimension vs. attribute difference seems quite philosophical to me.
Using obeu:OptionalProperty however seems like the compromise I was hoping for. Both discussed properties are defined only in the descriptor, not the CSV table, and are not part of observation's identification, since they have always the same value for all observations in the dataset, so I don't see any problem there.
I will implement it this way for now. The resulting datasets can be, with some effort, transformed with an additional pipeline into the "one dataset per measure" format in the future if necessary.
Thanks for the swift response!
The dimension vs. attribute difference seems quite philosophical to me.
The definition of attributes in DCV is quite clear: attributes qualify a measure. What is the definition of attribute in FDP?
Both discussed properties are defined only in the descriptor, not the CSV table, and are not part of observation's identification, since they have always the same value for all observations in the dataset, so I don't see any problem there.
If they are defined in the descriptor, they apply to all observations, right (qb:componentAttachment qb:DataSet)? If that is the case I see no problem with using the original dimensions obeu-dimension:operationCharacter and obeu-dimension:budgetPhase.
They apply to all observations, but there can be several different operation characters or budget phases in one dataset, each related to different measure, and as they are optional, the same dataset could include a measure without budget phase and/or operation character specified. I could create subproperties of the obeu operation character and budget phase dimensions for each measure and include the related measure name in their uri, but I don't think that hiding info in uri is a good practice.
Maybe it would be better to discuss a specific example, as I am not good with explanations.
Here is a part of a JSON descriptor describing the measures. "direction" is the operation character and "phase" is the budget phase. It's from the boost-armenia datapackage:
"mapping": {
  "measures": {
    "approved_amount": {
      "source": "approved",
      "direction": "expenditure",
      "phase": "approved",
      "currency": "AMD"
    },
    "adjusted_amount": {
      "source": "adjusted",
      "direction": "expenditure",
      "phase": "adjusted",
      "currency": "AMD"
    },
    "executed_amount": {
      "source": "executed",
      "direction": "expenditure",
      "phase": "executed",
      "currency": "AMD"
    }
  },
  (...)
and here are a few lines from the CSV:
year,admin,econ1,econ2,econ3,econ4,func1,func2,func3,program,exp_type,Econ/func,transfer,approved,adjusted,executed
2006,101001 Staff of President of RA,4000 Running expenses,4100 Payment for labor,4110 Salaries and additional payments paid in drams,4111 Salaries and additional payments of employees,01 General public services,"0101 Legislative and executive bodies, public administration, financial and fiscal relations, foreign affairs","010101 Legislative and executive bodies, public administration",,1 Personnel,Function,Excluding transfers,335071800,330723300,330723200
2006,101001 Staff of President of RA,4000 Running expenses,4100 Payment for labor,4110 Salaries and additional payments paid in drams,"4113 Civil, judicial and other public servants remuneration",01 General public services,"0101 Legislative and executive bodies, public administration, financial and fiscal relations, foreign affairs","010101 Legislative and executive bodies, public administration",,1 Personnel,Function,Excluding transfers,10718200,10718200,10712800
2006,101001 Staff of President of RA,4000 Running expenses,4100 Payment for labor,4130 Actual social security payments,4131 Social security payments,01 General public services,"0101 Legislative and executive bodies, public administration, financial and fiscal relations, foreign affairs","010101 Legislative and executive bodies, public administration",,1 Personnel,Function,Excluding transfers,63093400,57201200,57201200
They apply to all observations, but there can be several different operation characters or budget phases in one dataset, each related to different measure.
So they are specific to a measure, like qb:componentAttachment qb:MeasureProperty? In that case, if you want to translate this to DCV as literally as possible, you would mint multiple measures. However, I don't think a close translation matters, and it seems sub-optimal in this case.
I could create subproperties of the obeu operation character and budget phase dimensions for each measure and include the related measure name in their uri, but I don't think that hiding info in uri is a good practice.
No, this is not a good practice.
In case of the example, all the measure attributes are available for all measures, so there is no problem in translating them to dimensions (or attributes, if that's more appropriate). Should you have measures with different FDP attributes, you'd model those not used for all measures either as obeu:OptionalProperty or qb:AttributeProperty depending on their semantics.
I am already minting multiple measures on one observation. I see two solutions without possible future problems:
a) creating attributes in the FDP namespace attached to measures and using the OBEU predefined values with them
b) creating separate datasets for each measure
Creating separate observations for each measure and modeling budget phase and operation character as dimensions could be a problem, since some observations could then have that dimension value missing, e.g. in a case like this:
"mapping": {
  "measures": {
    "approved_amount": {
      "source": "approved",
      "direction": "expenditure",
      "phase": "approved",
      "currency": "AMD"
    },
    "adjusted_amount": {
      "source": "adjusted",
      "direction": "expenditure",
      "phase": "adjusted",
      "currency": "AMD"
    },
    "foo_amount": {
      "source": "foo",
      "currency": "AMD"
    },
I would go with a) at this moment.
Why not cast the intersection of FDP measure attributes as instances of either qb:DimensionProperty or qb:AttributeProperty (depending on their semantics) and cast the rest as instances of either obeu:OptionalProperty or qb:AttributeProperty (depending on their semantics)?
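To make the "intersection" concrete, here is a minimal Python sketch that splits FDP measure attribute names into those present on every measure and those missing from at least one. The function name `split_measure_attributes` is hypothetical, and skipping the "source" key is an assumption (it names the CSV column rather than describing the measure).

```python
def split_measure_attributes(measures, ignored=("source",)):
    """Split FDP measure attribute names into those present on every
    measure (shared) and those missing from at least one (optional)."""
    key_sets = [set(spec) - set(ignored) for spec in measures.values()]
    if not key_sets:
        return set(), set()
    shared = set.intersection(*key_sets)
    optional = set.union(*key_sets) - shared
    return shared, optional


# Measures from the descriptor fragment above, including the made-up
# "foo_amount" measure that lacks "direction" and "phase".
measures = {
    "approved_amount": {"source": "approved", "direction": "expenditure",
                        "phase": "approved", "currency": "AMD"},
    "adjusted_amount": {"source": "adjusted", "direction": "expenditure",
                        "phase": "adjusted", "currency": "AMD"},
    "foo_amount": {"source": "foo", "currency": "AMD"},
}

shared, optional = split_measure_attributes(measures)
```

For this fragment only "currency" is shared by all measures, while "direction" and "phase" fall into the optional group. Which DCV class each group is cast to still depends on its semantics, as the question says.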
There are only two FDP measure attributes we are dealing with: "direction" and "phase". I am casting them as qb:AttributeProperty. Is that ok?
So the above examples and discussion were only hypothetical?
No, it is purely practical, the first example is a real FDP dataset. It's just that there are only two such problematical FDP "measure attributes" - "direction" and "phase". So we are not looking for a general way of transforming any "FDP measure attribute", just a specific solution for these two.
The problem I have is that they are both optional and attached to measures. I think an optional qb:AttributeProperty attached to qb:MeasureProperty solves this acceptably.
OK. I assume you currently map direction to obeu-dimension:operationCharacter and phase to obeu-dimension:budgetPhase. Correct?
Yes, because originally I incorrectly thought both are (in FDP) attached to the whole dataset. Now I found out they are attached to FDP measures.
I'd use the approach I proposed above in this comment. I think this is preferable to multiple measures, because it is closer to the OpenBudgets.eu data model (no need for obeu-measure:amount subproperties, avoids reinventing core component properties unless necessary due to DCV's cardinality constraints).
A side note: Is it always a budget phase? We also have obeu-dimension:paymentPhase to cater for phases in spending data.
On the other hand, obeu-measure:amount subproperties are closer to FDP data model: there is for example usually some semantics hidden in the name of the measure property (like "EU_amount", "Total_amount"...)
So what you mean is using the "measure dimension" with qb:measureType approach?
I am currently aiming at "working pipeline producing OBEU compliant dataset not violating any constraints". It could be of course enhanced to create nicer dataset, but it would take time, a lot of time in my case. And since I have the issue discussed here already almost solved the way I proposed, I would stick with it for now. Any other solution would I think mean large-size rebuilding of many parts of the pipeline.
It is always a budget phase, the allowed FDP values directly correspond to obeu:BudgetPhase instances.
So what you mean is using the "measure dimension" with qb:measureType approach?
No, that would be the multi-measure approach. I suggest using a single measure with separate dimensions and attributes instead of several measures that have particular values of dimensions or attributes baked in (i.e. qb:componentAttachment qb:MeasureProperty).
I am currently aiming at "working pipeline producing OBEU compliant dataset not violating any constraints".
Both the multi-measure approach and my suggestion are compatible with the OBEU data model.
It could be of course enhanced to create nicer dataset, but it would take time, a lot of time in my case. And since I have the issue discussed here already almost solved the way I proposed, I would stick with it for now. Any other solution would I think mean large-size rebuilding of many parts of the pipeline.
As an outsider, it seems to me that multi-measure approach is more complicated. What do you think is difficult about my suggestion?
So, e.g., three FDP measures would result in three observations with some artificially created dimension specifying which original FDP measure the obeu:amount corresponds to in that observation?
As the multi-measure approach (this one to be clear) is currently implemented, anything else seems more complicated to me as it would mean changing the pipeline. The multi-measure seems to correspond to FDP model better, and so it seems less complicated to me.
An example (fragment of) output of the result of my "least effort" solution based on (I hope) your suggestions:
<http://data.openbudgets.eu/ontology/dsd/esif2014> a qb:DataStructureDefinition ;
    qb:component _:node1aok8fghtx1 , _:node1aok8fghtx2 , _:node1aok8fghtx3 , _:node1aok8fghtx4 , _:node1aok8fghtx5 , <http://data.openbudgets.eu/ontology/dsd/esif2014/component/budgetPhase> , <http://data.openbudgets.eu/ontology/dsd/esif2014/component/operationCharacter> , _:node1aok8fghtx10 , _:node1aok8fghtx6 , _:node1aok8fghtx7 , _:node1aok8fghtx8 , _:node1aok8fghtx9 .
_:node1aok8fghtx3 qb:measure <http://data.openbudgets.eu/ontology/dsd/esif2014/measure/EU_Amount> .
<http://data.openbudgets.eu/ontology/dsd/esif2014/measure/EU_Amount> obeu-attribute:currency <http://data.openbudgets.eu/codelist/currency/EUR> ;
    a rdf:Property , qb:MeasureProperty ;
    rdfs:subPropertyOf obeu-measure:amount ;
    <http://schemas.frictionlessdata.io/fiscal-data-package#budgetPhase> obeu-budgetphase:approved ;
    <http://schemas.frictionlessdata.io/fiscal-data-package#operationCharacter> obeu-operation:expenditure .
_:node1aok8fghtx4 qb:measure <http://data.openbudgets.eu/ontology/dsd/esif2014/measure/National_Amount> .
<http://data.openbudgets.eu/ontology/dsd/esif2014/measure/National_Amount> obeu-attribute:currency <http://data.openbudgets.eu/codelist/currency/EUR> ;
    a rdf:Property , qb:MeasureProperty ;
    rdfs:subPropertyOf obeu-measure:amount ;
    <http://schemas.frictionlessdata.io/fiscal-data-package#operationCharacter> obeu-operation:revenue .
_:node1aok8fghtx5 qb:measure <http://data.openbudgets.eu/ontology/dsd/esif2014/measure/Total_Amount> .
<http://data.openbudgets.eu/ontology/dsd/esif2014/measure/Total_Amount> obeu-attribute:currency <http://data.openbudgets.eu/codelist/currency/EUR> ;
    a rdf:Property , qb:MeasureProperty ;
    rdfs:subPropertyOf obeu-measure:amount ;
    <http://schemas.frictionlessdata.io/fiscal-data-package#operationCharacter> obeu-operation:revenue .
<http://data.openbudgets.eu/ontology/dsd/esif2014/component/budgetPhase> qb:attribute <http://schemas.frictionlessdata.io/fiscal-data-package#budgetPhase> ;
    qb:componentAttachment qb:MeasureProperty ;
    qb:componentRequired false .
<http://data.openbudgets.eu/ontology/dsd/esif2014/component/operationCharacter> qb:attribute <http://schemas.frictionlessdata.io/fiscal-data-package#operationCharacter> ;
    qb:componentAttachment qb:MeasureProperty ;
    qb:componentRequired false .
<http://schemas.frictionlessdata.io/fiscal-data-package#budgetPhase> a qb:AttributeProperty , rdf:Property ;
    rdfs:range obeu:BudgetPhase .
<http://schemas.frictionlessdata.io/fiscal-data-package#operationCharacter> a qb:AttributeProperty , rdf:Property ;
    rdfs:range obeu:OperationCharacter .
<http://data.openbudgets.eu/resource/dataset/esif2014/observation/1fa31e2c-faf3-4952-8185-aec2df4505a9> a qb:Observation ;
    <http://data.openbudgets.eu/ontology/dsd/esif2014/measure/EU_Amount> 3730936.91 ;
    <http://data.openbudgets.eu/ontology/dsd/esif2014/measure/National_Amount> 3618957.31 ;
    <http://data.openbudgets.eu/ontology/dsd/esif2014/measure/Total_Amount> 7349894.22 ;
    <http://data.openbudgets.eu/ontology/dsd/esif2014/dimension/unknown> "M02" ;
    <http://data.openbudgets.eu/ontology/dsd/esif2014/dimension/administrator> <http://data.openbudgets.eu/resource/dataset/esif2014/administrator/AT> ;
    <http://data.openbudgets.eu/ontology/dsd/esif2014/dimension/date> <http://reference.data.gov.uk/id/gregorian-year/2014> ;
    <http://data.openbudgets.eu/ontology/dsd/esif2014/dimension/functional-classification> <http://data.openbudgets.eu/resource/dataset/esif2014/functional-classification/2014AT06RDNP001-1> ;
    <http://data.openbudgets.eu/ontology/dsd/esif2014/dimension/fin-source> "EAFRD" ;
    qb:dataSet <http://data.openbudgets.eu/resource/dataset/esif2014> .
On the one hand, having component properties attached to measures is easier to implement, because that's how FDP does things. On the other hand, attaching properties to observations is more natural to DCV, as dimensions on measures don't make much sense. Data modelling decisions should thus consider their implementation cost. If it is costly to produce a representation more in line with DCV, unless it is offset by benefits for the users of the data, then go with the FDP way.
Your last data snippet uses the approach with multiple measures that you proposed, so we probably got this confused. What I propose is:
- a single measure (obeu-measure:amount)
- operationCharacter and budgetPhase either reused as dimensions (i.e. obeu-dimension:operationCharacter, obeu-dimension:budgetPhase) or minted as instances of obeu:OptionalProperty.
We can return to the original question "Could we have operationCharacter and budgetPhase as attributes?" to obtain more clarity. If using these component properties as dimensions requires undue implementation effort, then you can definitely mint new attributes with similar interpretation.
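For illustration, the single-measure proposal could look roughly like this in Python: one CSV row with several FDP measure columns becomes one observation per measure, with "direction" and "phase" moved onto the observation. This is only a sketch. The function name `to_single_measure` and the output keys "amount", "operationCharacter" and "budgetPhase" are hypothetical stand-ins for the corresponding OBEU component properties, not the pipeline's actual code.

```python
def to_single_measure(row, measures):
    """Turn one CSV row into one observation per FDP measure.

    Columns that are not measure sources are copied to every observation.
    "direction" and "phase" are added only when the descriptor has them,
    so they behave like optional component properties.
    """
    measure_columns = {spec["source"] for spec in measures.values()}
    shared = {k: v for k, v in row.items() if k not in measure_columns}
    observations = []
    for spec in measures.values():
        obs = dict(shared)
        obs["amount"] = row[spec["source"]]
        if "direction" in spec:
            obs["operationCharacter"] = spec["direction"]
        if "phase" in spec:
            obs["budgetPhase"] = spec["phase"]
        observations.append(obs)
    return observations


# Abbreviated row and measure mapping based on the boost-armenia example.
measures = {
    "approved_amount": {"source": "approved", "direction": "expenditure",
                        "phase": "approved", "currency": "AMD"},
    "adjusted_amount": {"source": "adjusted", "direction": "expenditure",
                        "phase": "adjusted", "currency": "AMD"},
    "executed_amount": {"source": "executed", "direction": "expenditure",
                        "phase": "executed", "currency": "AMD"},
}
row = {"year": "2006", "admin": "101001 Staff of President of RA",
       "approved": "335071800", "adjusted": "330723300", "executed": "330723200"}

observations = to_single_measure(row, measures)
```

Each of the three resulting observations carries its own budgetPhase value, so the phase helps tell apart observations that otherwise share the same year and administrator.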
Regarding the operationCharacter and budgetPhase issue itself and its solution at this moment: is what I showed in the snippet above an acceptable solution for now? From my side it is, I have it implemented and it seems to be working and giving valid output.
I would propose creating a separate issue marked as "enhancement" for discussion of the single measure vs. multiple measure approaches. I think I am finally starting to partially understand what you mean. In any case, can we agree it would be nice to have a different approach for dealing with multiple measures, but as it would take some time to implement it and the current approach is acceptable, we will leave it to a possible next version of the pipeline?
Sure, let's discuss this in another issue in case users of the data produced by the FDP2RDF pipeline would find the chosen modelling difficult.
Your example seems to be fine. I'd only suggest using obeu:OptionalProperty instead of qb:AttributeProperty for <http://schemas.frictionlessdata.io/fiscal-data-package#budgetPhase> and <http://schemas.frictionlessdata.io/fiscal-data-package#operationCharacter> (and similarly qb:componentProperty instead of qb:attribute for relating these component properties to their component specifications).
A minor side question: Is there a reason to use snake case, such as EU_Amount, instead of kebab-case (i.e. LCASE(REPLACE("EU_Amount", "_", "-"))) recommended by the OpenBudgets.eu data model?
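For a pipeline step written in Python, the same conversion can be sketched as follows. The helper name `to_kebab_case` is hypothetical, and it only covers the underscore case from the expression above.

```python
def to_kebab_case(name):
    """Lower-case a name and replace underscores with hyphens,
    mirroring LCASE(REPLACE(name, "_", "-"))."""
    return name.replace("_", "-").lower()


kebab = to_kebab_case("EU_Amount")  # "eu-amount"
```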
Thanks, I'll adjust it to OptionalProperty and keep it that way for now.
The names such as EU_Amount come directly from the FDP descriptor. I can add name adjustment at the end of the pipeline to the "nice to have" features list.
During testing, I found out that the pipeline does not handle correctly the fact that FDP equivalents of operationCharacter and budgetPhase are properties of FDP measures.
I think @jindrichmynarz is most qualified to answer this.