openbudgets / pipeline-fragments

Reusable fragments of LinkedPipes ETL pipelines

OBEU datamodel validation seems to report a mere reuse of obeu measures/attributes #3

marek-dudas closed 8 years ago

marek-dudas commented 8 years ago

I get the following report:

Term defined in the core OpenBudgets.eu namespace. Violation value: http://data.openbudgets.eu/ontology/dsd/attribute/currency
Term defined in the core OpenBudgets.eu namespace. Violation value: http://data.openbudgets.eu/ontology/dsd/measure/amount

when I believe I am just reusing the properties (not redefining them). It might still be an error on my side. Advice welcome.

jindrichmynarz commented 8 years ago

Can you share a pipeline to replicate this issue?

marek-dudas commented 8 years ago

The pipeline is here: http://obeu.vse.cz:9080/#/pipelines/edit/canvas?pipeline=http:%2F%2Fobeu.vse.cz:9080%2Fresources%2Fpipelines%2Fcreated-1463680597060

It loads the results of http://obeu.vse.cz:9080/#/pipelines/edit/canvas?pipeline=http:%2F%2Fobeu.vse.cz:9080%2Fresources%2Fpipelines%2Fcreated-1463671976909, which are stored in /tmp. The latter pipeline may need to be rerun if the results have been deleted.

jindrichmynarz commented 8 years ago

The validated data is not in the DCV normal form, which is required for validation. Please use the DCV normalization pipeline fragment and report back if the issue persists.

Based on your feedback, I improved the validation pipeline so that it now reports an error if it is given data that does not conform to the DCV normal form.

marek-dudas commented 8 years ago

Tested again, running the normalization first. I still got "Term defined in the core OpenBudgets.eu namespace." on http://data.openbudgets.eu/ontology/dsd/attribute/currency, http://data.openbudgets.eu/ontology/dsd/dimension/operationCharacter and http://data.openbudgets.eu/ontology/dsd/measure/amount. The pipeline is here: http://obeu.vse.cz:9080/#/pipelines/edit/canvas?pipeline=http:%2F%2Fobeu.vse.cz:9080%2Fresources%2Fpipelines%2Fcreated-1464531217666 . (The results also include one additional error, which is probably a correctly reported mistake in my dataset.) Maybe I am doing something wrong...

jindrichmynarz commented 8 years ago

The hijacked namespace detection was not ideal. I improved it in 356108b0a6c6239e2e5151f6a416e4025a877ae4. If you use this version, the validation does not report any hijacks of the OpenBudgets.eu core data model namespace for your pipeline.

However, it is the case that your input duplicates some of the elements of the OpenBudgets.eu data model. While doing so is OK, it might indicate another problem.

marek-dudas commented 8 years ago

I tried the new version in this pipeline and it runs without errors, but the Mustache-generated report (ftp://obeu.vse.cz:2221/540f983a-5b9a-4dc0-ade4-e38759c43081/036/obeu_validation_report.html) still mentions the violations.

For example, obeu-dimension:operationCharacter is used only in the following way:

_:node1ajufp029x9 qb:componentAttachment qb:DataSet ;
    qb:dimension obeu-dimension:operationCharacter .

<http://test.openbudgets.eu/datasets/al-treasury-spending-amount> obeu-attribute:currency <http://dbpedia.org/resource/ALL> ;
    a qb:DataSet ;
    obeu-dimension:operationCharacter obeu-operation:expenditure ; (...)

and is reported.

So should I create a subproperty of it instead, even in such cases? That is no problem for me; I just thought that subproperties are not needed in these situations.

jindrichmynarz commented 8 years ago

No, reusing a core component property is fine. The problem was an outdated version of the OpenBudgets.eu data model stored in the graph <http://data.openbudgets.eu/ontology>. To fix this, I ran the pipeline that loads the data model. Then I reran your validation pipeline, and it produced an empty validation report, indicating that the data should be fine.
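
To illustrate why a stale copy of the data model can produce such reports, here is a minimal sketch in Python. It is not the actual validation query: it assumes, for illustration only, that a core-namespace term counts as "hijacked" when the input describes it (uses it as a subject) with a triple that is missing from the reference data model graph. The function name `hijacked_terms`, the tuple representation of triples, and the duplicated `rdf:type` statement are all hypothetical.

```python
# Sketch under stated assumptions; the real check lives in the validation
# pipeline and may work differently.
QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
CORE_NS = "http://data.openbudgets.eu/ontology/dsd/"
CURRENCY = CORE_NS + "attribute/currency"
DATASET = "http://test.openbudgets.eu/datasets/al-treasury-spending-amount"


def hijacked_terms(data, core_model, core_ns=CORE_NS):
    """Return core-namespace subjects that `data` describes with triples
    absent from `core_model` (hypothetical definition of a hijack)."""
    known = set(core_model)
    return sorted(
        {s for (s, p, o) in data if s.startswith(core_ns) and (s, p, o) not in known}
    )


data = [
    # Plain reuse: the core property appears only as a predicate.
    (DATASET, CURRENCY, "http://dbpedia.org/resource/ALL"),
    # Duplicated element of the data model (hypothetical example triple).
    (CURRENCY, RDF_TYPE, QB + "AttributeProperty"),
]

# An up-to-date reference graph already contains the duplicated triple.
current_model = [(CURRENCY, RDF_TYPE, QB + "AttributeProperty")]
# A stale reference graph (hypothetical) lacks it.
outdated_model = []

print(hijacked_terms(data, current_model))   # nothing reported
print(hijacked_terms(data, outdated_model))  # the currency attribute is reported
```

Under these assumptions, the same input passes against a current reference graph and fails against a stale one, which matches the behaviour seen in this issue.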

marek-dudas commented 8 years ago

Yep, it's all right now.