openbudgets / pipeline-fragments

Reusable fragments of LinkedPipes ETL pipelines

FDP2RDF Dates in year-only format not transformed properly #8

Closed by marek-dudas 8 years ago

marek-dudas commented 8 years ago

As mentioned in https://github.com/openbudgets/platform/issues/5#issuecomment-232456717, it seems that dates in year-only format are not transformed properly. Although the pipeline is designed mainly to support the xs:dateTime format, yyyy values should also get transformed.
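To make the expected behavior concrete, here is a minimal Python sketch of how a raw date cell could be classified before it is typed in RDF. It is not the pipeline's actual SPARQL. The function name `classify_date` is hypothetical, and mapping yyyy values to xsd:gYear is an assumption, since the thread does not say which datatype year-only values should receive.

```python
import re
from datetime import datetime
from typing import Tuple

XSD = "http://www.w3.org/2001/XMLSchema#"

# Exactly four ASCII digits, e.g. "2014".
YEAR_ONLY = re.compile(r"[0-9]{4}")


def classify_date(value: str) -> Tuple[str, str]:
    """Return (lexical form, datatype IRI) for a raw date cell.

    Assumption: year-only values are typed as xsd:gYear and everything
    else must parse as an ISO 8601 date or date-time.
    """
    text = value.strip()
    if YEAR_ONLY.fullmatch(text):
        return text, XSD + "gYear"
    # fromisoformat raises ValueError for input it cannot parse, so
    # malformed cells fail loudly instead of being dropped silently.
    parsed = datetime.fromisoformat(text)
    return parsed.isoformat(), XSD + "dateTime"
```

Raising on unparseable input is a deliberate choice in this sketch: a value that silently disappears from the output is exactly the symptom reported here.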

@marek-dudas FDP2RDF v0.3.1 ran successfully on the Fraunhofer server. Its RDF output has also been pushed to the Fuseki triple store. The test datapackage.jsonld is https://github.com/openbudgets/pipeline-fragments/blob/master/FDPtoRDF/test/test1/datapackage.jsonld The multiple measures are represented correctly. I cannot find the date value "2014", which should be the value of _:b11. _:b12, _:b13, and _:b14 appear only as 'object', and it is not clear what they should refer to.

marek-dudas commented 8 years ago

The issue here is that the date column in that particular CSV is called " date", with an extra space as the first character. Tabular (the CSV-to-RDF component) seems to trim the extra space, but the SPARQL queries do not know that and search for " date" while the column is actually called "date". Two solutions: leave it as is and treat such extra spaces as errors in the input data, or change the SPARQL in the FDPtoRDF pipeline to do the same trimming as Tabular (and possibly introduce new unexpected bugs).
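The mismatch and the second solution can be sketched in Python as follows. This only illustrates the idea of normalizing both sides of the comparison; it is not the pipeline's SPARQL, and the helper names `normalize_header` and `find_column` as well as the sample headers are hypothetical.

```python
from typing import List


def normalize_header(name: str) -> str:
    """Strip leading and trailing whitespace; inner spaces are kept."""
    return name.strip()


def find_column(headers: List[str], wanted: str) -> int:
    """Return the index of the column matching `wanted`, or -1.

    Both the stored header and the requested name are normalized, so a
    name like " date" from the descriptor still matches "date".
    """
    target = normalize_header(wanted)
    for index, header in enumerate(headers):
        if normalize_header(header) == target:
            return index
    return -1


# Headers as they would look after the CSV component trimmed them.
trimmed_headers = ["id", "date", "Amount Paid"]

# The untrimmed name from the descriptor fails an exact lookup ...
exact_match = " date" in trimmed_headers

# ... but matches once both sides are normalized the same way.
date_index = find_column(trimmed_headers, " date")
```

Because only the ends of the string are stripped, a header with an inner space such as "Amount Paid" is left unchanged.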

pwalsh commented 8 years ago

trimming whitespace in a column header seems desirable, and harmless, to me. Also consider that a header could be "Date Paid": does the pipeline handle this case?

marek-dudas commented 8 years ago

Ok, will do that. Spaces inside the header are handled correctly.

HimmelStein commented 8 years ago

@marek-dudas have you changed the structure of the pipeline while fixing the bugs (or, if not, since which version has the structure remained stable)? For example, were nodes added or removed, or connections among nodes updated? I will continue testing the updated versions.

marek-dudas commented 8 years ago

The bigger structural changes were in commit https://github.com/openbudgets/pipeline-fragments/commit/001e130241d666a663daf5dcf1fa9de2eb3e1a29 and in the latest one, https://github.com/openbudgets/pipeline-fragments/commit/f1a5e38133e63031061833d437cffcac513cf800. If you are asking whether an update could introduce new bugs, then I'm afraid the answer is "each of the updates up to now could have introduced new bugs". I am now planning to implement the rest of the changes discussed before pipeline deployment and then do a large amount of testing myself.

marek-dudas commented 8 years ago

The whitespace/trimming issue should be fixed now.