Transformation Errors After First Transformation

zazuko / cube-creator

A tool to create RDF cubes from CSV files

GNU Affero General Public License v3.0


Transformation Errors After First Transformation #1328

Closed tboeni closed 1 year ago

tboeni commented 1 year ago

For a while now I have been experiencing an issue: starting a transformation on a project produces an error, regardless of the change (even if there was none). In each case the error message looked like this:

[Screenshot: TransformationError_14112022]

What is curious is that if a transformation in a project produces this error, and I then copy it, create a new project from the copied .trig file, and upload the same .csv, the transformation that failed in the old project succeeds with the changes. For example, here are 4 versions of the "same" cube: Holzernte Version 1, Holzernte Version 2, Holzernte Version 3, Holzernte Version 4.

So for some reason, each transformation after the initial one seems to fail, although there seems to be no problem when creating a new project from the .trig files.

tpluscode commented 1 year ago

I had a look at these projects and noticed that, for example in Tobias_Holzernte_m3_version1, all dimensions are marked as Nominal.

From the linked documentation

Nominal (named variables)

Most Concepts are of nominal nature. They can be named but not put in a natural order. (E.g. cantons, colors, woods)

The problem arises with columns which have many unique values. The app attempts to process thousands of values with the intent to map strings to concepts. This explains why the first job succeeds: at that point there is no data yet.

Please try to only keep nominal where the dimensions really represent concepts, such as the Kanton column.
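One rough way to spot columns that are poor candidates for the nominal scale is to count the distinct values per column in the CSV. The following is only a sketch in TypeScript: the `Row` type, the function names, the `Menge` column, and the threshold are assumptions made for illustration and are not part of Cube Creator.

```typescript
// Hypothetical row shape: one parsed CSV line, keyed by column name.
type Row = Record<string, string>;

// Count how many distinct values each column contains.
function distinctCounts(rows: Row[]): Map<string, number> {
  const seen = new Map<string, Set<string>>();
  for (const row of rows) {
    for (const column of Object.keys(row)) {
      if (!seen.has(column)) seen.set(column, new Set<string>());
      seen.get(column)!.add(row[column]);
    }
  }
  const counts = new Map<string, number>();
  seen.forEach((values, column) => counts.set(column, values.size));
  return counts;
}

// Columns with at most `maxDistinct` values are plausible concept columns.
// The threshold is an arbitrary assumption, not a rule from the application.
function nominalCandidates(rows: Row[], maxDistinct: number): string[] {
  const result: string[] = [];
  distinctCounts(rows).forEach((count, column) => {
    if (count <= maxDistinct) result.push(column);
  });
  return result;
}

// Made-up sample data: a canton column and a measure column.
const sampleRows: Row[] = [
  { Kanton: "ZH", Menge: "10.5" },
  { Kanton: "ZH", Menge: "11.2" },
  { Kanton: "BE", Menge: "9.8" },
];
const candidates = nominalCandidates(sampleRows, 2);
```

With this sample, only `Kanton` stays under the threshold; the measure column has a distinct value in nearly every row and would not be a sensible nominal dimension.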

l00mi commented 1 year ago

@tboeni can you quickly check if it works with these adaptations? We will still try to find a fix for this case, but would prioritize it differently.

tboeni commented 1 year ago

I removed the nominal scale from all the measure dimensions which contain a lot of unique values. This works for the Tobias_Holzernte_m3_version4 cube. However, @tpluscode, I am not quite sure what it means for the cube if the scale of measure is missing from the measure dimensions.

l00mi commented 1 year ago

You can add a scale of measure, it is just not a "nominal" scale in your example, if we understand correctly. Nominal are categories, or types, see https://en.wikipedia.org/wiki/Level_of_measurement#Nominal_level or https://www.questionpro.com/blog/nominal-ordinal-interval-ratio/

tpluscode commented 1 year ago

I think I found a way to mitigate this problem, which may also improve the overall performance of mapped dimensions.

Since the transformation only cares about values which have already been mapped, I modified the query to only get those when processing a dimension. In this particular case, that will mean that the mappings will be empty since the dimensions are not intended for mapping.
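The difference between the two approaches can be illustrated with a small TypeScript sketch. This is not the actual query from the fix: the `MappingEntry` type and both function names are hypothetical, and an in-memory `Map` stands in for whatever store holds the mappings.

```typescript
// Hypothetical shape of one mapping entry: a source string and,
// if it has been mapped, the concept it points to.
type MappingEntry = { original: string; concept?: string };

// Before (as described above): every distinct value of the dimension is
// loaded, whether or not it has been mapped to a concept.
function allEntries(values: string[], mapped: Map<string, string>): MappingEntry[] {
  return Array.from(new Set(values)).map((original) => ({
    original,
    concept: mapped.get(original),
  }));
}

// After (as described above): only values that already have a mapping are
// loaded, so the cost depends on the number of mappings, not on the data.
function mappedEntriesOnly(mapped: Map<string, string>): MappingEntry[] {
  return Array.from(mapped.entries()).map(([original, concept]) => ({
    original,
    concept,
  }));
}

// A measure column with 5000 distinct values and no mappings at all.
const measureValues = Array.from({ length: 5000 }, (_, i) => `value-${i}`);
const noMappings = new Map<string, string>();

const before = allEntries(measureValues, noMappings);
const after = mappedEntriesOnly(noMappings);
```

For a dimension that is not intended for mapping, `before` holds one entry per distinct value while `after` is empty, which matches the behaviour described in this comment.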

Note that this does not yet solve our problem of large mappings in general (as reported in #1288). There is a practical limit on how many values can be mapped in the current implementation.