zazuko / cube-creator

A tool to create RDF cubes from CSV files
GNU Affero General Public License v3.0

Empty row in Cube Creator returns "null" value in Visualize #1462

Open tboeni opened 10 months ago

tboeni commented 10 months ago

This issue is brought over from the Visualize Tool Github:

When displaying the Quality of Bathing Water Cube on INT, the Upper LOD column is supposed to be empty. In Cube Creator I gave it the same property as the Lower LOD column, which is "integer" (Link). However, Visualize displays Upper LOD not as an empty column but with the value "null", both in preview mode and in the table view. This is an issue in terms of consistency, and also because "null" could be interpreted as "zero" in German.

IXT did some investigation on their end and they came to the following conclusion:

Hi @tboeni, I investigated the issue and it seems there are some issues with the data:

  • the Upper LOD dimension is not marked as a numerical dimension, so we treat it as a "namedNodeDimension". This means that we create a value-to-label mapping to get the labels, and another problem appears here: the missing values are not marked with the https://cube.link/Undefined value, but are simply empty (null). This means that we omit them from that mapping and simply convert the actual value to a string, which results in a "null" visible in the table. I assume there is a way to mark such values with https://cube.link/Undefined in Cube Creator, because we have had logic to handle such empty values for a long time,
  • the Lower LOD dimension also lacks the https://cube.link/Undefined values, but it is marked as a numerical dimension. In this case we again check whether the value is equal to https://cube.link/Undefined, and if it is not (and here it is not, as it is null), we convert it to an empty string.

It seems that the handling of undefined values is described in the Cube Schema documentation, so I would advise adjusting the dimensions in the recommended way (change the Upper LOD data type to numerical and replace nulls with https://cube.link/Undefined); the values should then be displayed correctly in Visualize.

Technically, we could also adjust the logic to handle nulls together with https://cube.link/Undefined, but I think it makes more sense to follow the guidelines here.
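To illustrate the recommendation, here is a minimal sketch of what a single observation might look like once the missing Upper LOD value is marked explicitly. The observation IRI, the `dim:` prefix and the Lower LOD value of 15 are made up for illustration; only the two dimension IRIs and the https://cube.link/Undefined datatype come from this thread. It is not a complete observation of the actual cube.

```turtle
@prefix cube: <https://cube.link/> .
@prefix dim: <https://environment.ld.admin.ch/foen/ubd01041/> .

# Hypothetical observation IRI and values, for illustration only.
<https://example.org/observation/1> a cube:Observation ;
    # A regular integer measurement.
    dim:lowerlod 15 ;
    # Missing value: an empty literal typed as cube:Undefined,
    # instead of leaving the property out or emitting an empty value.
    dim:upperlod ""^^cube:Undefined .
```

With the value encoded this way, a consumer can compare the literal's datatype against https://cube.link/Undefined rather than having to guess what a null means.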

Is there a possibility that completely empty columns in Cube Creator are handled differently than columns just having few empty values? Would it be a solution to simply mark the values in the empty "Upper LOD" column as cube:Undefined?

tpluscode commented 10 months ago

Is there a possibility that completely empty columns in Cube Creator are handled differently than columns just having few empty values?

Yes, that is actually what is happening, inadvertently. See the difference in how sh:datatype is encoded for the two dimensions:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix schema: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix qudt: <http://qudt.org/schema/qudt/> .
@prefix unit: <http://qudt.org/vocab/unit/> .
@prefix time: <http://www.w3.org/2006/time#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .

<https://environment.ld.admin.ch/foen/ubd01041/11/shape/> a sh:NodeShape, <https://cube.link/Constraint> ;
    sh:closed true ;
    sh:property [
        schema:name "Lower LOD"@it, "Lower LOD"@en, "Lower LOD"@fr, "Lower LOD"@de ;
        sh:path <https://environment.ld.admin.ch/foen/ubd01041/lowerlod> ;
        sh:nodeKind sh:Literal ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
        sh:minInclusive 1 ;
        sh:maxInclusive 150 ;
        qudt:scaleType qudt:RatioScale ;
        sh:or (
            [
                sh:datatype <https://cube.link/Undefined> ;
            ]
            [
                sh:datatype xsd:integer ;
            ]
        ) ;
    ], [
        schema:name "Upper LOD"@fr, "Upper LOD"@en, "Upper LOD"@it, "Upper LOD"@de ;
        sh:path <https://environment.ld.admin.ch/foen/ubd01041/upperlod> ;
        sh:in (
            ""^^<https://cube.link/Undefined>
        ) ;
        sh:nodeKind sh:Literal ;
        sh:datatype <https://cube.link/Undefined> ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
        qudt:scaleType qudt:RatioScale ;
    ] .

Because there are no values of Upper LOD, Cube Creator does not produce the sh:or alternative. Right now the datatype is sourced only from the final data, so if there are no values, only cube:Undefined remains.

I would need to check again at which stage that is done, but technically the CSV Mapping already defines the datatype, so it should be possible to apply that to a column without any values.
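As a rough sketch of where that could lead, the Upper LOD property shape might end up mirroring the Lower LOD one if the datatype from the CSV Mapping were applied. This is hypothetical output, not something Cube Creator generates today; in particular, whether the sh:in list would remain and which name languages would be emitted are left open here.

```turtle
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix schema: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix qudt: <http://qudt.org/schema/qudt/> .

# Hypothetical shape, assuming the mapped datatype is used as a fallback.
<https://environment.ld.admin.ch/foen/ubd01041/11/shape/> sh:property [
    schema:name "Upper LOD"@en ;
    sh:path <https://environment.ld.admin.ch/foen/ubd01041/upperlod> ;
    sh:nodeKind sh:Literal ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
    qudt:scaleType qudt:RatioScale ;
    sh:or (
        # Still allowed, since every current value is undefined.
        [ sh:datatype <https://cube.link/Undefined> ]
        # Assumed to be taken from the CSV Mapping rather than from the data.
        [ sh:datatype xsd:integer ]
    ) ;
] .
```

The sh:or of cube:Undefined and xsd:integer is the same pattern the Lower LOD shape above already uses.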