zazuko / xrm

A friendly language for mappings to RDF
MIT License

csvw: declaring null value on table and groups-of-table level #90

Closed: mchlrch closed this issue 3 years ago

mchlrch commented 4 years ago

Declaring a null value is currently supported only per column (doc).

It should also be possible to declare the null value on the table level and the group-of-tables level.

In the DSL, this means supporting it on the source-group and logical-source levels.

DSL sample on the logical-source level:

logical-source EMPLOYEE {
    type csv
    source "EMP"
    null "X"

    referenceables
        EMPNO
        ENAME
}

CSVW Output:

{
    "@context": "http://www.w3.org/ns/csvw",
    "url": "EMP",
    "tableSchema": {
        "null": "X",
        "aboutUrl": "http://data.example.com/employee/{EMPNO}",
        "columns": [ ... ] 
    }
}

It should also be possible to declare (possibly different) null values on each of the levels: group of tables, table, and column. Validation for shadowing is not necessary.
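A minimal sketch of the intended cascade, assuming the most specific declaration wins (referenceable over logical-source over source-group) and that, when nothing is declared anywhere, the CSVW default null value (the empty string) applies. Since shadowing is not validated, the sketch simply takes the first declared value. The function name `effective_null` is hypothetical and not part of XRM.

```python
from typing import Optional

# CSVW's default null value is the empty string; assumed to apply when
# no level declares one.
DEFAULT_NULL = ""

def effective_null(referenceable: Optional[str],
                   logical_source: Optional[str],
                   source_group: Optional[str]) -> str:
    """Hypothetical helper: return the null value that applies to one column.

    The most specific level that declares a value wins; shadowing is not
    validated, so lower levels silently override higher ones.
    """
    for candidate in (referenceable, logical_source, source_group):
        if candidate is not None:
            return candidate
    return DEFAULT_NULL

print(effective_null(None, "X", "Y"))    # X  (logical-source level)
print(effective_null(None, None, "Y"))   # Y  (source-group level)
print(effective_null("Z", "X", "Y"))     # Z  (column level)
```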

CSVW Inherited Properties in the spec: https://www.w3.org/TR/2015/REC-tabular-metadata-20151217/#inherited-properties
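As we read the inherited-properties section, a CSVW processor takes an inherited property such as `null` from the column description if present, otherwise from the schema, then the table, then the table group, falling back to the default `""`. A minimal sketch of that lookup; `inherited` is a hypothetical helper, not part of any CSVW library.

```python
from typing import Any, Mapping, Optional, Sequence

def inherited(prop: str,
              chain: Sequence[Optional[Mapping[str, Any]]],
              default: Any) -> Any:
    """Hypothetical helper: look up an inherited property.

    `chain` is ordered from most specific (column) to least specific
    (table group); missing descriptions may be passed as None.
    """
    for description in chain:
        if description is not None and prop in description:
            return description[prop]
    return default

group = {"null": "nullValue@group"}
table = {"url": "Foo.csv"}
schema = {"null": "nullValue@schema"}
column = {"titles": "foo1"}

# The schema value shadows the group value for this column.
print(inherited("null", [column, schema, table, group], ""))  # nullValue@schema
```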

mchlrch commented 4 years ago

I'm not completely sure what the CSVW JSON output should look like for declaring the null value on the table level and the group-of-tables level. I found this: CSVW Inherited Properties in the spec: https://www.w3.org/TR/2015/REC-tabular-metadata-20151217/#inherited-properties

... but I'm still not sure what it should look like. @ktk @bergos, can you review the following?

single table case

For a single table, the following already works.

XRM source declaration:

logical-source fooSourceWithNullValue {
    type csv
    source "http://www.example.com/Foo.csv"
    null "nullValue@LogicalSource"

    referenceables
        id
        foo1
        foo2 null "nullValue@referenceable"
}

Generated CSVW JSON:

{
    "@context": "http://www.w3.org/ns/csvw",
    "url": "http://www.example.com/Foo.csv",
    "tableSchema": {
        "aboutUrl": "http://airport.example.com/{id}",
        "null": "nullValue@LogicalSource",
        "columns": [
            {
                "propertyUrl": "http://foobar.com/things/thing/color",
                "titles": "foo1"
            },
            {
                "propertyUrl": "http://foobar.com/things/thing/color",
                "titles": "foo2",
                "null": "nullValue@referenceable"
            },
            {
                "suppressOutput": true,
                "titles": "id"
            }
        ] 
    }
}

group of tables case

We generate a CSVW "group of tables" JSON if an .xrm file contains multiple maps. (The group of tables in the CSVW output can be made up of tables from multiple XRM source-groups.)
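A minimal sketch of how a generator could build that group-of-tables JSON, assuming each table gets the source-group's null value copied onto its `tableSchema` whenever its logical-source declares none (which matches the Bar table in the output below). The dataclasses and function names are hypothetical, not XRM's actual implementation, and `aboutUrl`/`propertyUrl`, which come from the maps, are left out.

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory model of the XRM declarations (not XRM's real classes)
@dataclass
class Referenceable:
    name: str
    null: Optional[str] = None  # column level, e.g. `foo2 null "..."`

@dataclass
class LogicalSource:
    source: str
    referenceables: List[Referenceable]
    null: Optional[str] = None  # logical-source level

def table_description(ls: LogicalSource, group_null: Optional[str]) -> Dict:
    """Build one CSVW table description; the table null falls back to the group's."""
    schema: Dict = {}
    table_null = ls.null if ls.null is not None else group_null
    if table_null is not None:
        schema["null"] = table_null
    columns = []
    for ref in ls.referenceables:
        column: Dict = {"titles": ref.name}
        if ref.null is not None:
            column["null"] = ref.null  # column value shadows the schema value
        columns.append(column)
    schema["columns"] = columns
    return {"url": ls.source, "tableSchema": schema}

def table_group(tables: List[Tuple[LogicalSource, Optional[str]]]) -> Dict:
    """Each entry pairs a logical-source with the null value of its source-group."""
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "tables": [table_description(ls, gn) for ls, gn in tables],
    }

foo = LogicalSource(
    "http://www.example.com/Foo.csv",
    [Referenceable("id"), Referenceable("foo1"),
     Referenceable("foo2", "nullValue@referenceable")],
    "nullValue@LogicalSource",
)
bar = LogicalSource(
    "http://www.example.com/Bar.csv",
    [Referenceable("id"), Referenceable("bar1"),
     Referenceable("bar2", "nullValue@referenceable")],
)
group_null = "nullValue@sourceGroup"
doc = table_group([(foo, group_null), (bar, group_null)])
print(json.dumps(doc, indent=2))
```

Copying the source-group value onto each `tableSchema` keeps every table self-contained. This matters because one group of tables can mix tables from several source-groups with different null values, so a single value on the table-group object would not always fit.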

XRM:

output csvw

map FooMapping from CsvSourceGroupWithNullValueCascade.fooSourceWithNullValue {
    subject template "http://airport.example.com/{0}" with id;

    properties
        thing.color from foo1; // nullValue from logical-source
        thing.color from foo2; // nullValue from referenceable
}

map BarMapping from CsvSourceGroupWithNullValueCascade.barSourceWithoutNullValue {
    subject template "http://airport.example.com/{0}" with id;

    properties
        thing.color from bar1; // nullValue from sourceGroup
        thing.color from bar2; // nullValue from referenceable
}

source-group CsvSourceGroupWithNullValueCascade {
    type csv
    null "nullValue@sourceGroup"

    logical-source fooSourceWithNullValue {
        source "http://www.example.com/Foo.csv"
        null "nullValue@LogicalSource"

        referenceables
            id
            foo1
            foo2 null "nullValue@referenceable"
    }

    logical-source barSourceWithoutNullValue {
        source "http://www.example.com/Bar.csv"

        referenceables
            id
            bar1
            bar2 null "nullValue@referenceable"
    }
}

Generated CSVW JSON:

{
    "@context": "http://www.w3.org/ns/csvw",
    "tables": [
        {
            "url": "http://www.example.com/Foo.csv",
            "tableSchema": {
                "aboutUrl": "http://airport.example.com/{id}",
                "null": "nullValue@LogicalSource",
                "columns": [
                    {
                        "propertyUrl": "http://foobar.com/things/thing/color",
                        "titles": "foo1"
                    },
                    {
                        "propertyUrl": "http://foobar.com/things/thing/color",
                        "titles": "foo2",
                        "null": "nullValue@referenceable"
                    },
                    {
                        "suppressOutput": true,
                        "titles": "id"
                    }
                ] 
            }
        },

        {
            "url": "http://www.example.com/Bar.csv",
            "tableSchema": {
                "aboutUrl": "http://airport.example.com/{id}",
                "null": "nullValue@sourceGroup",
                "columns": [
                    {
                        "propertyUrl": "http://foobar.com/things/thing/color",
                        "titles": "bar1"
                    },
                    {
                        "propertyUrl": "http://foobar.com/things/thing/color",
                        "titles": "bar2",
                        "null": "nullValue@referenceable"
                    },
                    {
                        "suppressOutput": true,
                        "titles": "id"
                    }
                ] 
            }
        }
    ]
}
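To check that this output gives each column the null value promised by the comments in the XRM maps, a small sketch can resolve the effective value per column (column, then `tableSchema`, then table, then table group, default `""`). The JSON below is a trimmed copy of the output above with `aboutUrl` and `propertyUrl` removed; `effective_nulls` is a hypothetical helper.

```python
import json

# Trimmed copy of the generated group-of-tables JSON (aboutUrl/propertyUrl omitted)
GENERATED = """
{
  "@context": "http://www.w3.org/ns/csvw",
  "tables": [
    {"url": "http://www.example.com/Foo.csv",
     "tableSchema": {"null": "nullValue@LogicalSource",
                     "columns": [{"titles": "foo1"},
                                 {"titles": "foo2", "null": "nullValue@referenceable"},
                                 {"titles": "id"}]}},
    {"url": "http://www.example.com/Bar.csv",
     "tableSchema": {"null": "nullValue@sourceGroup",
                     "columns": [{"titles": "bar1"},
                                 {"titles": "bar2", "null": "nullValue@referenceable"},
                                 {"titles": "id"}]}}
  ]
}
"""

def effective_nulls(metadata):
    """Map (table url, column title) to the null value that applies to it."""
    result = {}
    for table in metadata["tables"]:
        schema = table.get("tableSchema", {})
        for column in schema.get("columns", []):
            # Most specific description first; the group object is the root.
            for description in (column, schema, table, metadata):
                if "null" in description:
                    value = description["null"]
                    break
            else:
                value = ""  # CSVW default null value
            result[(table["url"], column["titles"])] = value
    return result

nulls = effective_nulls(json.loads(GENERATED))
for (url, title), value in sorted(nulls.items()):
    print(url.rsplit("/", 1)[-1], title, value)
```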
mchlrch commented 4 years ago

I found an example at W3C with a null value on the tableSchema level, so what we generate at the moment should be OK:

https://github.com/w3c/csvw/blob/b9b0ed362ba7491a917626878fc75fc437fa7bba/experiments/historical-weather-observation-dataset/rdf-data-cube-multi-measure-approach/source/cambornedata.csv-metadata.json#L141

{
  "@context": ["http://www.w3.org/ns/csvw", {"@language": "en"}],
  "@id": "http://example.org/cambornedata",
  "url": "cambornedata.csv",
...
  "tableSchema": {
...
    "null": "---"
  }
}
mchlrch commented 3 years ago

Updated the documentation with https://github.com/zazuko/expressive-rdf-mapper/commit/f99ff7bc8226fed4ec66d91f30d80b2e7df31284