zazuko / xrm

A friendly language for mappings to RDF
MIT License

Create CSV on the Web JSON output #18

Closed mchlrch closed 4 years ago

mchlrch commented 5 years ago

Include an additional generator into the existing codebase that generates CSV on the Web JSON output.

To start, the generator should filter Mappings on map.source.type.name == 'csv', so that the CSV on the Web output only includes mappings from CSV sources, even if other mappings and sources are present in the mapping project. I'm assuming here that we have source-types { csv referenceFormulation "ql:CSV" } defined in the mapping project. Later on, we will probably extend the DSL and adjust the filter criteria to make this more explicit than string matching.
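A minimal sketch of the filter described above, in Python for illustration only. The classes `SourceType`, `LogicalSource` and `Mapping` and the function `csv_mappings` are hypothetical stand-ins for the DSL model, not the project's actual generator code; the `xml` source type and the `VenueMapping` entry are made up to show what gets filtered out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceType:
    name: str
    reference_formulation: str


@dataclass(frozen=True)
class LogicalSource:
    name: str
    type: SourceType


@dataclass(frozen=True)
class Mapping:
    name: str
    source: LogicalSource


def csv_mappings(mappings):
    """Keep only mappings whose source type is named 'csv' (plain string match)."""
    return [m for m in mappings if m.source.type.name == "csv"]


# Hypothetical project with one CSV-backed and one XML-backed mapping.
csv_type = SourceType("csv", "ql:CSV")
xml_type = SourceType("xml", "ql:XPath")
project = [
    Mapping("AirportMapping", LogicalSource("airport", csv_type)),
    Mapping("VenueMapping", LogicalSource("venue", xml_type)),
]

selected = [m.name for m in csv_mappings(project)]
print(selected)
```

Because the match is on the type name only, a CSV source type declared under a different name would be skipped, which is the weakness of string matching mentioned above.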

Samples for the output: https://github.com/zazuko/blv-tierseuchen-ld/tree/master/metadata

I suggest the following approach:

1) First only add a generator, without extending the DSL. Generate as much as we can based on the information that the DSL can currently describe.
2) Identify the gaps and figure out where the additionally necessary information fits best into the DSL. Some of the extensions will probably also be useful for the other output formats.
3) Extend the DSL as necessary.
4) Extend the generators.

nicky508 commented 5 years ago

Added a new generator for CSVW. It implements the basic CSVW features that can already be expressed with the current grammar. Some formatting details of the output could probably be improved, particularly around the curly brackets.

I think it is probably best to take the /blv-tierseuchen-ld meta files as a starting point. From there we have some gaps in the DSL:

For now it works on simple string comparison, as you mentioned; it would indeed be better to replace the string comparison with something more explicit in the future. By filling the gaps we would be able to completely regenerate the /blv-tierseuchen-ld meta files.

ktk commented 5 years ago

Very cool, thanks! More than happy to test-drive this. I agree on the missing features; IMO we should extend the DSL to make this happen. Or what makes sense in your opinion, @mchlrch?

ktk commented 5 years ago

@nicky508 btw we do have datatype support, see https://github.com/zazuko/rdf-mapping-dsl-user/blob/master/documentation/mapping-language.md#datatypes or am I missing something?

nicky508 commented 5 years ago

@ktk wrote

@nicky508 btw we do have datatype support, see https://github.com/zazuko/rdf-mapping-dsl-user/blob/master/documentation/mapping-language.md#datatypes or am I missing something?

True, there is a simple datatype version. But the CSVW datatype is very advanced: https://www.w3.org/TR/tabular-data-primer/#new-datatypes. Now that I am reviewing this again, I think I was a bit too enthusiastic about the datatypes. The advanced version is not needed for the examples, so I think the current minimal datatype implementation will do.

mchlrch commented 5 years ago

To extend the DSL, I propose to add the new information in the following places:

  • Dialect specification (delimiter, header, etc.)

As optional attribute of LogicalSource: ('dialect' dialect=STRING)?

  • Null property for avoiding errors

As optional attribute of Referenceable (inside LogicalSource)

  • Output suppression

I think suppressOutput for columns can be calculated and doesn't need to be specified explicitly. All the map.source.referenceables that are not used in map.poMappings should get suppressOutput

  • Making columns virtual

Are some uses of virtual equivalent to https://www.w3.org/TR/r2rml/#constant ? We could add ConstantValuedTerm

ValuedTerm:
    ReferenceValuedTerm | TemplateValuedTerm | LinkedResourceTerm;
// TODO: ConstantValuedTerm

ktk commented 5 years ago

@mchlrch yes I had the same idea tonight with suppressOutput, let's do it like this.
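The suppressOutput rule agreed on here (every referenceable of the source that is not used in any poMapping gets suppressOutput) could be sketched roughly as follows. This is an illustration in Python under stated assumptions, not the generator's actual code: `suppressed_columns` and `csvw_columns` are hypothetical helpers, and the column list, including the `remarks` column, is made up.

```python
def suppressed_columns(referenceables, po_mapping_references):
    """Return the source columns that no predicate-object mapping references."""
    used = set(po_mapping_references)
    # Preserve the order in which the columns are declared on the source.
    return [name for name in referenceables if name not in used]


def csvw_columns(referenceables, po_mapping_references):
    """Build CSVW column descriptions, flagging unused columns as suppressed."""
    unused = set(suppressed_columns(referenceables, po_mapping_references))
    columns = []
    for name in referenceables:
        column = {"name": name}
        if name in unused:
            column["suppressOutput"] = True
        columns.append(column)
    return columns


# Hypothetical source columns; 'stop' is referenced by two property mappings.
source_columns = ["id", "stop", "latitude", "longitude", "remarks"]
used_in_po_mappings = ["stop", "latitude", "longitude", "stop"]

columns = csvw_columns(source_columns, used_in_po_mappings)
print(columns)
```

Note that under the rule as stated, a column such as `id` that is only used in the subject template also ends up suppressed, since it never appears in a poMapping.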

nicky508 commented 5 years ago

Great, I have not checked it yet. Working top to bottom. But we will do it like that

nicky508 commented 5 years ago

  • Null property for avoiding errors

As optional attribute of Referenceable (inside LogicalSource)

The issue is that we also have the option of defining an alternative name for a column (a name that is not allowed as an ID), which is why we have this construction: id "id". Adding another optional string makes it hard to tell the two apart. The user needs to define the "id" name in order to also set "NULL": id "id" "NULL". The following is allowed but interpreted wrongly, because "NULL" is taken as the column name: id "NULL"

nicky508 commented 5 years ago

Would it be an idea to add it to the mapping properties instead, like this:

map AirportMapping from airport {
    subject template "http://airport.example.com/{0}" with id

    types transit.Stop

    properties
    transit.route from stop with datatype xsd.integer nullable "NULL"
    wgs84_pos.lat from latitude
    wgs84_pos.long from longitude
    transit.route from stop with language-tag en
    transit.route template "https://permits.example.org/permitlevels/{0}" with stop;
}

nicky508 commented 5 years ago

Are some uses of virtual equivalent to https://www.w3.org/TR/r2rml/#constant ? We could add ConstantValuedTerm

Some uses are equivalent, but not necessarily all of them: https://www.w3.org/TR/2015/REC-tabular-metadata-20151217/#use-of-virtual-columns I see constructions like these in the examples:

{
      "propertyUrl": "schema:location",
      "valueUrl": "#location-{GID}",
      "virtual": true
}

These are almost normal column statements.

nicky508 commented 5 years ago

I added an optional dialect group for the options in the CSVW standard. For now I skipped the null attribute because of the issue above, which we need to discuss first. Output suppression is calculated as suggested: all references that do not appear in any of the predicate-object maps are marked as suppressed output. Virtual does not necessarily mean that the value is a constant. According to the standard, it seems to be:

Virtual columns are useful when data needs to be added as part of an output transformation that doesn't exist in the source file.

I am not sure what exactly this means. Going through the examples in the standard (https://www.w3.org/TR/2015/REC-tabular-metadata-20151217/), I cannot find a consistently used pattern that shows what exactly it is used for. I do see, however, that in the blv-tierseuchen-ld examples it is used a lot for defining constants, as proposed. For now I implemented it this way: I added the ConstantValuedTerm and create a constant in both the RML mappings and the CSVW mapping.
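As an illustration of how a constant could come out as a virtual column in the CSVW output, here is a rough Python sketch. The helpers `constant_virtual_column` and `with_virtual_columns` are hypothetical, as are the example values `rdf:type` and `schema:Place`; the sketch assumes a constant is expressed as a `valueUrl` without template variables. It also appends virtual columns last, since the CSVW metadata specification requires virtual columns to come after all non-virtual column definitions.

```python
import json


def constant_virtual_column(property_url, value_url):
    """Describe a constant as a virtual CSVW column (no backing CSV column)."""
    if "{" in value_url:
        # A template variable would make the value depend on the row.
        raise ValueError("a constant valueUrl must not contain template variables")
    return {"virtual": True, "propertyUrl": property_url, "valueUrl": value_url}


def with_virtual_columns(columns, virtual_columns):
    """Append virtual columns after all regular column definitions."""
    return list(columns) + list(virtual_columns)


# Hypothetical example: every row is typed as schema:Place.
columns = with_virtual_columns(
    [{"name": "id"}, {"name": "latitude"}],
    [constant_virtual_column("rdf:type", "schema:Place")],
)
print(json.dumps(columns, indent=2))
```

A virtual column whose `valueUrl` does contain a template, such as the `#location-{GID}` example quoted earlier in the thread, would not be a constant and is deliberately rejected by this sketch.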

One issue I found but have not been able to solve yet is the handling of booleans. Booleans are not a standard part of the parse rules. Now I was able to let the user define a boolean, but when it is true it appears in the resulting mapping, and when the boolean is defined as false, the object seems to be null and the attribute does not appear in the mappings (which is a bit strange).

nicky508 commented 4 years ago

@mchlrch Did you have time to review the changes?

mchlrch commented 4 years ago

@nicky508 I didn't get around to reviewing the changes yet. Planned to do it later this week.

mchlrch commented 4 years ago

Generated output is currently not correct, if there are multiple mappings defined in one file, like it is the case for example in https://github.com/zazuko/elcom-strompreis/blob/master/src/tarif_categorien.xrm

Maybe we should always generate the structure as in EXAMPLE 27, even if there is only one mapping in a file.
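A rough sketch of the "always generate the same structure" idea, in Python for illustration. EXAMPLE 27 is not reproduced in this thread, so this assumes it refers to the table-group form of CSVW metadata with a top-level `tables` array; `table` and `table_group` are hypothetical helpers and the file names are made up.

```python
import json

CSVW_CONTEXT = "http://www.w3.org/ns/csvw"


def table(url, columns):
    """Describe a single CSV file and its columns."""
    return {"url": url, "tableSchema": {"columns": list(columns)}}


def table_group(tables):
    """Wrap any number of tables (including just one) in a table group."""
    return {"@context": CSVW_CONTEXT, "tables": list(tables)}


# One mapping and several mappings produce the same top-level shape.
single = table_group([table("airport.csv", [{"name": "id"}])])
multiple = table_group([
    table("airport.csv", [{"name": "id"}]),
    table("routes.csv", [{"name": "stop"}]),
])
print(json.dumps(single, indent=2))
```

With this shape, consumers of the generated file do not need to distinguish between the single-mapping and the multi-mapping case.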

nicky508 commented 4 years ago

@mchlrch wrote

Generated output is currently not correct, if there are multiple mappings defined in one file, like it is the case for example in https://github.com/zazuko/elcom-strompreis/blob/master/src/tarif_categorien.xrm

Maybe we should always generate the structure as in EXAMPLE 27, even if there is only one mapping in a file.

Yes, of course. I did not test this case. I will fix it.

mchlrch commented 4 years ago

@nicky508 wrote

Now I was able to let the user define a boolean, but when it is true it appears in the resulting mapping, and when the boolean is defined as false, the object seems to be null and the attribute does not appear in the mappings (which is a bit strange).

There was a warning in the grammar file. I ran the proposed quickfix and now false values also make it into the mapping.

BooleanLiteral:
    value?='true' | {BooleanLiteral} 'false';

mchlrch commented 4 years ago

Merged to master. Closing this; I created new issues for the missing parts: https://github.com/zazuko/rdf-mapping-dsl/labels/csv

@nicky508 If I missed something, then please open new issues accordingly