Should the RDF/JSON transformation check the values?

w3c / csvw

Documents produced by the CSV on the Web Working Group

Should the RDF/JSON transformation check the values? #62

Closed iherman closed 9 years ago

iherman commented 10 years ago

A high-level issue is whether the transformation should check the values for proper content or not. Various situations may arise, such as invalid URIs generated by a template or an invalid lexical form for a specified datatype. There seem to be two possibilities:

1. the transformation should check the validity of the data and, if the data is invalid, not issue a k/v pair or, respectively, a triple
2. the transformation should issue these no matter what, with the knowledge that the generated data might be invalid.

In general, option two is probably the cleaner solution. However, in some cases the transformation is expected to convert values (e.g., generating an ISO formatted date value for the datetime and related datatypes), at which point some errors may be detected...
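
To make the two options concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `transform` function, the `check_values` switch, the tuple layout of a cell, and the `ex:` names do not come from the CSVW documents, and only `xsd:integer` is checked.

```python
import re

# Hypothetical lexical check covering xsd:integer only; a real processor
# would have to cover every datatype it supports.
INTEGER = re.compile(r"[+-]?[0-9]+")


def is_valid(lexical, datatype):
    """Return True if the lexical form looks valid for the datatype."""
    if datatype == "xsd:integer":
        return INTEGER.fullmatch(lexical) is not None
    return True  # assumption: other datatypes are not checked in this sketch


def transform(cells, check_values):
    """Turn (subject, predicate, lexical, datatype) cells into triples.

    check_values=True  -> option 1: skip cells whose lexical form is invalid
    check_values=False -> option 2: emit every cell exactly as given
    """
    triples = []
    for subject, predicate, lexical, datatype in cells:
        if check_values and not is_valid(lexical, datatype):
            continue  # option 1: no triple for invalid data
        triples.append((subject, predicate, f'"{lexical}"^^{datatype}'))
    return triples


cells = [
    ("ex:row1", "ex:population", "1200", "xsd:integer"),
    ("ex:row2", "ex:population", "12oo", "xsd:integer"),  # invalid lexical form
]
option_one = transform(cells, check_values=True)
option_two = transform(cells, check_values=False)
```

With option one the second row silently disappears from the output; with option two it is emitted as an ill-typed literal that a downstream consumer may reject.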

6a6d74 commented 9 years ago

Discussed in meeting 12-Nov-2014 (Minutes); summary ...

- by default, the spec should require the easiest behaviour - which is to allow the mapping function to pass through values as provided
- the implication is that downstream processes, e.g. ingest to a triple store, may throw an exception because the RDF produced as the result of mapping contains type-errors
- we agreed that 'conformant mode' (as specified in the Rec) passes through literal values, whilst advanced processors may offer additional contextual checking/fixing (via flags) ... e.g. using an 'advanced-processing flag' to set processing behaviour mode
- such 'advanced' behaviour is implementation specific and beyond the scope of the specification
- it should be possible to test that advanced processors produce consistent output ... but I think that establishing a test suite for 'advanced processors' is out of scope for version 1

Issue not yet closed as the mapping docs are not yet updated.

gkellogg commented 9 years ago

I think a checker might issue warnings when lexical values do not match what is expected, but I think a processor should emit all the data that it can; this allows downstream tools to make use of it. My own processors typically implement a validate option, which would create an exception if invalid data is encountered, but this defaults to off.
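
The behaviour described in this comment could look roughly like the following Python sketch. It is not taken from any actual processor: `convert_date`, `InvalidValueError`, and the `validate` parameter are hypothetical names, and the check relies on the standard library's `date.fromisoformat`, which is only an approximation of the `xsd:date` lexical space.

```python
import warnings
from datetime import date


class InvalidValueError(ValueError):
    """Raised only when the (hypothetical) validate option is switched on."""


def convert_date(lexical, validate=False):
    """Pass a date value through, warning or raising if it does not parse.

    validate defaults to False, so by default the value is always emitted.
    """
    try:
        date.fromisoformat(lexical)
    except ValueError:
        message = f"{lexical!r} is not an ISO 8601 date"
        if validate:
            raise InvalidValueError(message)
        warnings.warn(message)  # report the problem, but keep the data
    return lexical
```

Calling `convert_date("12-Nov-2014")` returns the string unchanged and issues a warning, whereas `convert_date("12-Nov-2014", validate=True)` raises `InvalidValueError`.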

iherman commented 9 years ago

This is closely related to issue #54 (it is, essentially, the same problem!). Just making the link, and adding the "metadata vocabulary document" label to this.

JeniT commented 9 years ago

I think this is clear now in the metadata document (http://w3c.github.io/csvw/metadata/#parsing-cells). The algorithm adds validation errors to the model, and retains the original values as strings if it can't parse them. That gives maximum flexibility to the converters to either ignore or emit the errors, and to either ignore or emit the string values or invalid values.
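
As a rough illustration of that approach, the Python sketch below parses a cell, records a validation error on the cell when parsing fails, and keeps the original string as the value. The `Cell` class, its field names, `parse_cell`, and `to_json_value` are assumptions made for this example and are not the data model defined in the metadata document; only an `integer` datatype is handled.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Cell:
    string_value: str                      # the text as it appeared in the CSV
    value: Any = None                      # parsed value, or the string on failure
    errors: List[str] = field(default_factory=list)


def parse_cell(string_value, datatype):
    """Parse one cell; on failure keep the string and record an error."""
    cell = Cell(string_value=string_value)
    try:
        if datatype == "integer":
            # Python's int() is more lenient than xsd:integer; it is used
            # here only to keep the sketch short.
            cell.value = int(string_value)
        else:
            cell.value = string_value      # assumption: other types pass through
    except ValueError:
        cell.value = string_value          # retain the original value as a string
        cell.errors.append(f"{string_value!r} is not a valid {datatype}")
    return cell


def to_json_value(cell, skip_invalid=False):
    """A converter may either emit or ignore cells that carry errors."""
    if cell.errors and skip_invalid:
        return None
    return cell.value


good = parse_cell("42", "integer")
bad = parse_cell("forty-two", "integer")
```

Here `good.value` is the integer `42` with no errors, while `bad.value` stays `"forty-two"` with one recorded error, leaving the choice between emitting and skipping to the converter.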

iherman commented 9 years ago

I am fine with this.

6a6d74 commented 9 years ago

The csv2rdf and csv2json docs now explicitly state that the triples or JSON output is not checked. Cell values are parsed upstream of the conversion procedure; errors might be reported, and it is up to conversion applications to decide what to do when errors are present.