w3c / csvw

Documents produced by the CSV on the Web Working Group

Should the RDF/JSON transformation check the values? #62

Closed: iherman closed this issue 9 years ago

iherman commented 10 years ago

A high-level issue is whether or not the transformation should check the values for proper content. Various situations may arise, such as invalid URIs generated by a template or an invalid lexical form for a specified datatype. There seem to be two possibilities.

In general, option two is probably the cleaner solution. However, in some cases the transformation is expected to transform the values themselves (e.g., generating an ISO-formatted date value for the datetime and related datatypes), and some errors may be detected in the process...
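To make the last point concrete: a transformation that has to reformat a value cannot avoid parsing it, so it detects bad lexical forms as a side effect. The sketch below assumes, purely for illustration, a source column written as day/month/year; `to_iso_date` is a hypothetical helper, not part of any CSVW document or tool.

```python
from datetime import datetime


def to_iso_date(raw: str, fmt: str = "%d/%m/%Y") -> str:
    """Reformat a date cell as an ISO 8601 (xsd:date) lexical form.

    The source format `fmt` is an assumption for illustration. Because the
    value has to be parsed before it can be reformatted, an invalid lexical
    form is detected here even if the transformation does not set out to
    validate anything.
    """
    # datetime.strptime raises ValueError when `raw` does not match `fmt`
    # or names an impossible date such as 31 February.
    return datetime.strptime(raw, fmt).date().isoformat()


print(to_iso_date("12/11/2014"))  # 2014-11-12
try:
    to_iso_date("31/02/2014")
except ValueError as err:
    print("cannot convert:", err)
```

So even a transformation that is not meant to be a validator ends up with an error in hand for `31/02/2014`, and has to decide whether to drop the value, keep the original string, or stop.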

6a6d74 commented 9 years ago

Discussed in meeting 12-Nov-2014 (Minutes); summary ...

Issue not yet closed, as the mapping docs have not yet been updated.

gkellogg commented 9 years ago

I think a checker might issue warnings when lexical values do not match what is expected, but a processor should emit all the data that it can; this allows downstream tools to make use of it. My own processors typically implement a validate option, which raises an exception if invalid data is encountered, but it defaults to off.
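A minimal Python sketch of that behaviour, assuming an integer column for illustration: every value is emitted as a typed literal, a mismatch against the xsd:integer lexical form only produces a warning, and a hypothetical `validate` flag (off by default) turns the mismatch into an exception. The names `emit_integer_literals` and `InvalidValueError` are made up for this example.

```python
import re
import warnings

# Lexical space of xsd:integer: an optional sign followed by one or more digits.
XSD_INTEGER = re.compile(r"[+-]?[0-9]+")


class InvalidValueError(ValueError):
    """Hypothetical error type, raised only when validation is switched on."""


def emit_integer_literals(cells, validate=False):
    """Turn raw cell strings into Turtle-style xsd:integer literals.

    Mismatches produce a warning by default; with validate=True they raise.
    """
    literals = []
    for raw in cells:
        if not XSD_INTEGER.fullmatch(raw):
            msg = f"{raw!r} is not a valid xsd:integer lexical form"
            if validate:
                raise InvalidValueError(msg)
            warnings.warn(msg)
        # Emit the literal either way so downstream tools still get the data.
        # (No escaping is done here; a real serializer would escape quotes.)
        literals.append(f'"{raw}"^^xsd:integer')
    return literals
```

With the default, `emit_integer_literals(["42", "n/a"])` returns both literals and warns once about `"n/a"`; passing `validate=True` stops at the first bad value instead. Emitting the ill-typed literal is the "emit all the data that it can" choice; RDF 1.1 permits ill-typed literals in a graph.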

iherman commented 9 years ago

This is closely related to issue #54 (it is, essentially, the same problem!). Just making the link, and adding the "metadata vocabulary document" label to this.

JeniT commented 9 years ago

I think this is clear now in the metadata document (http://w3c.github.io/csvw/metadata/#parsing-cells). The algorithm adds validation errors to the model, and retains the original values as strings if it can't parse them. That gives maximum flexibility to the converters to either ignore or emit the errors, and to either ignore or emit the string values or invalid values.
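The behaviour described here can be sketched roughly as follows. This is a simplified illustration, not the algorithm from the metadata document: the `Cell` record (loosely echoing a cell's string value, value, and errors) and `parse_integer_cell` are hypothetical, and only the integer case is handled.

```python
import re
from dataclasses import dataclass, field
from typing import List, Union

# Lexical form accepted for an integer cell: optional sign, then digits.
INTEGER = re.compile(r"[+-]?[0-9]+")


@dataclass
class Cell:
    string_value: str                 # raw text taken from the CSV file
    value: Union[int, str]            # parsed value, or the raw string if parsing failed
    errors: List[str] = field(default_factory=list)  # validation errors, if any


def parse_integer_cell(raw: str) -> Cell:
    """Parse a cell as an integer without ever failing outright."""
    if INTEGER.fullmatch(raw):
        return Cell(raw, int(raw))
    # Keep the original string as the value and record the problem in the model.
    return Cell(raw, raw, [f"{raw!r} is not a valid integer"])


print(parse_integer_cell("42"))  # Cell(string_value='42', value=42, errors=[])
print(parse_integer_cell("n/a"))
```

`parse_integer_cell("n/a")` keeps `"n/a"` as the value and records one error, so a converter downstream can choose to ignore or emit either the error or the unparsed string.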

iherman commented 9 years ago

I am fine with this.

gkellogg commented 9 years ago

+1

6a6d74 commented 9 years ago

The csv2rdf and csv2json docs now explicitly state that the triples or JSON output is not checked. Cell values are parsed upstream of the conversion procedure; errors may be reported, and it is up to conversion applications to decide what to do when errors are present.
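As an illustration of leaving the decision to the conversion application, here is a hedged sketch of a JSON converter with a configurable error policy. The `Cell` record, the `on_error` values and both functions are assumptions made up for this example; they are not taken from the csv2json document.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class Cell:
    string_value: str                 # raw text from the CSV file
    value: Union[int, str]            # parsed value, or the raw string on failure
    errors: List[str] = field(default_factory=list)


def cell_to_json(cell: Cell, on_error: str = "emit") -> Optional[Union[int, str]]:
    """Return the JSON value for one cell, applying a hypothetical error policy.

    on_error: "emit"  -> output the value as-is (the raw string if parsing failed)
              "skip"  -> return None so the caller omits the property
              "raise" -> abort the conversion
    """
    if not cell.errors or on_error == "emit":
        return cell.value
    if on_error == "skip":
        return None
    if on_error == "raise":
        raise ValueError("; ".join(cell.errors))
    raise ValueError(f"unknown policy {on_error!r}")


def row_to_json(row: Dict[str, Cell], on_error: str = "emit") -> Dict[str, Union[int, str]]:
    """Build a JSON object for one row, dropping properties the policy skips."""
    obj = {}
    for name, cell in row.items():
        value = cell_to_json(cell, on_error)
        if value is not None:
            obj[name] = value
    return obj


row = {
    "id": Cell("1", 1),
    "age": Cell("n/a", "n/a", ["'n/a' is not a valid integer"]),
}
print(row_to_json(row, "emit"))  # {'id': 1, 'age': 'n/a'}
print(row_to_json(row, "skip"))  # {'id': 1}
```

With `"emit"` the bad `age` value goes out as the original string, with `"skip"` the property is dropped, and `"raise"` aborts the row; none of these is privileged by the conversion procedure itself.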