w3c / csvw

Documents produced by the CSV on the Web Working Group

List of datatypes in the conversion document #167

Closed: iherman closed this issue 9 years ago

iherman commented 9 years ago

At the moment, e.g., the RDF conversion lists some date-related datatypes: date, time, datetime, dateTime, and duration. The list is incomplete: it should also include gYearMonth, gYear, gMonthDay, gDay, and gMonth.

The (editorial) question is whether these should simply be listed, or whether some of these datatype categories (numeric, date, etc.) should be defined in one place and then referred to, to ensure consistency.

gkellogg commented 9 years ago

I think that datatype mapping, and conversion, should be done in metadata, rather than separately in each transformation document. On the call yesterday, the feeling was that formats for dates and times would make use of some standard mechanisms and not rely on picture strings (at least for the moment) (see issue #54). So, it could simply be "iso8601". This allows recognition of most (all?) of the datatypes defined in both the metadata document and the namespace.
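For concreteness, a column description along these lines could tie the datatype and the named format together (a hypothetical sketch; the exact property names and the `"iso8601"` format name are illustrative, pending whatever the metadata document ends up defining):

```json
{
  "tableSchema": {
    "columns": [{
      "name": "published",
      "datatype": { "base": "date", "format": "iso8601" }
    }]
  }
}
```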

Really, all of the metadata and cell processing steps should be left out of the transformation docs, IMO.

I found in my JSON transformation, that I could use just the string values of the literals created for the RDF transformation, which made JSON quite simple.
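That reuse can be sketched roughly like this (hypothetical helper names; a real converter would work through an RDF library): the cell is normalized once to the literal's lexical form, and the JSON output simply emits that same string.

```python
# Hypothetical sketch: one cell-parsing step feeds both the RDF and JSON outputs.
def cell_to_lexical(raw, datatype):
    """Normalize a raw cell string to the lexical form used for the RDF literal."""
    if datatype == "xsd:integer":
        return str(int(raw))          # " 42 " -> "42"
    return raw.strip()

lexical = cell_to_lexical(" 42 ", "xsd:integer")
rdf_object = f'"{lexical}"^^xsd:integer'  # literal in the RDF output
json_value = lexical                      # JSON output reuses the same string
print(json_value)  # 42
```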

iherman commented 9 years ago

On 15 Jan 2015, at 19:19, Gregg Kellogg notifications@github.com wrote:

> I think that datatype mapping, and conversion, should be done in metadata, rather than separately in each transformation document. On the call yesterday, the feeling was that formats for dates and times would make use of some standard mechanisms and not rely on picture strings (at least for the moment) (see issue #54). So, it could simply be "iso8601". This allows recognition of most (all?) of the datatypes defined in both the metadata document and the namespace.

I must admit I do not understand what you mean.

> Really, all of the metadata and cell processing steps should be left out of the transformation docs, IMO.

I agree that the transformation document should be simple, but this is just an editorial issue, isn't it?

> I found in my JSON transformation, that I could use just the string values of the literals created for the RDF transformation, which made JSON quite simple.

Yes, I have arrived at the same conclusion: it is the transformation of the values that is the real work, regardless of the target format.

Ivan


gkellogg commented 9 years ago

> > I think that datatype mapping, and conversion, should be done in metadata, rather than separately in each transformation document. On the call yesterday, the feeling was that formats for dates and times would make use of some standard mechanisms and not rely on picture strings (at least for the moment) (see issue #54). So, it could simply be "iso8601". This allows recognition of most (all?) of the datatypes defined in both the metadata document and the namespace.

> I must admit I do not understand what you mean.

(It doesn't look like the minutes were generated; is it too late to do this?)

This was from IRC:

…rather than try parse all datetime formats have list of popular ones, e.g. lists from excel, google spreadsheets etc.
[07:30am] danbri1: gkellogg: I think that is more likely to get good impl
[07:30am] danbri1: … perhaps ns doc could contain registry of these formats?
[07:31am] danbri1: … so we could update that without updating the spec?
[07:31am] danbri1: jenit: issue there is impl conformance. … how often would it need to check the registry?
[07:31am] danbri1: jtandy: are we anticipating then, that validating + parsing software would try to detect which of the blessed formats it used?
[07:32am] JeniT: “datatype”: “date”, “format”: “ISO”
[07:32am] danbri1: jenit: you'd have something like this [above], … col is a datatype date, format is ISO
[07:32am] JeniT: YYYY-MM-DD
[07:32am] danbri1: (iso8601?)
[07:32am] danbri1: jenit: there would be a builtin list within each impl for such formats, we could name them after the unicode picture string
[07:33am] danbri1: maybe later we could move towards fully supporting
[07:33am] danbri1: jtandy: we give a particular list of pic strings, … beyond those you might be stuck
[07:33am] danbri1: jenit: or fall back to regex
[07:33am] danbri1: jtandy: seems workable
[07:33am] danbri1: gkellogg: agree, best way fwd
[07:33am] danbri1: +1
[07:34am] danbri1: jenit: maybe not enough of us to make a formal resolution but i'll update issues

Basically, this could limit format to being something like "iso8601", which would use that spec for parsing date/time values, presumably conforming to the specified datatype; although you could imagine something that inferred the datatype from the specific format, similar to the way we do for @datetime in RDFa and Microdata.
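A minimal sketch of that idea, with made-up registry entries (the actual list of blessed formats and their names is exactly what the group would have to define): each named format implies both a parsing rule and a datatype, so a processor can validate the cell and infer the datatype in one step.

```python
from datetime import datetime

# Hypothetical "built-in list of formats" from the call: each named format
# maps to a strptime pattern plus the XSD datatype it implies.
BUILTIN_FORMATS = {
    "yyyy-MM-dd":            ("%Y-%m-%d",          "xsd:date"),
    "yyyy-MM-dd'T'HH:mm:ss": ("%Y-%m-%dT%H:%M:%S", "xsd:dateTime"),
    "HH:mm:ss":              ("%H:%M:%S",          "xsd:time"),
}

def parse_cell(value, fmt):
    """Parse a cell against a named format; return (parsed value, inferred datatype).

    Raises ValueError if the value does not conform, which a validator
    would report as an error for that cell.
    """
    pattern, datatype = BUILTIN_FORMATS[fmt]
    return datetime.strptime(value, pattern), datatype

parsed, dt = parse_cell("2015-01-15", "yyyy-MM-dd")
print(dt)  # xsd:date
```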

> I agree that the transformation document should be simple, but this is just an editorial issue, isn't it?

Yes, and I'll try to update the Metadata document to ensure that everything necessary is included, although some of this may belong in the Syntax/Model document. I can create an alternate version of the CSV2RDF/JSON document(s) on a branch that shows how they might change as a result.

iherman commented 9 years ago

On 15 Jan 2015, at 21:42, Gregg Kellogg notifications@github.com wrote:

> > > I think that datatype mapping, and conversion, should be done in metadata, rather than separately in each transformation document. On the call yesterday, the feeling was that formats for dates and times would make use of some standard mechanisms and not rely on picture strings (at least for the moment) (see issue #54). So, it could simply be "iso8601". This allows recognition of most (all?) of the datatypes defined in both the metadata document and the namespace.

> > I must admit I do not understand what you mean.

> (It doesn't look like the minutes were generated; is it too late to do this?)

Done now: http://www.w3.org/2015/01/14-csvw-minutes.html

> This was from IRC:

> …rather than try parse all datetime formats have list of popular ones, e.g. lists from excel, google spreadsheets etc.
> [07:30am] danbri1: gkellogg: I think that is more likely to get good impl
> [07:30am] danbri1: … perhaps ns doc could contain registry of these formats?
> [07:31am] danbri1: … so we could update that without updating the spec?
> [07:31am] danbri1: jenit: issue there is impl conformance. … how often would it need to check the registry?
> [07:31am] danbri1: jtandy: are we anticipating then, that validating + parsing software would try to detect which of the blessed formats it used?
> [07:32am] JeniT: “datatype”: “date”, “format”: “ISO”
> [07:32am] danbri1: jenit: you'd have something like this [above], … col is a datatype date, format is ISO
> [07:32am] JeniT: YYYY-MM-DD
> [07:32am] danbri1: (iso8601?)
> [07:32am] danbri1: jenit: there would be a builtin list within each impl for such formats, we could name them after the unicode picture string
> [07:33am] danbri1: maybe later we could move towards fully supporting
> [07:33am] danbri1: jtandy: we give a particular list of pic strings, … beyond those you might be stuck
> [07:33am] danbri1: jenit: or fall back to regex
> [07:33am] danbri1: jtandy: seems workable
> [07:33am] danbri1: gkellogg: agree, best way fwd
> [07:33am] danbri1: +1
> [07:34am] danbri1: jenit: maybe not enough of us to make a formal resolution but i'll update issues

> Basically, this could limit format to being something like "iso8601", which would use that spec for parsing date/time values, presumably conforming to the specified datatype; although you could imagine something that inferred the datatype from the specific format, similar to the way we do for @datetime in RDFa and Microdata.

Hm. Let me see if I understand, because even with the minutes it is not clear. Does it mean that…

That simplifies the converter software and its specification, but I am not sure how it will fly in practice. Unless we push everything into the data author's lap, somebody will have to do the transformation somewhere (either the checker or the converter), and then we get back to the picture-string issue in the metadata. I.e., I am still unsure where we go here. We are also getting to the point of a separate processor category: we have a checker, we have a JSON/RDF generator, and we have a data transformer that transforms the data to the needs of the metadata? I do not see that working... But I may still not understand where we are going here.

(B.t.w., the comparison with RDFa may not be a good one. In the case of, say, RDFa, it is possible to add the ISO-formatted value via @datetime, while still keeping the human-facing part readable for humans. And this is done by the same person who authors both the content and the metadata. In the CSV case, the metadata author is supposed to be a technically (more) savvy person, but the people producing the data may not be forced to produce an ISO 8601 version...)

Ivan

> > I agree that the transformation document should be simple, but this is just an editorial issue, isn't it?

> Yes, and I'll try to update the Metadata document to ensure that everything necessary is included, although some of this may belong in the Syntax/Model document. I can create an alternate version of the CSV2RDF/JSON document(s) on a branch that shows how they might change as a result.



Ivan Herman, W3C Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
ORCID ID: http://orcid.org/0000-0003-0782-2704

JeniT commented 9 years ago

@iherman can this issue be closed now, given the content of http://w3c.github.io/csvw/metadata/#formats-for-dates-and-times ?