w3c / csvw

Documents produced by the CSV on the Web Working Group

column number in CSVW namespace #880

Open init-dcat-ap-de opened 2 years ago

init-dcat-ap-de commented 2 years ago

Hello,

We are evaluating the vocabulary to express the schema of open CSV data in data portals. The documentation is a bit confusing, since it is spread over multiple pages, but now to my concrete question:

https://www.w3.org/TR/2015/REC-tabular-data-model-20151217/#columns says that there is a column number:

number — the position of the column amongst the columns for the associated table, starting from 1.

This attribute is in my opinion very important, since "arrays in JSON-LD do not convey any ordering of the contained elements by default".

Am I missing something?

Thank you in advance,
Ludger

gkellogg commented 2 years ago

The tabular data model does define a number of terms for describing tables, rows, and columns, including the column number. There is no such property defined in the namespace document, though, because it is not actually used in the transformation from CSV to either JSON or RDF. It is, however, possible to include the column number as part of a URI Template Property.
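As a rough illustration of the URI Template Property route: the metadata vocabulary makes a `_column` variable available when properties such as `propertyUrl` are expanded for a cell. The Python sketch below is not how a CSVW processor is implemented. It handles only plain `{name}` expressions rather than full RFC 6570, and the `expand_simple` helper and the `#col-{_column}` URL pattern are assumptions made up for this example.

```python
import re


def expand_simple(template: str, variables: dict) -> str:
    """Expand plain `{name}` expressions only (no RFC 6570 operators)."""
    def substitute(match: re.Match) -> str:
        return str(variables[match.group(1)])

    return re.sub(r"\{([A-Za-z0-9_]+)\}", substitute, template)


# Hypothetical column description; the propertyUrl pattern is an assumption.
column = {
    "titles": ["Species"],
    "propertyUrl": "http://example.org/tree-ops.csv#col-{_column}",
}

# Variables a processor would supply for a cell in the third column
# of the first row (values chosen for illustration).
cell_variables = {"_column": 3, "_sourceColumn": 3, "_row": 1}

url = expand_simple(column["propertyUrl"], cell_variables)
print(url)
```

With a template like this, the column number ends up in the generated property URL even though the namespace has no dedicated term for it.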

JSON-LD array values are not generally ordered, but they can be ordered using an rdf:List if the proper container description is used, or if they are the value of an @list member. See Sets and Lists in JSON-LD 1.1 (pretty much the same as in JSON-LD 1.0). The columns term in the namespace does set the container mapping to @list, so the values of the "columns" member of the tabular metadata are properly ordered.

For example, example 15 in the Simple Example section of the Tabular Data Model shows the following metadata used to map the CSV input:

{
  "@type": "Table",
  "url": "http://example.org/tree-ops.csv",
  "tableSchema": {
    "columns": [
      {"titles": [ "GID" ]},
      {"titles": [ "On Street" ]},
      {"titles": [ "Species" ]},
      {"titles": [ "Trim Cycle" ]},
      {"titles": [ "Inventory Date" ]}
    ]
  }
}

The values of columns are ordered, so that each value relates directly to the corresponding column in the input. (Note, however, that skipColumns can affect the absolute association between the list index and the column in the CSV.)
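The relationship between list position and column number can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: no header columns and no virtual columns, with `skip_columns` standing in for the dialect's skipColumns value. The `column_numbers` helper and the `number` / `sourceNumber` keys in its output are names chosen for this example, not terms from the namespace.

```python
def column_numbers(columns: list, skip_columns: int = 0) -> list:
    """Derive 1-based column numbers from the order of the columns list.

    Assumes no header columns and no virtual columns, so the source
    column number is just the column number shifted by skip_columns.
    """
    return [
        {
            "number": index,
            "sourceNumber": index + skip_columns,
            "titles": column["titles"],
        }
        for index, column in enumerate(columns, start=1)
    ]


# The metadata from the example above, written as a Python dict.
metadata = {
    "@type": "Table",
    "url": "http://example.org/tree-ops.csv",
    "tableSchema": {
        "columns": [
            {"titles": ["GID"]},
            {"titles": ["On Street"]},
            {"titles": ["Species"]},
            {"titles": ["Trim Cycle"]},
            {"titles": ["Inventory Date"]},
        ]
    },
}

columns = metadata["tableSchema"]["columns"]
plain = column_numbers(columns)
shifted = column_numbers(columns, skip_columns=2)

for entry in shifted:
    print(entry["number"], entry["sourceNumber"], entry["titles"][0])
```

Without skipped columns the two numbers coincide. With two skipped columns, the third entry ("Species") still has column number 3 but corresponds to the fifth column of the source file.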