Are the abstract tabular data and the CSV that encodes it the same thing?

6a6d74 commented 9 years ago

The metadata vocabulary section 3.5 Tables asserts that the @id property "gives the URL of the CSV file that the table is held in, relative to the location of the metadata document".

Is this correct? Are the table and the CSV encoding of that table the same thing?

I think not … the CSV is more like a specific distribution, as defined in DCAT, of the abstract tabular data: "a specific available form of a dataset". In which case, does it make sense to express @type Table as a subclass of dcat:Dataset? If so, a table description for the Palo Alto tree data, with data encoded in CSV according to RFC4180, might be expressed as:

{
  "@id": "tree-ops",
  "@type": "Table",
  "dcat:distribution": {
    "@type": "dcat:Distribution",
    "dcat:downloadURL": "tree-ops.csv",
    "dcat:mediaType": "text/csv"
  },
  ...
}

or, a minimal form:

{
  "@id": "tree-ops",
  "dcat:distribution": { "dcat:downloadURL": "tree-ops.csv" },
  ...
}

This makes things much clearer when mapping the data to RDF. Ever mindful of the question "what's the subject?" it's clear to me that we should be treating the abstract tabular data and its CSV encoded distribution as separate entities.

The parser / mapping engine can easily traverse this structure to locate the CSV encoding.
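
A minimal sketch of that traversal in Python, assuming the metadata has already been parsed into a dict and that the distribution is a single embedded object as in the examples above. The function name `locate_csv` and the `http://example.org/metadata.json` location are hypothetical, chosen only for illustration.

```python
from urllib.parse import urljoin


def locate_csv(table_description, metadata_url):
    """Return the absolute URL of the CSV distribution, or None if absent.

    Assumes a single embedded dcat:distribution object, as in the
    examples above; a list of distributions is not handled here.
    """
    distribution = table_description.get("dcat:distribution")
    if not isinstance(distribution, dict):
        return None
    download_url = distribution.get("dcat:downloadURL")
    if download_url is None:
        return None
    # Resolve the (possibly relative) download URL against the
    # location of the metadata document.
    return urljoin(metadata_url, download_url)


# Hypothetical metadata, mirroring the minimal form above.
table = {
    "@id": "tree-ops",
    "dcat:distribution": {"dcat:downloadURL": "tree-ops.csv"},
}
csv_url = locate_csv(table, "http://example.org/metadata.json")
print(csv_url)
```

The table keeps its own identifier (`tree-ops`), while the CSV file is reached through the distribution, so the two never share a subject.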

6a6d74 commented 9 years ago

If we are going to use dcat:distribution and dcat:downloadURL as default properties in the metadata vocabulary they should probably be aliased to distribution and downloadURL in our default context.
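
To illustrate what such aliasing could look like, here is a rough Python sketch. The `DEFAULT_CONTEXT` dict and the `expand_keys` helper are hypothetical; the sketch only shows key aliasing and is not the JSON-LD expansion algorithm.

```python
# DCAT namespace IRI.
DCAT = "http://www.w3.org/ns/dcat#"

# Hypothetical default context: short terms aliased to DCAT properties.
DEFAULT_CONTEXT = {
    "distribution": DCAT + "distribution",
    "downloadURL": DCAT + "downloadURL",
}


def expand_keys(node, context):
    """Recursively replace aliased keys with their full IRIs.

    Keys that are not in the context (including keywords such as @id)
    are left untouched. Values are not interpreted.
    """
    if isinstance(node, dict):
        return {context.get(key, key): expand_keys(value, context)
                for key, value in node.items()}
    if isinstance(node, list):
        return [expand_keys(item, context) for item in node]
    return node


# With the aliases in place, authors could write the short form.
table = {"@id": "tree-ops", "distribution": {"downloadURL": "tree-ops.csv"}}
expanded = expand_keys(table, DEFAULT_CONTEXT)
print(expanded)
```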

iherman commented 9 years ago

I must admit this argument is compelling. (Although I am wary of adding new things to the metadata and conversion structures...)

Ivan

(In reply to Jeremy Tandy's opening comment of 04 Dec 2014, 14:19, which is given in full at the top of this thread.)

Ivan Herman, W3C Digital Publishing Activity Lead Home: http://www.w3.org/People/Ivan/ mobile: +31-641044153 ORCID ID: http://orcid.org/0000-0003-0782-2704

6a6d74 commented 9 years ago

@iherman: my basic problem is that with things as they are, I can't use the @id property of a table as the subject for statements about the table. See urls-in-data for details on the ambiguity ...

gkellogg commented 9 years ago

As with @base, when talking about how values of @id are expanded, or properties whose value is @type: @id, we should rely on the interpretation of [[JSON-LD-API]] IRI expansion, rather than re-invent the wheel and possibly make it incompatible.
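
As a rough illustration of the kind of behaviour being deferred to, the following Python sketch handles three cases: keywords, compact IRIs, and relative IRIs resolved against a base. It is a deliberate simplification and not the JSON-LD API algorithm, which also covers term definitions, vocabulary mapping and blank nodes. The `expand_iri` name, the `prefixes` argument and the example base are assumptions made for this sketch.

```python
from urllib.parse import urljoin

DCAT = "http://www.w3.org/ns/dcat#"


def expand_iri(value, base, prefixes):
    """Simplified IRI expansion (illustrative only)."""
    if value.startswith("@"):
        # Keywords such as @id or @type are returned unchanged.
        return value
    if ":" in value:
        prefix, suffix = value.split(":", 1)
        if suffix.startswith("//"):
            # Looks like an absolute IRI, e.g. http://...
            return value
        if prefix in prefixes:
            # Compact IRI: replace the prefix with its namespace.
            return prefixes[prefix] + suffix
        # Unknown prefix: treat the value as an absolute IRI.
        return value
    # Relative IRI: resolve against the base (the metadata document).
    return urljoin(base, value)


base = "http://example.org/meta/metadata.json"  # hypothetical location
prefixes = {"dcat": DCAT}

relative = expand_iri("tree-ops.csv", base, prefixes)
compact = expand_iri("dcat:Distribution", base, prefixes)
absolute = expand_iri("http://example.org/other.csv", base, prefixes)
print(relative, compact, absolute)
```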

6a6d74 commented 9 years ago

Agreed. I think that this implies that the [metadata vocab document section 3.5 Tables][1] (and other places talking about URI expansion) should be updated to refer to [[JSON-LD-API]] IRI expansion.

This is a good clarification ...

My main question is still open: *shall we treat the _table_ and the CSV encoding of that table as separate entities, each with different identifiers?* ... in which case the DCAT distribution makes a lot of sense.

iherman commented 9 years ago

On 04 Dec 2014, at 23:25, Jeremy Tandy wrote:

> Agreed. I think that this implies that the [metadata vocab document section 3.5 Tables][1] should be updated to refer to [[JSON-LD-API]] IRI expansion.
>
> This is a good clarification ...
>
> My main question is still open: shall we treat the table and the CSV encoding of that table as separate entities, each with different identifiers? ... in which case the DCAT distribution makes a lot of sense.

So... I am a little bit conflicted.

My intellectual and Semantic-Web-infected brain says that yes, this is clearly what we have to do, that makes sense, it is the only proper way, etc, etc.

What I simply cannot judge is how easily such terminology and such an approach will be accepted by those who produce these metadata in practice. As we all know, similar discussions on, say, what is the identifier of a person (as opposed to the identifier of the person's home page) have been raging for, well, ages, and the general population out there has essentially voted with its feet by not making the distinction. I am just afraid this is where we will end up, too.

I would rely on the experience of guys like you, Dan, or Jeni to tell me what can be expected on this. My experience in the Semantic Web area calls for some caution, as far as I am concerned...

:-(

Ivan

6a6d74 commented 9 years ago

I'm hearing that loud and clear. If our specification says the right thing and people then ignore it and still point the table identifier to the CSV file (not the abstract tabular data), we're at least indicating that there's a best practice that people should follow!

iherman commented 9 years ago

On 05 Dec 2014, at 09:28, Jeremy Tandy wrote:

> I'm hearing that loud and clear. If our specification says the right thing and people then ignore it and still point the table identifier to the CSV file (not the abstract tabular data), we're at least indicating that there's a best practice that people should follow!

Right. The only point is: we should not make any processing dependent on this. It should be as you say: a best practice.

Ivan

6a6d74 commented 9 years ago

I don't think any processing will depend on this. All that will happen is that the identifier of the table object produced by the mapping will be the URL of the CSV file, rather than the identifier of an abstract tabular data object that can be distributed as that CSV file.

This will only cause problems when people start trying to use that table object in an RDF setting, where it's important to have the subject of statements correct.

It's also good to be aligned with DCAT in this way.

JeniT commented 9 years ago

The way to manage this without letting HTTPRange-14 worries drown out ease of use, based on URLs in Data, is to define the properties that we're specifying as applying to "the table encoded in the file referenced through the @id property".

So for example, the csvm:schema property would be accurately defined as providing the schema of the table encoded in the file referenced through the @id property. And if we were being really purist we could have separate properties for the link between the file and the table, and directly between the table and the schema.

iherman commented 9 years ago

Having seen the way it is defined in the JSON mapping draft I am actually o.k. with what we have. Do we need anything more?

6a6d74 commented 9 years ago

I propose to leave things as they are for the FPWD review & solicit comment.

iherman commented 9 years ago

Agreed

I.

On 16 Dec 2014, at 16:37, Jeremy Tandy wrote:

> I propose to leave things as they are for the FPWD review & solicit comment.

gkellogg commented 9 years ago

I'm fine with leaving things the way they are now, but subsequently, I think we need to try to clarify the semantic difference between the metadata description, and the resulting annotated table. Conflating @id and @type in the metadata description with that of the annotated table is problematic.

danbri commented 9 years ago

See http://www.w3.org/2015/01/28-csvw-irc#T15-36-56

gkellogg commented 9 years ago

On the telecon we agreed to close this issue and create a new one.

We agreed to replace the use of @id with a new property url, which references the CSV or other external resource. @type remains as is. We don't alias any keywords.

Still open are the use of the #table fragment in RDF output and the specifics of the DCAT bit.
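
One possible reading of this resolution, sketched in Python: the table description carries a `url` property that points at the CSV file, so `@id` is no longer overloaded as the file location. The sketch assumes that `url` is resolved relative to the metadata document, as `@id` was described earlier in the thread. The helper name `csv_location` and the example metadata location are hypothetical, and nothing here settles the open questions about the #table fragment or DCAT.

```python
from urllib.parse import urljoin


def csv_location(table_description, metadata_url):
    """Resolve the `url` property against the metadata document's location.

    Assumption: `url` may be relative and is resolved the same way the
    thread describes for the earlier use of @id.
    """
    return urljoin(metadata_url, table_description["url"])


# Hypothetical table description using the new `url` property.
table = {"@type": "Table", "url": "tree-ops.csv"}
location = csv_location(table, "http://example.org/metadata.json")
print(location)
```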

w3c / csvw

Are the abstract tabular data and the CSV that encodes it the same thing? #93