Closed 6a6d74 closed 9 years ago
If we are going to use dcat:distribution and dcat:downloadURL as default properties in the metadata vocabulary, they should probably be aliased to distribution and downloadURL in our default context.
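To make the aliasing idea concrete, here is a minimal sketch of how the two aliases could resolve to the DCAT IRIs. The `DEFAULT_CONTEXT` dictionary and the `expand_term` helper are hypothetical names used only for illustration; they are not taken from the metadata vocabulary, and the expansion logic is much simpler than real JSON-LD processing.

```python
# Namespace IRI of the DCAT vocabulary.
DCAT = "http://www.w3.org/ns/dcat#"

# Hypothetical fragment of a default context: only the two aliases
# discussed in this issue, plus the prefix they rely on.
DEFAULT_CONTEXT = {
    "dcat": DCAT,
    "distribution": "dcat:distribution",
    "downloadURL": "dcat:downloadURL",
}


def expand_term(term, context=DEFAULT_CONTEXT):
    """Expand an alias or a prefix:suffix name to a full IRI (simplified)."""
    # An alias maps to a prefixed name; anything else is used as given.
    value = context.get(term, term)
    prefix, sep, suffix = value.partition(":")
    # Only expand when the part before the colon is a declared prefix.
    if sep and prefix in context:
        return context[prefix] + suffix
    return value


print(expand_term("distribution"))
print(expand_term("dcat:downloadURL"))
```

With such aliases in place, authors could write the short names while the expanded form would still carry the DCAT IRIs.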
I must admit this argument is compelling. (Although I am wary of adding new things to the metadata and conversion structures...)
Ivan
On 04 Dec 2014, at 14:19 , Jeremy Tandy notifications@github.com wrote:
The metadata vocabulary section 3.5 Tables asserts that the @id property "gives the URL of the CSV file that the table is held in, relative to the location of the metadata document".
Is this correct? Are the table and the CSV encoding of that table the same thing?
I think not … the CSV is more like a specific distribution, as defined in DCAT, of the abstract tabular data: "a specific available form of a dataset". In that case, does it make sense to express @type Table as a subclass of dcat:Dataset? If so, a table description for the Palo Alto tree data, with data encoded in CSV according to RFC 4180, might be expressed as:
{ "@id": "tree-ops", "@type": "Table", "dcat:distribution": { "@type": "dcat:Distribution", "dcat:downloadURL": "tree-ops.csv", "dcat:mediaType": "text/csv" }, ... }
or, a minimal form:
{ "@id": "tree-ops", "dcat:distribution": { "dcat:downloadURL": "tree-ops.csv" }, ... }
This makes things much clearer when mapping the data to RDF. Ever mindful of the question "what's the subject?" it's clear to me that we should be treating the abstract tabular data and its CSV encoded distribution as separate entities.
The parser / mapping engine can easily traverse this structure to locate the CSV encoding.
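The traversal mentioned above could look roughly like the following sketch. It assumes the table description has already been parsed into a Python dictionary, and that dcat:distribution holds either a single object or a list of objects; the `locate_csv` function name is hypothetical and not part of any specification.

```python
def locate_csv(table_description):
    """Return the download URLs found under dcat:distribution."""
    distributions = table_description.get("dcat:distribution", [])
    # Accept a single distribution object as well as a list of them.
    if isinstance(distributions, dict):
        distributions = [distributions]
    return [
        d["dcat:downloadURL"]
        for d in distributions
        if "dcat:downloadURL" in d
    ]


# The full form of the example from this issue, as a Python dict.
table = {
    "@id": "tree-ops",
    "@type": "Table",
    "dcat:distribution": {
        "@type": "dcat:Distribution",
        "dcat:downloadURL": "tree-ops.csv",
        "dcat:mediaType": "text/csv",
    },
}

print(locate_csv(table))
```

In this shape the table keeps its own identifier ("tree-ops"), while the CSV file is reached only through the distribution.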
Ivan Herman, W3C Digital Publishing Activity Lead Home: http://www.w3.org/People/Ivan/ mobile: +31-641044153 ORCID ID: http://orcid.org/0000-0003-0782-2704
@iherman: my basic problem is that with things as they are, I can't use the @id property of a table as the subject for statements about the table. See urls-in-data for details on the ambiguity ...
As with @base, when talking about how values of @id are expanded, or properties whose value is @type: @id, we should rely on the interpretation of [[JSON-LD-API]] IRI expansion, rather than re-invent the wheel and possibly make it incompatible.
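As a rough illustration of the relative-reference part of that expansion, the sketch below resolves an @id value against the location of the metadata document using Python's standard `urllib.parse.urljoin`. This is only an approximation under stated assumptions: full JSON-LD IRI expansion also deals with terms, compact IRIs and keywords, none of which are handled here. The `resolve_id` helper and the example.org location are hypothetical.

```python
from urllib.parse import urljoin


def resolve_id(value, base):
    """Resolve a possibly relative @id value against a base IRI."""
    # urljoin follows the generic URI reference resolution rules;
    # absolute values are returned unchanged.
    return urljoin(base, value)


# Hypothetical location of the metadata document.
metadata_location = "http://example.org/data/metadata.json"

print(resolve_id("tree-ops.csv", metadata_location))
```

Relying on one shared definition of expansion means that a relative value such as "tree-ops.csv" ends up as the same absolute IRI in every implementation.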
Agreed. I think that this implies that the metadata vocab document section 3.5 Tables (and other places talking about URI expansion) should be updated to refer to [[JSON-LD-API]] IRI expansion.
This is a good clarification ...
My main question is still open: shall we treat the table and the CSV encoding of that table as separate entities, each with different identifiers? ... in which case the DCAT distribution makes a lot of sense.
On 04 Dec 2014, at 23:25 , Jeremy Tandy notifications@github.com wrote:
Agreed. I think that this implies that the metadata vocab document section 3.5 Tables should be updated to refer to [[JSON-LD-API]] IRI expansion.
This is a good clarification ...
My main question is still open: shall we treat the table and the CSV encoding of that table as separate entities, each with different identifiers? ... in which case the DCAT distribution makes a lot of sense.
So... I am a little bit conflicted.
My intellectual and Semantic-Web-infected brain says that yes, this is clearly what we have to do, that makes sense, it is the only proper way, etc, etc.
What I simply cannot judge is how easily such terminology and such an approach will be accepted by those who produce these metadata in practice. As we all know, similar discussions on, say, what the identifier of a person is (as opposed to the identifier of the person's home page) have been raging for, well, ages, and the general population out there has essentially voted with its feet by not making the distinction. I am just afraid this is where we will end up, too.
I would rely on the experience of guys like you, Dan, or Jeni to tell me what can be expected on this. My experience in the Semantic Web area calls for some caution, as far as I am concerned...
:-(
Ivan
I'm hearing that loud and clear. If our specification says the right thing and people then ignore it and still point the table identifier to the CSV file (not the abstract tabular data), we're at least indicating that there's a best practice that people should follow!
On 05 Dec 2014, at 09:28 , Jeremy Tandy notifications@github.com wrote:
I'm hearing that loud and clear. If our specification says the right thing and people then ignore it and still point the table identifier to the CSV file (not the abstract tabular data), we're at least indicating that there's a best practice that people should follow!
Right. The only point is: we should not make any processing dependent on this. It should be as you say: a best practice.
Ivan
I don't think any processing will depend on this - all that will happen is the identifier of the table object that is produced from the mapping will be the URL of the CSV file rather than an abstract tabular data object that can be distributed as that CSV file.
This will only cause problems when people start trying to use that table object in an RDF setting where it's important to have the subject of statements correct.
It's also good to be aligned with DCAT in this way.
The way to manage this without letting HTTPRange-14 worries drown out ease of use, based on URLs in Data, is to define the properties that we're specifying as applying to "the table encoded in the file referenced through the @id property".
So for example, the csvm:schema property would be accurately defined as providing the schema of the table encoded in the file referenced through the @id property. And if we were being really purist we could have separate properties for the link between the file and the table, and directly between the table and the schema.
Having seen the way it is defined in the JSON mapping draft I am actually o.k. with what we have. Do we need anything more?
I propose to leave things as they are for the FPWD review & solicit comment.
Agreed
I.
On 16 Dec 2014, at 16:37 , Jeremy Tandy notifications@github.com wrote:
I propose to leave things as they are for the FPWD review & solicit comment.
I'm fine with leaving things the way they are now, but subsequently, I think we need to try to clarify the semantic difference between the metadata description and the resulting annotated table. Conflating @id and @type in the metadata description with that of the annotated table is problematic.
On the telecon we agreed to close this issue and create a new one.
We agreed to replace the use of @id with a new property url, which references the CSV or other external resource. @type remains as is. We don't alias any keywords.
Still open are the use of the #table fragment in RDF output and the specifics of the DCAT bit.