A note or specification for dataset output would be useful

sesuncedu commented 9 years ago

During the course of processing a document, it would be helpful to be able to generate a few separate graphs:

- A graph for the extracted data. This could be the default graph, but it might be better to give it a distinct name, so that versions of an HTML file retrieved at different times get distinct graph names that other graphs can refer to.
- A provenance graph about that graph (and about other resources used in processing it).
- A graph for any derived properties and classes (an OWL ontology for derived sdo "/" subproperties and subclasses).

There could easily be conventions for generating and identifying these graphs. A base name could be some function of the document URL (including all parameters affecting the content) and whichever date is available: last-modified, access time, or processing start time. Each graph could then be a fragment of this base. The named graph semantics would be Carrollian, where the graph name denotes the graph itself.
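One possible naming convention can be sketched in Python, under stated assumptions. The `urn:uuid:` scheme, the UUIDv5 hashing, the date preference order, and the fragment names `data`, `provenance`, and `vocab` are all hypothetical choices for illustration, not anything the spec defines. The point is that the base is deterministic: the same URL (query string included) and the same date always yield the same graph names, while a different version of the page yields different ones.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional


def graph_base(document_url: str,
               last_modified: Optional[datetime] = None,
               accessed: Optional[datetime] = None,
               processing_start: Optional[datetime] = None) -> str:
    """Derive a base IRI for one version of one document (hypothetical convention).

    Uses whichever date is available, preferring Last-Modified, then the
    access time, then the processing start time.
    """
    date = next((d for d in (last_modified, accessed, processing_start)
                 if d is not None), None)
    if date is None:
        raise ValueError("at least one date is required")
    # The full URL, parameters included, identifies the content requested;
    # the date distinguishes versions retrieved at different times.
    key = f"{document_url} {date.astimezone(timezone.utc).isoformat()}"
    # uuid5 is deterministic: the same key always gives the same UUID.
    return f"urn:uuid:{uuid.uuid5(uuid.NAMESPACE_URL, key)}"


def graph_names(base: str) -> dict[str, str]:
    # Each graph is a fragment of the base (fragment names are illustrative).
    return {kind: f"{base}#{kind}" for kind in ("data", "provenance", "vocab")}


if __name__ == "__main__":
    lm = datetime(2015, 3, 1, 12, 0, tzinfo=timezone.utc)
    base = graph_base("http://example.org/page.html?lang=en", last_modified=lm)
    for kind, name in graph_names(base).items():
        print(kind, name)
```

A hash keeps the names opaque and fixed-length; a readable alternative would be to append an encoded timestamp to the document URL itself, at the cost of much longer graph names.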

A suitable normative format would be TriG or N-Quads.
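To show what such output might look like, here is a minimal N-Quads writer in Python. Each line carries subject, predicate, object, and graph name, so the data graph and the provenance graph about it can share one file. The `IRI`/`Literal` classes, the `to_nquads` helper, and the example IRIs are hypothetical. The sketch handles only IRIs and plain string literals (no blank nodes, datatypes, or language tags) and assumes the IRIs need no escaping.

```python
from typing import Iterable, NamedTuple, Tuple, Union


class IRI(NamedTuple):
    value: str


class Literal(NamedTuple):
    value: str


Term = Union[IRI, Literal]
Quad = Tuple[Term, Term, Term, IRI]


def term(t: Term) -> str:
    if isinstance(t, IRI):
        return f"<{t.value}>"
    # Escape backslash first, then the characters N-Quads forbids raw in strings.
    escaped = (t.value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_nquads(quads: Iterable[Quad]) -> str:
    # One statement per line: subject predicate object graph .
    return "".join(f"{term(s)} {term(p)} {term(o)} {term(g)} .\n"
                   for s, p, o, g in quads)


if __name__ == "__main__":
    data = "http://example.org/run1#data"        # hypothetical graph names
    prov = "http://example.org/run1#provenance"
    print(to_nquads([
        (IRI("http://example.org/page.html#item"), IRI("http://schema.org/name"),
         Literal('The "best" item'), IRI(data)),
        # The provenance graph makes statements about the data graph by name.
        (IRI(data), IRI("http://www.w3.org/ns/prov#wasDerivedFrom"),
         IRI("http://example.org/page.html"), IRI(prov)),
    ]), end="")
```

The first line printed is `<http://example.org/page.html#item> <http://schema.org/name> "The \"best\" item" <http://example.org/run1#data> .`. Because every quad names its graph explicitly, the two graphs can be written in any order or split across files.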

gkellogg commented 9 years ago

RDFa uses similar notions of graphs but does not create a dataset. A Microdata processor could presumably do something similar, and implementations are free to do so. However, requiring every implementation to do this is unrealistic, as the Microdata community is typically looking for simple use cases.

A hypothetical future version of the spec could include optional language for doing this, along the lines of RDFa's other optional output graphs. This would need some community support.

iherman commented 9 years ago

I agree that this is a useful feature for the future, so I marked this as 'postponed' and left it open.

w3c/microdata-rdf, issue #20