conversion to RDFa

chaals commented 7 years ago

fix #52

WARNING! This may not be complete or accurate.

haven't looked at whether we really need data typing for e.g. date/times. We seem not to need it for URLs...

chaals commented 7 years ago

@halindrome, @msporny, @gkellogg feel free to take a look and tell me I got this wrong... I'm not entirely confident of it yet

gkellogg commented 7 years ago

Step 3 can be simplified, as @typeof in RDFa can accept multiple values, so it's not necessary to inject span elements.

The @itemref steps are likely adequate, but it will take an implementation to tell for sure. Such an implementation should be straightforward, and I suggest that it should be able to pass the tests in the Microdata to RDF test suite, noting any places where the results diverge. Basically: transform the microdata to RDFa using your algorithm, turn the RDFa into RDF, and then compare with the expected results using RDF graph isomorphism.
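To make the comparison step concrete, here is a minimal sketch of a graph isomorphism check. It assumes triples are plain tuples of strings, that blank nodes are written with a `_:` prefix, and that literals are quoted so they never start with `_:`. The names `isomorphic` and `bnodes` are made up for illustration. It brute-forces blank node mappings, so it only suits the small graphs found in test cases; a real harness would more likely use an RDF library's own comparison.

```python
from itertools import permutations
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def is_bnode(term: str) -> bool:
    # ASSUMPTION: blank nodes are labelled "_:something"; literals are quoted.
    return term.startswith("_:")


def bnodes(graph: Set[Triple]) -> List[str]:
    # Blank nodes can only appear in subject or object position.
    return sorted({t for s, _, o in graph for t in (s, o) if is_bnode(t)})


def isomorphic(actual: Set[Triple], expected: Set[Triple]) -> bool:
    """True if some renaming of blank nodes turns `actual` into `expected`."""
    if len(actual) != len(expected):
        return False
    a_nodes, e_nodes = bnodes(actual), bnodes(expected)
    if len(a_nodes) != len(e_nodes):
        return False
    # Try every one-to-one mapping of blank node labels (fine for tiny graphs).
    for candidate in permutations(e_nodes):
        mapping = dict(zip(a_nodes, candidate))
        renamed = {(mapping.get(s, s), p, mapping.get(o, o)) for s, p, o in actual}
        if renamed == expected:
            return True
    return False


RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Output of microdata -> RDFa -> RDF, and the test suite's expected graph.
actual = {
    ("_:a", RDF_TYPE, "http://schema.org/Thing"),
    ("_:a", "http://schema.org/b", '"test"'),
}
expected = {
    ("_:b0", RDF_TYPE, "http://schema.org/Thing"),
    ("_:b0", "http://schema.org/b", '"test"'),
}
same = isomorphic(actual, expected)
```

Any test where `same` comes out false is a place where the algorithm and Microdata to RDF diverge and is worth noting.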

gkellogg commented 7 years ago

So, I'm working on an implementation of the algorithm, and it has some significant shortcomings, but can also be somewhat simplified:

Step 1 can be simplified to just remove @itemscope, as @vocab is inherited from an earlier item.

Step 2 can just use the value of @itemtype for @typeof, but should retain the local type for determining vocab.

For step 6.1, it's not necessary to wrap the referenced element in a new element having @vocab, as that can just be added to the referenced element itself. However, a significant shortcoming is that the vocabulary for two elements referencing the same element may be different, and this is lost, unless the entire reference is duplicated using different @vocab values, but this creates reference issues that probably can't be resolved. (This is likely not a realistic limitation, certainly for schema.org purposes).

6.1 could also be rewritten to remove the need to keep track of the vocabulary identifier entirely, making its retention from step 2 unnecessary. Each element with @itemref could then be processed without regard to its current state, if we were to simply look in the ancestry of the item for the closest @vocab attribute.
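A rough sketch of that lookup, assuming the document has been parsed as well-formed XML with Python's `xml.etree.ElementTree`. The function name `closest_vocab` is hypothetical, and starting the search at the item itself (rather than at its parent) is my assumption.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def closest_vocab(root: ET.Element, item: ET.Element) -> Optional[str]:
    """Return the nearest @vocab value on `item` or one of its ancestors."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    node: Optional[ET.Element] = item
    while node is not None:
        vocab = node.get("vocab")
        if vocab is not None:
            return vocab
        node = parents.get(node)  # None once we walk past the root
    return None


doc = ET.fromstring(
    '<body vocab="http://schema.org/">'
    '<div typeof="http://schema.org/Thing"><p property="b">test</p></div>'
    "</body>"
)
item = doc.find("div")
vocab = closest_vocab(doc, item)
```

With this, nothing needs to be carried over from step 2: the vocabulary is recovered from the markup at the point where the @itemref is processed.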

Step 6.2 needs to add rdfa:Pattern to any existing @typeof if it isn't already there, rather than adding it only when there is no @typeof.
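In code, the corrected rule might look like the following sketch. The helper name `with_pattern_type` is made up, and it only recognises the `rdfa:Pattern` CURIE form, not the expanded IRI.

```python
from typing import Optional

PATTERN = "rdfa:Pattern"


def with_pattern_type(typeof: Optional[str]) -> str:
    """Return a @typeof value that includes rdfa:Pattern exactly once."""
    # @typeof is a whitespace-separated list; a missing attribute means no types.
    types = typeof.split() if typeof else []
    if PATTERN not in types:
        types.append(PATTERN)
    return " ".join(types)
```

Existing types are kept in their original order, and applying the function twice gives the same result as applying it once.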

The Microdata to RDF processor also performs special processing on object element that an RDFa processor does not do, so native datatype information is lost; this will require a separate step.

Microdata to RDF interprets values of data and meter somewhat differently, only using datatypes integer and double, but RDFa also uses decimal. This could be updated in the Microdata to RDF processor at the loss of some backwards compatibility, or additional processor rules could be added in this algorithm.

I suspect other issues will come up during the course of implementation.

halindrome commented 7 years ago

Not for nothing, but.... what is the intended audience? For example, while vocab can inherit, in my experience I don't want it to unless everything is from one vocab.

-- Shane McCarron halindrome@gmail.com

danbri commented 7 years ago

Also ping @scor ...

gkellogg commented 7 years ago

@niklasl may want to chime in too.

gkellogg commented 7 years ago

> Not for nothing, but.... what is the intended audience? For example, while vocab can inherit, in my experience I don't want it to unless everything is from one vocab.

The algorithm will create an @vocab for each itemtype. For pure microdata this is sufficient. For mixed RDFa and Microdata, the algorithm could make a hash of it.

gkellogg commented 7 years ago

I have a fairly complete implementation in my rdf-microdata gem. There are some minor issues, but it essentially performs the expected transformations. I made the following changes to the algorithm:

In step 1, simply remove @itemscope and do not add @vocab.

Combining steps 2 and 3, get the values from @itemtype which are absolute URLs and use the first as a basis for creating @vocab. Note that this also uses the registry logic from Microdata to RDF, but should work reasonably well without it.
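A minimal sketch of steps 1 to 3 as changed above, not the code from the gem. It assumes XML-parsed input (so the boolean attribute is written `itemscope=""`), skips the registry entirely, and derives the vocabulary by cutting the first type after its last `#` or `/`, which is my simplification. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit


def is_absolute_url(value: str) -> bool:
    # Treat anything with a scheme as an absolute URL.
    return bool(urlsplit(value).scheme)


def vocab_for(item_type: str) -> str:
    # ASSUMPTION: no registry lookup; keep everything up to the last '#' or '/'.
    cut = max(item_type.rfind("#"), item_type.rfind("/"))
    return item_type[: cut + 1] if cut >= 0 else item_type


def convert_item(elem: ET.Element) -> None:
    """Rewrite @itemscope/@itemtype on one element as @typeof/@vocab, in place."""
    # Step 1: drop @itemscope without adding @vocab.
    elem.attrib.pop("itemscope", None)
    # Steps 2 and 3: keep only absolute-URL types; the first one picks the vocab.
    raw_types = elem.attrib.pop("itemtype", "").split()
    item_types = [t for t in raw_types if is_absolute_url(t)]
    if item_types:
        elem.set("typeof", " ".join(item_types))
        elem.set("vocab", vocab_for(item_types[0]))


dl = ET.fromstring(
    '<dl itemscope="" '
    'itemtype="http://md.example.com/loco http://md.example.com/lighting">'
    '<dd itemprop="name">Tank Locomotive (DB 80)</dd>'
    "</dl>"
)
convert_item(dl)
```

Property handling (@itemprop to @property) and step 5 are left out of this sketch.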

In step 5, either set @about or @resource depending on whether it is a top-level item. Otherwise, make sure there is a @typeof attribute, with at least an empty string.

In step 6, wrap the referenced element with a div (or other element) containing all of @vocab, @resource, and @typeof="rdfa:Pattern". Note that this should check that the referenced element has not already been wrapped with something having the same @resource, and will have problems if @vocab values differ, in addition to other issues noted in https://github.com/w3c/microdata/pull/73#issuecomment-316493707.

From your examples:

locomotive:

<dl typeof="http://md.example.com/loco http://md.example.com/lighting"
    vocab="http://md.example.com/">
  <dt>Name:</dt>
  <dd property="name">Tank Locomotive (DB 80)</dd>
  <dt>Product code:</dt>
  <dd property="product-code">33041</dd>
  <dt>Scale:</dt>
  <dd property="scale">HO</dd>
  <dt>Digital:</dt>
  <dd property="digital">Delta</dd>
</dl>

itemref:

<div vocab="http://schema.org/" resource="#x" typeof="rdfa:Pattern">
  <div id="x">
    <p property="a">1</p>
  </div>
</div>
<div typeof="http://schema.org/Thing" vocab="http://schema.org/">
  <p property="b">test</p>
  <p property="a">2</p>
</div>
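The step 6 wrapping shown in the itemref example could be sketched roughly like this. It assumes XML-parsed input, the function name `wrap_referenced` is hypothetical, and the vocabulary is passed in by the caller. It only does the wrapping and the already-wrapped check. It does not resolve differing @vocab values, and it does not touch the referencing item.

```python
import xml.etree.ElementTree as ET


def wrap_referenced(root: ET.Element, ref_id: str, vocab: str) -> bool:
    """Wrap the element with id=ref_id in a pattern div; False if nothing was done."""
    parents = {child: parent for parent in root.iter() for child in parent}
    target = next((e for e in root.iter() if e.get("id") == ref_id), None)
    if target is None or target not in parents:
        return False  # not found, or it is the root and cannot be wrapped here
    parent = parents[target]
    resource = "#" + ref_id
    # Skip if a previous @itemref already produced a wrapper for this element.
    if (parent.get("resource") == resource
            and "rdfa:Pattern" in parent.get("typeof", "").split()):
        return False
    wrapper = ET.Element(
        "div", {"vocab": vocab, "resource": resource, "typeof": "rdfa:Pattern"}
    )
    index = list(parent).index(target)
    parent.remove(target)
    # Keep any text that followed the target outside the new wrapper.
    wrapper.tail, target.tail = target.tail, None
    wrapper.append(target)
    parent.insert(index, wrapper)
    return True


body = ET.fromstring(
    "<body>"
    '<div id="x"><p property="a">1</p></div>'
    '<div typeof="http://schema.org/Thing" vocab="http://schema.org/">'
    '<p property="b">test</p></div>'
    "</body>"
)
wrapped = wrap_referenced(body, "x", "http://schema.org/")
```

A second call with the same id returns False and leaves the tree alone, which is the "has not already been wrapped" check from step 6.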

chaals commented 7 years ago

@gkellogg, many thanks! I'll work on this some time today.

w3c / microdata

conversion to RDFa #73