w3c / microdata-rdf

Specification of the extraction/transformation of Microdata content to RDF

Remove support for the Vocabulary Registry, contextual propertyURI and multipleValues. #10

Closed gkellogg closed 9 years ago

gkellogg commented 9 years ago

The Vocabulary Registry has not been kept up to date, and principal consumers of microdata (i.e., search engines) ignore it in any case. This issue proposes to remove the Registry, along with support for independently setting propertyURI and multipleValues.

propertyURI is treated like vocabulary if there is an @itemtype; otherwise it is treated like contextual, creating a property relative to the document base.

multipleValues is treated like false, where values are never placed in an rdf:List.
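To make the difference concrete, here is a rough Python sketch of the two treatments of a multi-valued property. It is only an illustration, not text from the spec: the helper name value_triples, the blank node labels and the plain-string values are all made up for this example.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def value_triples(subject, predicate, values, as_list=False):
    """Return (s, p, o) tuples for a property with several values.

    as_list=False: one triple per value, order is not preserved (the proposed behavior).
    as_list=True: values are chained into an rdf:List (the behavior being dropped).
    """
    if not as_list:
        return [(subject, predicate, value) for value in values]
    if not values:
        return [(subject, predicate, RDF + "nil")]
    # Hypothetical blank node labels, one per list cell.
    nodes = [f"_:l{i}" for i in range(len(values))]
    triples = [(subject, predicate, nodes[0])]
    for i, value in enumerate(values):
        rest = nodes[i + 1] if i + 1 < len(values) else RDF + "nil"
        triples.append((nodes[i], RDF + "first", value))
        triples.append((nodes[i], RDF + "rest", rest))
    return triples


unordered = value_triples("_:band", "http://schema.org/track", ["Song A", "Song B"])
ordered = value_triples("_:band", "http://schema.org/track", ["Song A", "Song B"], as_list=True)
```

With the proposal, only the first form would ever be produced.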

iherman commented 9 years ago

It would be good to know whether, indeed, nobody else uses this conversion spec other than schema.org. I am not sure that is exactly the case; I think people may use it as a test conversion for RDF, hoping that schema.org does the same. So... do we have any information on how exactly schema.org converts to RDF? Does it use this spec?

Ivan

Ivan Herman, W3C Digital Publishing Activity Lead Home: http://www.w3.org/People/Ivan/ mobile: +31-641044153 GPG: 0x343F1A3D WebID: http://www.ivan-herman.net/foaf#me

gkellogg commented 9 years ago

@danbri has said that they like that there is a spec for transforming Microdata to RDF, but don't feel bound by it. In particular, they do nothing with the Registry, and aren't concerned with corner cases of processing where @itemtype doesn't come from http://schema.org/, which makes sense.

I suggest we simply put in the text a call for review of this direction. I'm not aware of any use of Microdata to RDF outside of the public-vocabs discussions, but something may come out of the woodwork. Note that all vocabularies defined in the current Registry use only the vocabulary form.

The loss of ordered values is, IMO, inconsequential at this point.

The current note was intended specifically to inform some future group, which looks like it may never arise, so I think now is the time to clarify these issues and set the direction based on actual use by the community.

gkellogg commented 9 years ago

Because of schema:additionalType, completely getting rid of the registry probably can't happen, but I would remove support for contextual and multipleValues, or at least remove all per-property settings in the published registry except for schema:additionalType.

danbri commented 9 years ago

Can't you just treat it as an ordinary property?

gkellogg commented 9 years ago

On Nov 4, 2014, at 8:34 AM, danbri notifications@github.com wrote:

Can't you just treat it as an ordinary property?

We added vocabulary-expansion to generate the following:

schema:additionalType rdfs:subPropertyOf rdf:type .

if it is encountered. We also added a small amount of entailment (from RDFa), so that if you have, e.g., the following:

You get out the following:

[ a schema:Person, foaf:Person; schema:additionalType foaf:Person ]

This was done in the published Note. Of course, we could remove that, but then it wouldn't really do what people expect.
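As a rough illustration of that mini-entailment, the Python sketch below applies a single rdfs:subPropertyOf rule to a set of triples. It is not the algorithm text from the Note: the function name expand, the tuple representation and the blank node label are assumptions made for this example, and only one expansion step is performed (no chains of sub-properties).

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCHEMA = "http://schema.org/"
FOAF = "http://xmlns.com/foaf/0.1/"

# The one rule discussed here: schema:additionalType rdfs:subPropertyOf rdf:type .
SUB_PROPERTY_OF = {SCHEMA + "additionalType": RDF_TYPE}


def expand(triples, sub_property_of=SUB_PROPERTY_OF):
    """Add (s, parent, o) for every (s, p, o) whose p has a declared parent property."""
    expanded = set(triples)
    for s, p, o in triples:
        parent = sub_property_of.get(p)
        if parent is not None:
            expanded.add((s, parent, o))
    return expanded


person = "_:b0"  # hypothetical blank node for the item
graph = {
    (person, RDF_TYPE, SCHEMA + "Person"),
    (person, SCHEMA + "additionalType", FOAF + "Person"),
}
result = expand(graph)
```

The result keeps the original schema:additionalType triple and adds the inferred rdf:type foaf:Person triple, which corresponds to the output shown above.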

Gregg

danbri commented 9 years ago

I recommend removing it. I can add the subproperty assertion into the schema.org RDFS somewhere if you like, in which case the inferences should go through.

There is much more to be gained from the RDF mapping being super-simple and easy to understand, than from edge cases like this. Especially if we start pointing non-RDF people at tooling like any23.apache.org as a way to consume schema.org...

gkellogg commented 9 years ago

Given that it's in the RDFS for schema.org, then it can certainly be removed. This would leave the rdfa:usesVocabulary triple generated from the value of @itemtype, which would cause the same triples to be created if vocabulary inference is turned on. That eliminates the need for the registry altogether, presuming there actually is no one depending on the contextual propertyURI generation scheme.

danbri commented 9 years ago

Oh I hate those usesVocabulary triples! If a vocabulary is even slightly well known or dereferenceable it isn't hard to figure out when it is being used without every graph laboriously telling you. If we can do away with the registry that would be a great simplification.

My main concern is that we move away from injecting type names into the property's URI, i.e. it is better on a 'Person' description when encountering a 'name' property to make http://schema.org/name than http://schema.org/Person/name. The shorter form corresponds to what you'd see with JSON-LD, RDFa and to the site and vocabulary structure of schema.org. I hope once the docs are clearer on this, any23.apache.org could be updated to match...

Dan

gkellogg commented 9 years ago

I don't know how to get rid of the rdfa:usesVocabulary triples, as that is how RDFa Entailment knows what vocabulary to download. Otherwise, we'd need to define our own process which would detect every vocabulary based on @itemtype, load the vocabularies, and run our own expansion rules. It certainly wouldn't be insurmountable, but we previously thought it better to leave this in the RDFa spec.

Otherwise, I think your concerns are addressed by the existing text:

Otherwise, if type is not empty, construct vocab by removing everything following the last SOLIDUS U+002F ("/") or NUMBER SIGN U+0023 ("#") from the path component of type.

The name is then appended to this, possibly adding a "#" if vocab doesn't end with "#" or "/", which gets the result we all expect.
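The two steps just described can be sketched in Python as follows. This is a simplified reading, not the spec algorithm: it works on the whole @itemtype string rather than parsing out the path component, it ignores the registry, and the function names are invented for this example.

```python
def vocab_from_type(item_type):
    """Strip everything after the last '/' or '#' in the type URL."""
    cut = max(item_type.rfind("/"), item_type.rfind("#"))
    return item_type[:cut + 1]


def property_uri(name, item_type):
    """Append the property name to the vocab, adding '#' if no separator is present."""
    vocab = vocab_from_type(item_type)
    if not vocab.endswith(("/", "#")):
        vocab += "#"
    return vocab + name


schema_name = property_uri("name", "http://schema.org/Person")
vevent_start = property_uri("dtstart", "http://microformats.org/profile/hcalendar#vevent")
```

Under these assumptions, a name property on a schema.org Person yields http://schema.org/name, with no type name injected into the property URI.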

Contextual was an odd case that seemed relevant at the time: there would be an element having @itemtype with properties referencing un-typed items, and we wanted to make sure that the properties would remain distinct. Time has overtaken that; the registry never defined such a vocabulary, and I really don't think anyone has ever depended on that feature (but it would be good to help validate this assumption).

danbri commented 9 years ago

Turning Microdata into full RDFa isn't a goal. All we need are the triples directly encoded in a chunk of markup. After that, people can do whatever they like, but let's not overcomplicate the first step...

gkellogg commented 9 years ago

The vocabulary expansion step was added two years ago, after additionalType was introduced, following a number of private discussions around July 2012, mostly among @danbri, @iherman and me. The thought was that the mechanism was really outside Microdata, and there were existing implementations that made this fairly simple in the RDF domain.

I think it's quite useful that parsers consuming schema:additionalType have a way to turn this into rdf:type, but the mechanism is fairly heavyweight. If we remove it altogether, several things (such as the Linter) will likely break. Of course, I could add a specific rule for schema:additionalType, which would be very easy. The question is how to do this in a spec without favoring this application in a way that isn't appropriate for W3C.

Perhaps, if the schema.org blog simply created a note describing such an extension for Microdata implementations from the Microdata to RDF spec, we could simply remove it, and trust that it would get implemented anyway. It would certainly be cleaner not having the rdfa:usesVocabulary added, and to remove the Vocabulary Expansion section entirely. Maybe we can just add a set of "known extensions" inline within the spec, having just a single entry.

Given that it is simply existing behavior, I'd like to hear some more opinions before simply removing it from the spec.

iherman commented 9 years ago

Guys, I am getting concerned.

I do not see why we should remove something from the spec that has been there for a while, is working and, frankly, does not really bother anyone. It is not like having those extras would break any software or something; the essence of RDF is that having some extra triples here and there (we are not talking about tons of additional information here) can safely be ignored if so wished.

The need for this update round was really triggered by the fact that some HTML5+Microdata features were missing from the RDF conversion spec (e.g., proper generation of triples for ). So we have to make an update to add those. Then there is the separate discussion (essentially with Ian) on whether we can/should add the reverse property; if that gets into the microdata spec, then the RDF conversion should also follow. In my opinion we should stop there, not break backward compatibility, and not start redesigning everything. (Which basically means that the discussion round with Ian should be brought to an end soon so we can get this thing done, because the other necessary changes/additions are fairly obvious.)

(Even the top level list: yes, it is superfluous, yes, it is a bit ugly, but it is completely harmless and, after all, let us just keep it there, we will never know whether somebody makes use of that or not. Just because it is 'ugly' should not be a reason to simply remove it.)

Ivan

danbri commented 9 years ago

I propose that all fancy-business be left as an optional extra for sophisticated tools, e.g. that you might invoke via a "power user" mode. The default should be as close to what RDFa would do as possible (except for usesVocabulary).

iherman commented 9 years ago

I am fine with this.

Ivan

gkellogg commented 9 years ago

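So, this might imply that the use of a registry is optional, but that we keep all of the processing rules. Perhaps there is no longer a default registry. All the rules associated with processing the registry become MAY:

• propertyURI generation other than vocabulary
• multipleValues as list
• Vocabulary Expansion from subPropertyOf and equivalentProperty
• rdfa:usesVocabulary generated if a registry match is found

Additionally, Vocabulary Entailment goes from MUST to MAY.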

If we choose to keep the registry, perhaps we simply keep the entry for schema.org, and its additionalType property. Having the entry enables proper vocabulary detection in the face of the schema.org extensions mechanism, but this may not be significant anymore.

Unfortunately, this won't help simplify the spec much, but will ease the burden on implementors.

iherman commented 9 years ago

On 05 Nov 2014, at 17:58 , Gregg Kellogg notifications@github.com wrote:

So, this might imply that the use of a registry is optional, but that we keep all of the processing rules. Perhaps there is no longer a default registry. All the rules associated with processing the registry become MAY:

• propertyURI generation other than vocabulary
• multipleValues as list
• Vocabulary Expansion from subPropertyOf and equivalentProperty
• rdfa:usesVocabulary generated if a registry match is found

Additionally, Vocabulary Entailment goes from MUST to MAY.

If we choose to keep the registry, perhaps we simply keep the entry for schema.org, and its additionalType property. Having the entry enables proper vocabulary detection in the face of the schema.org extensions mechanism, but this may not be significant anymore.

Actually... I think Dan's idea was that there would be a flag controlling whether the processor does the complete or the simple thing. The difference is important; what I believe Dan wants is that, by default, the output would be simple even if the implementation can do the more complicated thing, i.e., a control option is necessary.

It is unclear to me whether all processors MUST be able to handle the complex features (under the control of the flag) or whether those are really MAY-s.

Ivan

gkellogg commented 9 years ago

Note comment from last commit; this restores the registry, but requires an option be set to load it. Default behavior is now that of an empty registry. This doesn't provide any simplification, just that the default behavior is now different.

I'm not certain that that was what @danbri was looking for. The alternative would be to make implementing support for this option a MAY, and being more explicit about processing steps being a MAY, but conformance testing then becomes impossible and it really doesn't simplify the spec any more.

iherman commented 9 years ago

I believe this is what he wanted indeed. The flag would also govern the generation of that top level list of resources, as well as the generation of the rdf:type values through the mini-entailment.

Ivan


Ivan Herman Tel:+31 641044153 http://www.ivan-herman.net

(Written on mobile, sorry for brevity and misspellings...)

gkellogg commented 9 years ago

I'm all for simply dumping the top-level list of resources entirely, as I certainly don't consider that an advanced feature; in retrospect, I think we were waffling too much on value ordering, but time has proven this to be without merit. Anyway, that is covered in issue #6.

Mini-entailment would only be invoked when there is an rdfa:usesVocabulary triple in the output graph, which can only happen if there's a registry, so I don't think it needs to be handled separately. Also, note that the spec already mentions a vocab_expansion option; I think this was the intent of the previous Vocabulary Expansion Control of Microdata Processors section, but the algorithm didn't mark this as being optional.

The net is really no change to the algorithm, except that the default registry is only loaded if the registry option is set to true; it could also be set to a URL, to allow a custom registry to be loaded, as we do for testing anyway.
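A minimal Python sketch of how such an option might be resolved is shown below. Everything in it is an assumption for illustration: the function name resolve_registry, the injectable fetch callable and the stand-in DEFAULT_REGISTRY contents are not defined by the spec, which leaves it up to the implementation how the option reaches the processor.

```python
import json
from urllib.request import urlopen

# Stand-in for the published default registry (contents abbreviated for the example).
DEFAULT_REGISTRY = {
    "http://schema.org/": {"propertyURI": "vocabulary", "multipleValues": "unordered"},
}


def resolve_registry(option=None, fetch=None):
    """Map the registry option to the registry a processor should use.

    Unset or false: empty registry (the new default behavior).
    True: the default registry.
    A URL string: a custom registry loaded from that location.
    """
    if not option:
        return {}
    if option is True:
        return dict(DEFAULT_REGISTRY)
    if fetch is None:
        def fetch(url):
            # Plain HTTP GET; a real processor may prefer caching or a local copy.
            with urlopen(url) as response:
                return response.read().decode("utf-8")
    return json.loads(fetch(option))


empty = resolve_registry()
default = resolve_registry(True)
custom = resolve_registry(
    "http://example.org/registry.json",  # hypothetical URL
    fetch=lambda url: '{"http://example.org/vocab/": {}}',
)
```

Passing a fetch callable, as in the last call, is one way a test suite could supply its own registry without network access.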

iherman commented 9 years ago

On 06 Nov 2014, at 05:22 , Gregg Kellogg notifications@github.com wrote:

I'm all for simply dumping the top-level list of resources entirely, as I certainly don't consider that an advanced feature; in retrospect, I think we were waffling too much on value ordering, but time has proven this to be without merit. Anyway, that is covered in issue #6.

mini-entailment would only be invoked when there is an rdfa:usesVocabulary triple in the output graph, which can only happen if there's a registry, so I don't think it needs to be handled separately.

Ah! Good. That simplifies things, both on the explanation and the implementation.

Also, note that the spec already mentions a vocab_expansion option; I think this was the intent of the previous Vocabulary Expansion Control of Microdata Processors section, but the algorithm didn't mark this as being optional.

The net is really no change to the algorithm, except that the default registry is only loaded if the registry option is set to true; it could also be set to a URL, to allow a custom registry to be loaded, as we do for testing anyway.

O.k. There was some wording in RDFa 1.1 Core on processor control flags; that can be reused to define the flag, while leaving it implementation-dependent how the flag gets to the processor.

Ivan

gkellogg commented 9 years ago

I added a pull-request to schema.org to create schema:additionalType rdfs:subPropertyOf rdf:type .: https://github.com/rvguha/schemaorg/pull/154.

Given that we're making expansion/entailment an "advanced" option, we could alter the text to simply use RDFa expansion. If this were the case, the registry would contain just a single line:

{"http://schema.org/": {}}

This would cause the rdfa:usesVocabulary triple to be emitted, and the RDFa Vocabulary Expansion would add the inferred triples. The current text allows an implementation to use RDFa Vocabulary Expansion, but does not require it. If the entire notion of Microdata vocabulary expansion is considered an "advanced" feature, this would take us one step further to not reading a registry at all.

(Note, I still think it's a bit of a missed opportunity to keep contextual propertyURI generation and sequenced property values when there's no evidence that anyone is using a custom registry except for testing purposes, and there is really no code which would ever invoke it; existing behavior using the default registry could be maintained by removing the registry and counting on the vocab_expansion option to do all the work).

gkellogg commented 9 years ago

Actually, it seems that we do require the registry in order to generate proper property URIs for hcard.

{"http://microformats.org/profile/hcard": {}}

We can't really drop the default registry if we want to be compatible with existing documentation on using the hcard vocabulary with Microdata:

<!DOCTYPE html>
<html lang="en">
  <head>
    <title>Corey Mwamba</title>
  </head>
  <body>
    <section itemtype="http://microformats.org/profile/hcard" itemscope>
      <h1 itemprop="name">Corey Mwamba</h1>
      <p itemprop="street-address">56 Nowhere Road</p>
      <p itemprop="locality">Nowhere</p>
      <p itemprop="postal-code">NO1 6QT</p>
      <a href="http://www.coreymwamba.co.uk/" itemprop="url">My web site</a>
    </section>
  </body>
</html>

By having http://microformats.org/profile/hcard in the registry, we will generate http://microformats.org/profile/hcard#name instead of http://microformats.org/profile/name.
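The effect of the registry entry can be sketched in Python as below. This is an approximation under stated assumptions: a registry key is treated as matching when the @itemtype starts with it, the fallback strips everything after the last "/" or "#" of the whole type string, and the function name is invented for the example.

```python
def property_uri(name, item_type, registry):
    """Build a vocabulary-form property URI, preferring a matching registry prefix."""
    vocab = next((prefix for prefix in registry if item_type.startswith(prefix)), None)
    if vocab is None:
        # No registry match: derive the vocab from the type itself.
        cut = max(item_type.rfind("/"), item_type.rfind("#"))
        vocab = item_type[:cut + 1]
    if not vocab.endswith(("/", "#")):
        vocab += "#"
    return vocab + name


HCARD = "http://microformats.org/profile/hcard"
with_registry = property_uri("name", HCARD, {HCARD: {}})
without_registry = property_uri("name", HCARD, {})
```

With the entry present the property lands in the hcard namespace; with an empty registry it falls back to the parent path of the type.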

(Note, however, that the example on the Microformats Wiki incorrectly uses http://microformats.org/profile/h-card as the @itemtype; they also incorrectly show http://microformats.org/profile/v-event when it should be http://microformats.org/profile/hcalendar#vevent.)

HCalendar isn't a problem, as the documented @itemtype is http://microformats.org/profile/hcalendar#vevent, which does generate appropriate markup.

It would be worth knowing how Google is handling such input.

iherman commented 9 years ago

Gregg,

I am a bit afraid that we are getting into backward-incompatible changes.

  1. At the moment, the syntax of the registry is fairly simple:
{
  "URIOFTHEVOCAB": {
    -- all kinds of things
  }
}

However, your proposal of using {"http://schema.org":{}} and relying on the “official” JSON-LD schema as maintained by schema.org means, I presume, that the microdata processor is also supposed to (try to) dereference the vocabulary URI in the hope of finding some JSON configuration there. At the moment, the registry is at a fixed place; this can be used either to download the full registry or, for most of the implementations I guess, to hard-code the registry into the local code. This is a significant difference.

  2. To keep to the schema.org example: if I use, say,

curl --header "Accept: application/ld+json, application/json" http://schema.org

what I get is a JSON-LD file containing a single @context following the @context syntax of JSON-LD. This is not the current syntax of the registry for microdata, i.e., we are getting into a different level of backward incompatibility. Because the vocabulary maintainer has the liberty to use full-blown JSON-LD syntax in the locally stored JSON-LD context file, what this amounts to is that the microdata processor would need to have a full JSON-LD parser at its disposal. I think that means that microdata converters become much more complicated than they are today. (Of course, one could rely on existing parsers, hopefully; but, e.g., the default RDFLib distribution still does not include a JSON-LD parser and serializer, Niklas’ code has to be downloaded separately, which is an extra load on an average user.)

I see both of these issues as real problems. I wonder whether we should not stick with what we have right now...

gkellogg commented 9 years ago

No, the syntax is unchanged, but the content of the registry does need to be re-visited. The current registry is the following:

{
  "http://schema.org/": {
    "propertyURI":    "vocabulary",
    "multipleValues": "unordered",
    "properties": {
      "additionalType": {"subPropertyOf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"},
      "blogPosts": {"multipleValues": "list"},
      "breadcrumb": {"multipleValues": "list"},
      "byArtist": {"multipleValues": "list"},
      "creator": {"multipleValues": "list"},
      "episode": {"multipleValues": "list"},
      "episodes": {"multipleValues": "list"},
      "event": {"multipleValues": "list"},
      "events": {"multipleValues": "list"},
      "founder": {"multipleValues": "list"},
      "founders": {"multipleValues": "list"},
      "itemListElement": {"multipleValues": "list"},
      "musicGroupMember": {"multipleValues": "list"},
      "performerIn": {"multipleValues": "list"},
      "actor": {"multipleValues": "list"},
      "actors": {"multipleValues": "list"},
      "performer": {"multipleValues": "list"},
      "performers": {"multipleValues": "list"},
      "producer": {"multipleValues": "list"},
      "recipeInstructions": {"multipleValues": "list"},
      "season": {"multipleValues": "list"},
      "seasons": {"multipleValues": "list"},
      "subEvent": {"multipleValues": "list"},
      "subEvents": {"multipleValues": "list"},
      "track": {"multipleValues": "list"},
      "tracks": {"multipleValues": "list"}
    }
  },
  "http://microformats.org/profile/hcard": {
    "propertyURI":    "vocabulary",
    "multipleValues": "unordered"
  },
  "http://microformats.org/profile/hcalendar#": {
    "propertyURI":    "vocabulary",
    "multipleValues": "unordered",
    "properties": {
      "categories": {"multipleValues": "list"}
    }
  }
}

According to @danbri, search engines don't honor any "multipleValues" settings that aren't the default, which implies that trying to do something else is pointless, so it could be simplified to the following:

{
  "http://schema.org/": {
    "propertyURI":    "vocabulary",
    "multipleValues": "unordered",
    "properties": {
      "additionalType": {"subPropertyOf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"}
    }
  },
  "http://microformats.org/profile/hcard": {
    "propertyURI":    "vocabulary",
    "multipleValues": "unordered"
  },
  "http://microformats.org/profile/hcalendar#": {
    "propertyURI":    "vocabulary",
    "multipleValues": "unordered"
  }
}

We can also remove settings which are the same as the defaults, to get the following:

{
  "http://schema.org/": {
    "properties": {
      "additionalType": {"subPropertyOf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"}
    }
  },
  "http://microformats.org/profile/hcard": {
  },
  "http://microformats.org/profile/hcalendar#": {
  }
}
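
The two reductions so far (dropping per-property multipleValues overrides, then dropping settings equal to the defaults) could be done mechanically. The Python sketch below is only an illustration; the function name simplify_registry is made up, and the DEFAULTS values are inferred from the settings removed above rather than quoted from the spec.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Assumed defaults, inferred from the settings dropped in the registry above.
DEFAULTS = {"propertyURI": "vocabulary", "multipleValues": "unordered"}


def simplify_registry(registry):
    """Drop per-property multipleValues settings and vocabulary settings equal to the defaults."""
    simplified = {}
    for vocab, settings in registry.items():
        entry = {}
        for key, value in settings.items():
            if key == "properties":
                props = {}
                for name, prop_settings in value.items():
                    kept = {k: v for k, v in prop_settings.items() if k != "multipleValues"}
                    if kept:  # keep only properties that still carry a setting
                        props[name] = kept
                if props:
                    entry["properties"] = props
            elif DEFAULTS.get(key) != value:
                entry[key] = value
        simplified[vocab] = entry
    return simplified


current = {
    "http://schema.org/": {
        "propertyURI": "vocabulary",
        "multipleValues": "unordered",
        "properties": {
            "additionalType": {"subPropertyOf": RDF_TYPE},
            "track": {"multipleValues": "list"},
        },
    },
    "http://microformats.org/profile/hcard": {
        "propertyURI": "vocabulary",
        "multipleValues": "unordered",
    },
    "http://microformats.org/profile/hcalendar#": {
        "propertyURI": "vocabulary",
        "multipleValues": "unordered",
        "properties": {"categories": {"multipleValues": "list"}},
    },
}
reduced = simplify_registry(current)
```

Here current is an abbreviated copy of the published registry; reduced has the same shape as the registry just shown.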

If we were to do the RDFa-based inference step, vs the scaled-down version currently in the Microdata spec, a setting for "additionalType" is not needed, yielding the following:

{
  "http://schema.org/": {
  },
  "http://microformats.org/profile/hcard": {
  },
  "http://microformats.org/profile/hcalendar#": {
  }
}

We can also remove the entry for hcalendar, as its presence does not affect propertyURI generation, and only ends up creating a usesVocabulary triple, which is not useful as there's no vocabulary to load. That gets us down to the minimal version I suggested:

{
  "http://schema.org/": {},
  "http://microformats.org/profile/hcard": {}
}

The only reason to keep even this is to allow for generating usesVocabulary for schema.org and to make sure that hcard properties, such as http://microformats.org/profile/hcard#fn, get properly created. Your processor should work with this example registry for most of the existing tests, except for the presence of the md:item list; I'll update tests later today for the current state of the spec.

gkellogg commented 9 years ago

Also, regarding dereferencing a vocabulary to get the RDFS/OWL description of it, there are some special cases:

(At least, I was unsuccessful loading definitions using content-negotiation for those vocabularies.)

I have suggested that the JSON-LD at http://schema.org/ also contain the vocabulary definition, and it may sometime. I automatically created one at https://github.com/ruby-rdf/json-ld/blob/develop/etc/schema.org.jsonld.

Of course, it would be nice if loading vocabulary definitions was universally consistent, but we make do.

iherman commented 9 years ago

Hey Gregg,

On 09 Nov 2014, at 18:54 , Gregg Kellogg notifications@github.com wrote:

No, the syntax is unchanged, but the content of the registry does need to be re-visited.

O.k. I misunderstood the earlier comments then.

According to @danbri, search engines don't honor any "multipleValues" settings that aren't the default, which implies that trying to do something else is pointless, so it could be simplified to the following:

{ "http://schema.org/": { "propertyURI": "vocabulary", "multipleValues": "unordered", "properties": { "additionalType": {"subPropertyOf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"} } }, "http://microformats.org/profile/hcard": { "propertyURI": "vocabulary", "multipleValues": "unordered" }, "http://microformats.org/profile/hcalendar#": { "propertyURI": "vocabulary", "multipleValues": "unordered" } }

I am fine with that.

We can also remove settings which are the same as the defaults, to get the following:

{ "http://schema.org/": { "properties": { "additionalType": {"subPropertyOf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"} } }, "http://microformats.org/profile/hcard": { }, "http://microformats.org/profile/hcalendar#": { } }

Of course, I am fine with that, too!

If we were to do the RDFa-based inference step, vs the scaled-down version currently in the Microdata spec, a setting for "additionalType" is not needed, yielding the following:

{ "http://schema.org/": { }, "http://microformats.org/profile/hcard": { }, "http://microformats.org/profile/hcalendar#": { } }

I would prefer not to do that. Let us remain with the minimal inference step for microdata as we have today, even if it is behind a flag. I would prefer to keep the implementation requirements as minimal as possible.

We can also remove the entry for hcalendar, as its presence does not affect propertyURI generation and only ends up creating a usesVocabulary triple, which is not useful as there's no vocabulary to load. That gets us down to the minimal version I suggested:

{ "http://schema.org/": {}, "http://microformats.org/profile/hcard": {} }

Well, I would prefer to put back the additionalType entry for schema.org.

Thanks!

ivan

The only reason to keep even this is to allow for generating usesVocabulary for schema.org and to make sure that hcard properties, such as http://microformats.org/profile/hcard#fn, get properly created. Your processor should work with this example registry for most of the existing tests, except for the presence of the md:item list; I'll update tests later today for the current state of the spec.
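The simplification step discussed above (dropping settings that merely restate the defaults) can be sketched in a few lines of Python. This is only an illustration: the `DEFAULTS` table is inferred from this thread ("vocabulary" and "unordered"), and `strip_defaults` is a hypothetical helper, not part of any processor.

```python
import json

# Assumed defaults, inferred from the discussion in this thread.
DEFAULTS = {"propertyURI": "vocabulary", "multipleValues": "unordered"}


def strip_defaults(registry):
    """Return a copy of the registry without settings equal to DEFAULTS."""
    pruned = {}
    for vocab, entry in registry.items():
        pruned[vocab] = {
            key: value
            for key, value in entry.items()
            if DEFAULTS.get(key) != value
        }
    return pruned


registry = {
    "http://schema.org/": {
        "propertyURI": "vocabulary",
        "multipleValues": "unordered",
        "properties": {
            "additionalType": {
                "subPropertyOf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
            }
        },
    },
    "http://microformats.org/profile/hcard": {
        "propertyURI": "vocabulary",
        "multipleValues": "unordered",
    },
    "http://microformats.org/profile/hcalendar#": {
        "propertyURI": "vocabulary",
        "multipleValues": "unordered",
    },
}

print(json.dumps(strip_defaults(registry), indent=2))
```

Only the top-level settings are pruned here; per-property entries such as additionalType are left untouched.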

Ivan Herman, W3C Digital Publishing Activity Lead Home: http://www.w3.org/People/Ivan/ mobile: +31-641044153 ORCID ID: http://orcid.org/0000-0003-0782-2704

iherman commented 9 years ago

Hey Gregg,

just trying to summarize where we are, because I think we are in agreement. If this is true, we can also close this issue

(I am not 100% sure we agreed on the last item, but it would make sense).

If you agree this is where we are, I guess we can close the issue.

gkellogg commented 9 years ago

I think there's still some contention on this issue that we need to resolve. As I understand it, there are two views on this:

  1. The core Microdata to RDF algorithm is too difficult to get more implementations, and the extra triples added to the output add noise. If we simply output what is directly parsed, consumers can make their own inference (if they so choose) to equate schema:additionalType with rdf:type.
  2. The rules we've adopted allow for use by other consumers who may use schema:additionalType for general RDF markup, and not simply to be consumed by schema.org. Ensuring that the rdf:type triple is emitted is important.

Note that other aspects of this issue, such as removing support for multipleValues being list and the contextual propertyURI scheme, are also part of this issue, and it would be good to consider them separately. They were combined because they would be a natural consequence of removing registry support, but at this point they need to be treated separately. Given that the default context does not list anything for multipleValues or propertyURI, only the defaults will be used, so keeping the algorithmic machinery for what is essentially dead code adds unnecessary complexity.

As for the additionalType business, this could be left as is, or we could use more specific rules for generating such triples which can be far simpler than existing entailment. e.g. (after 11.1.5):

If an entry exists in the registry for name in the vocabulary associated with vocab having the key subPropertyOf or equivalentProperty, generate the following triple:

  subject: subject
  predicate: the value for subPropertyOf or equivalentProperty as a URI reference
  object: value as a URI reference

This keeps the desired functionality, removes the need to emit rdfa:usesVocabulary and to perform vocabulary expansion/entailment over the entire graph.

For example, the following input:

<div itemscope itemtype="http://schema.org/Person">
  <link itemprop="additionalType" href="http://xmlns.com/foaf/0.1/Person" />
  <span itemprop="name">Gregg</span>
</div>

would generate:

[ a schema:Person, foaf:Person; schema:additionalType foaf:Person; schema:name "Gregg" ] .
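As a rough sketch of the proposed rule, the lookup could look like the following Python. The registry shape follows the JSON shown earlier in this thread; the function name `extra_triples`, the blank node label `_:person`, and the assumption that each key holds a single URI string are all illustrative, not taken from the spec.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCHEMA = "http://schema.org/"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"

# Registry fragment in the shape used earlier in this thread.
REGISTRY = {
    SCHEMA: {"properties": {"additionalType": {"subPropertyOf": RDF_TYPE}}},
}


def extra_triples(registry, vocab, name, subject, value):
    """Return the additional triples for one property, per the proposed rule.

    Assumes subPropertyOf / equivalentProperty each hold one URI string.
    """
    properties = registry.get(vocab, {}).get("properties", {})
    entry = properties.get(name, {})
    triples = []
    for key in ("subPropertyOf", "equivalentProperty"):
        if key in entry:
            # subject stays the same, the predicate comes from the registry,
            # and the object is the property value as a URI reference.
            triples.append((subject, entry[key], value))
    return triples


triples = extra_triples(REGISTRY, SCHEMA, "additionalType", "_:person", FOAF_PERSON)
print(triples)
```

The lookup is local to a single property, so nothing here requires rdfa:usesVocabulary or a pass over the whole graph.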

If this is acceptable, I'd suggest that we go back to eliminating contextual and support for RDF Collections, remove Vocabulary Entailment, and remove step 7 of the Generate the triples algorithm. Then add the text above. This vastly simplifies an implementation and accomplishes everything we expect. We can either keep or remove the support for hcard from the registry; we may as well keep it, as it does no harm and adds no complexity.

iherman commented 9 years ago

Gregg,

This solution could work, with one addition. The proposed solution equates subproperty and equivalent property, but that is not correct. If it is equivalent property, then an extra triple should be generated in case the graph contains either the right or the left side, so to say.

But I am fine with this approach. If we do it this way, it should not be part of the power features, though, just the normal output. I am not even sure we need the power flag any more.
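To make the distinction concrete, here is a small Python sketch in which subPropertyOf is applied in one direction only while equivalentProperty is applied in both. The property URIs under `http://example.org/vocab#` and the function `expand` are hypothetical and exist only for illustration.

```python
EX = "http://example.org/vocab#"  # hypothetical vocabulary

# One-directional: a statement using the key also implies the value.
SUB_PROPERTY_OF = {EX + "child": EX + "parent"}

# Symmetric: a statement using either side implies the other.
EQUIVALENT_PROPERTY = [(EX + "left", EX + "right")]


def expand(triples):
    """Return the input triples plus the extra triples implied by the tables."""
    out = set(triples)
    for subject, predicate, obj in triples:
        if predicate in SUB_PROPERTY_OF:
            out.add((subject, SUB_PROPERTY_OF[predicate], obj))
        for left, right in EQUIVALENT_PROPERTY:
            if predicate == left:
                out.add((subject, right, obj))
            elif predicate == right:
                out.add((subject, left, obj))
    return out
```

A triple using `parent` produces nothing extra, whereas a triple using either `left` or `right` produces its counterpart.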

Ivan


iherman commented 9 years ago

One more thing, though. We and, more importantly, the user have to realize that this solution does not handle recursion. I.e., if I have

a subPropertyOf b
b subPropertyOf c

in the registry, and we extract "a" from the microdata, this will not generate a triple involving "c".

Or does it? Do we do it recursively? But wait! This is exactly what the current document describes (and nothing more) :-)
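The difference between the two readings can be sketched as follows. This is a simplification: the chain is modelled as a flat map between hypothetical property URIs rather than the per-vocabulary registry structure, and both function names are made up for the example.

```python
EX = "http://example.org/vocab#"  # hypothetical vocabulary

# a subPropertyOf b, b subPropertyOf c
SUB_PROPERTY_OF = {EX + "a": EX + "b", EX + "b": EX + "c"}


def single_step(predicate):
    """Non-recursive reading: only the directly registered super-property."""
    parent = SUB_PROPERTY_OF.get(predicate)
    return [parent] if parent is not None else []


def transitive(predicate):
    """Recursive reading: follow the chain, guarding against cycles."""
    found = []
    current = SUB_PROPERTY_OF.get(predicate)
    while current is not None and current not in found:
        found.append(current)
        current = SUB_PROPERTY_OF.get(current)
    return found


print(single_step(EX + "a"))
print(transitive(EX + "a"))
```

Under the non-recursive reading, extracting "a" yields a triple for "b" only; the recursive reading would also yield one for "c".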

Ivan


gkellogg commented 9 years ago

This solution could work, with one addition. The proposed solution equates subproperty and equivalent property, but that is not correct. If it is equivalent property, then an extra triple should be generated in case the graph contains either the right or the left side, so to say.

Yes, that should be fairly straightforward to handle, but we could just as easily leave out equivalentProperty; AFAIR, it was there because we were coming off of RDFa entailment, where we handled it.

The idea isn't to sneak in reasoning, but to just give a simple way to output the rdf:type triple in a way that doesn't favor schema.org. And, given that what the default registry does is fairly set-in-stone, we'll never see an equivalentProperty case, so I'd say just drop it.

I'll update the spec in the next day or so. Still leaves the issues of contextual and lists, though.

iherman commented 9 years ago

On 12 Nov 2014, at 07:18 , Gregg Kellogg notifications@github.com wrote:

This solution could work, with one addition. The proposed solution equates subproperty and equivalent property, but that is not correct. If it is equivalent property, then an extra triple should be generated in case the graph contains either the right or the left side, so to say.

Yes, that should be fairly straightforward to handle, but we could just as easily leave out equivalentProperty; AFAIR, it was there because we were coming off of RDFa entailment, where we handled it.

I am not sure that was the reason. The issue of truly equivalent properties came up, as far as I remember, around schema.org discussions. I could see, in future, real equivalent properties popping up around schema (eg, for the IPTC vocabularies), so I think it is more future proof to leave that in.

The idea isn't to sneak in reasoning, but to just give a simple way to output the rdf:type triple in a way that doesn't favor schema.org. And, given that what the default registry does is fairly set-in-stone, we'll never see an equivalentProperty case, so I'd say just drop it.

See above. I do not think it came in because of any reasoning issues.

See also my other comment: we do not do these things recursively, do we?

I'll update the spec in the next day or so. Still leaves the issues of contextual and lists, though.

I am neutral on the subject. I think the whole list idea came in because we wanted to keep close to the JSON mapping in the microdata spec; I am not even sure that mapping is really used anywhere...

Ivan


gkellogg commented 9 years ago

No comments received during review period, closing this issue and removing from the spec.