w3c / microdata

Moved to https://html.spec.whatwg.org/multipage/microdata.html

How do property names inherit the URL base of the itemtype? #41

Closed: chaals closed this issue 7 years ago

chaals commented 7 years ago

If you specify an itemtype, then there is an idea that its specification describes "relevant types", so you don't need to use a full URL to have them parsed as part of the same specification.

What defines this? Does

```html
<div itemscope itemtype="http://schema.org/Thing">
  <p itemprop="name">My thing</p>
</div>
```

mean that the Thing has a schema.org name? That is my understanding of what happens in reality, but as far as I can understand the specification, if that code is at `http://example.org/some/page`, the property should be `http://example.org/some/pagename`.

That means interpreting the property as a schema.org name relies on some undocumented magic: either parsing according to the "Microdata to RDF" Note, or simply deciding that this is how schema.org-typed items should be parsed because that makes sense.

/@danbri @iherman @gkellogg

chaals commented 7 years ago

I think the answer is that the spec should say that, in a typed item, a property name that is not an absolute URL is parsed relative to the URL of the itemtype.
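One way to see what this proposal would mean in practice is ordinary relative-URL resolution (RFC 3986) with the itemtype as the base. The sketch below assumes that reading and uses Python's standard `urllib.parse.urljoin`; the helper name `resolve_property` is hypothetical and comes from no spec.

```python
from urllib.parse import urljoin


def resolve_property(itemtype: str, name: str) -> str:
    """Hypothetical helper: resolve an itemprop name against the itemtype URL.

    urljoin leaves absolute URLs untouched; otherwise it replaces the last
    path segment of the base, as in RFC 3986 reference resolution.
    """
    return urljoin(itemtype, name)


print(resolve_property("http://schema.org/Thing", "name"))
# http://schema.org/name
print(resolve_property("http://schema.org/Thing", "http://purl.org/dc/terms/title"))
# http://purl.org/dc/terms/title
```

Under this reading, itemtype="http://schema.org/Person/Teacher" would resolve "name" to http://schema.org/Person/name, because only the last path segment of the base is replaced.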

This seems like an important sub-issue of #34 - or did I miss something (again)?

gkellogg commented 7 years ago

The mechanism used in Microdata to RDF is described in Generate Predicate URI.

The current vocabulary is taken from the closest itemtype. Looking at it afresh, the language is a bit inconsistent. But step 7 of Generate the Triples shows how to construct vocab if there is no registry entry:

> Otherwise, if type is not empty, construct vocab by removing everything following the last SOLIDUS U+002F ("/") or NUMBER SIGN U+0023 ("#") from the path component of type.

Essentially, for schema.org, if the item type is http://schema.org/Thing, vocab becomes http://schema.org/. This is then used as the current vocabulary when creating a property from name.
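A rough sketch of the two steps just described: deriving vocab from the type, then building a predicate from name. It is a simplification, not the Note's algorithm. It scans the whole type string rather than parsing out the path component. It also assumes that a relative name is appended to the vocabulary with a "#" separator unless the vocabulary already ends in "/" or "#", and it uses a crude colon test for absolute names. The function names are hypothetical.

```python
def derive_vocab(itemtype: str) -> str:
    """Approximate step 7: keep everything up to and including the last '/' or '#'.

    Simplification: scans the whole string instead of only the path component.
    """
    cut = max(itemtype.rfind("/"), itemtype.rfind("#"))
    return itemtype[: cut + 1]


def predicate_uri(vocab: str, name: str) -> str:
    """Simplified predicate construction for a property name (assumed rules)."""
    if ":" in name:  # crude absolute-URL test; an assumption of this sketch
        return name
    if vocab.endswith(("/", "#")):
        return vocab + name
    return vocab + "#" + name


vocab = derive_vocab("http://schema.org/Thing")
print(vocab)                         # http://schema.org/
print(predicate_uri(vocab, "name"))  # http://schema.org/name
```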

A lot of the algorithm's wording could be improved by a rewrite that did not try to stay so close to the original algorithm, which was removed from the first version of Microdata.

The registry was a concession to other vocabulary schemes, which aren't particularly important for schema.org, and it has gone mostly unmodified since the beginning:

```json
{
  "http://schema.org/": {
    "properties": {
      "additionalType": {"subPropertyOf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"}
    }
  },
  "http://microformats.org/profile/hcard": {}
}
```

There's a concession for hcard (of questionable value IMHO), and an inference rule for additionalType.
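To make the additionalType rule concrete, here is a minimal sketch of vocabulary expansion, assuming that a `subPropertyOf` entry means a triple using the property also yields a triple using the super-property. The registry JSON shown above is embedded rather than fetched. The tuple-based triple representation, the `expand` function and the object URL http://example.org/Widget are all hypothetical.

```python
import json

# Registry contents as quoted in this thread, embedded inline rather than fetched.
REGISTRY = json.loads("""
{
  "http://schema.org/": {
    "properties": {
      "additionalType": {"subPropertyOf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"}
    }
  },
  "http://microformats.org/profile/hcard": {}
}
""")

# Hypothetical triple representation: (subject, predicate, object) strings.
Triple = tuple[str, str, str]


def expand(triple: Triple, vocab: str) -> list[Triple]:
    """Return the triple, plus a copy using the super-property if one is declared."""
    subject, predicate, obj = triple
    result = [triple]
    if not predicate.startswith(vocab):
        return result
    name = predicate[len(vocab):]
    props = REGISTRY.get(vocab, {}).get("properties", {})
    super_prop = props.get(name, {}).get("subPropertyOf")
    if super_prop:
        result.append((subject, super_prop, obj))
    return result


for t in expand(("_:item", "http://schema.org/additionalType", "http://example.org/Widget"),
                "http://schema.org/"):
    print(t)
```

A property with no registry entry, such as http://schema.org/name, comes back as a single unchanged triple.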

chaals commented 7 years ago

@gkellogg wrote:

> Otherwise, if type is not empty, construct vocab by removing everything following the last SOLIDUS U+002F ("/") or NUMBER SIGN U+0023 ("#") from the path component of type.
>
> Essentially, for schema.org, if item type is http://schema.org/Thing, vocab becomes http://schema.org/. This is then used as the current vocabulary when creating a property from name.

This matches what I think people do, and what I think people should do, but as far as I can tell it isn't documented in the microdata spec itself. In effect it contradicts the microdata spec, unless the spec said something meaningful about vocabularies, which, as per #34, I don't think it does…

That seems like an issue that should be fixed, which means looking at what else people do with microdata in case there is a real-world interoperability problem. If "everyone" uses it that way (for a sufficiently inclusive value of "everyone"), then I think it is easiest to lift that into the core algorithm as a general approach. If people are actually relying on the current wording in the microdata spec, we'll need to do something like state that relevant types may be defined by vocabularies to take that approach.

Personally, I can't see a use case for typed items not using the itemtype the way RDFa uses vocab. And it seems to me that the intention of having itemtype take URLs is to do this. But I am making some big assumptions there.

Hence wanting more people to weigh in.

/@hixie ?

chaals commented 7 years ago

As I quickly read (past tense) Microdata to RDF, itemtype="https://schema.org/Thing" means that the root vocabulary is "https://schema.org/" but itemtype="https://schema.org/Person/Teacher" actually results in a root vocabulary of "https://schema.org/Person/".

Reading more carefully, it appears that the intent is that the registry is a list of known vocabulary prefixes, optionally associating them with properties for subtypes and equivalent types.
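A minimal sketch of that reading: a character-for-character prefix match against the known registry prefixes, falling back to "everything up to the last / or #" when nothing matches. The prefix list is taken from the registry quoted earlier in the thread. The function name `vocab_for`, the `use_registry` switch and the longest-match tie-break are assumptions made for illustration.

```python
# Vocabulary prefixes from the registry quoted in this thread.
REGISTRY_PREFIXES = ["http://schema.org/", "http://microformats.org/profile/hcard"]


def vocab_for(itemtype: str, use_registry: bool = True) -> str:
    """Choose the current vocabulary for an item type (simplified sketch)."""
    if use_registry:
        # Character-for-character prefix match; prefer the longest match.
        matches = [p for p in REGISTRY_PREFIXES if itemtype.startswith(p)]
        if matches:
            return max(matches, key=len)
    # Fallback: keep everything up to and including the last '/' or '#'.
    cut = max(itemtype.rfind("/"), itemtype.rfind("#"))
    return itemtype[: cut + 1]


print(vocab_for("http://schema.org/Person/Teacher"))                      # http://schema.org/
print(vocab_for("http://schema.org/Person/Teacher", use_registry=False))  # http://schema.org/Person/
```

Because the match is character for character, this sketch would not match `https://schema.org/Person/Teacher` (the form used above) against the `http://schema.org/` entry, and would fall back to `https://schema.org/Person/`.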

Given the ambiguous language in the spec, and my hunch that processors don't actually fetch the registry in practice, I'm leaning toward the following, which essentially adopts the relevant bits from the Note:

@danbri?

chaals commented 7 years ago

@danbri wondered offline if there is an obvious default we can use to avoid needing a registry, but I don't think so.

gkellogg commented 7 years ago

> Reading more carefully, it appears that the intent is that the registry is a list of known vocabulary prefixes, optionally associating them with properties for subtypes and equivalent types.

Indeed, but really, the only one that mattered was schema.org. At the time, there was some thought of using URL hierarchies for extensions, but I believe this fell out of favor, so it may be more theoretical than practical.

Indeed, I don't think that processors dynamically fetch the registry; rather, they build it in. There is a mechanism for updating it, and updates have been made through the efforts of @iherman, typically when the Note is revised, though they can happen at any time. Given that it is a Note, there is no real formal mechanism for doing this. Note that the registry also includes vocabulary expansion, which allows us to go from http://schema.org/additionalType to rdf:type.

As you say, in the absence of an entry in the registry, the @itemtype value is taken up to the last NUMBER SIGN or SOLIDUS. Certainly, a processor may use its own heuristics, at the risk of losing interoperability.

chaals commented 7 years ago

The alternative would be to specify some particular rules in the spec.

Does anyone know of a use of microdata other than schema.org where you might have itemtype="https://some.host/compound/path" but want the associated prefix to be just "https://some.host/"? I can imagine this being the case for W3C if it publishes a couple of substantial vocabularies... but that's not a current concrete use case.