w3c / wpub

W3C Web Publications
https://w3c.github.io/wpub/

Clarify the coercion process from strings to objects in `Text Values or Objects` #314

Closed BigBlueHat closed 5 years ago

BigBlueHat commented 6 years ago

Unless I'm totally missing something, the approach described in Text Values or Objects (and used throughout the examples) is not mappable to the terms as described there.

The examples in that section are below:

{
    "author" : "Herman Melville"
}

{
    "author" : {
        "@type" : "Person",
        "name"  : "Herman Melville"
    }
}

Those two are supposed to be equivalent according to that section. However, that's not possible in a JSON-LD context...afaik.

It is possible to say that author is always a http://schema.org/name, or that it is always a http://schema.org/Person with a http://schema.org/name, but it's not possible to "upgrade" the former to the latter, nor to allow for either shape within a single document.

iherman commented 6 years ago

I don't understand the problem. The value of "author" can be an object or a string; what's the problem? See, eg, example on playground. From JSON-LD point of view this is perfectly all right.

B.t.w., "name" is a property, not a class, ie, I am not sure this is relevant.

akuckartz commented 6 years ago

Interpreted as RDF the two examples seem to be quite different.

iherman commented 6 years ago

@akuckartz and they of course are. The value of the property is either a string or a resource with a different class. But that is perfectly fine RDF and, therefore, JSON-LD.

The fact that we "equate" the two approaches, more exactly that we consider a simple string as a shorthand of a particular class instance is an application domain issue, and not an RDF or a JSON-LD syntax issue. Note that the example from the playground is also accepted by the structured data testing tool for schema.org (except that a "@type":"Book", or something similar, must be added to the top domain).

BigBlueHat commented 6 years ago

@iherman according to the https://schema.org/author docs (at least), the expected value is either a Person or an Organization--not a string.

Testing your earlier example with https://search.google.com/structured-data/testing-tool the result was that the string-only array entry got turned into an https://schema.org/Thing (which is the catch-all Schema.org class)--which seems pretty unique to that processor, though. Other testing kept it as a string...

What's thrown me off (maybe) is that the WebIDL and other "canonical manifest" processing "upgrades" the strings into specific classes:

In other words, a single string value is a shorthand for a Person object whose name property is set to that string value.

From https://w3c.github.io/wpub/#creators

Consequently, there's an assumption being made about the string which may not be expected--and isn't model-able (afaik) in JSON-LD--i.e. turning the string-version of author into a value for Person.name.

Given that publisher is also a "Creator", this will result in incorrect content/treatment--depending on the expectations of the consumer:

{"publisher": "John Wiley & Sons, Inc."}

That would get me a Person...but I'm not sure if that's in the WebIDL or the JSON-LD or what...exactly.
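
A minimal sketch of the shorthand rule quoted above, applied to the publisher example. The helper name `expand_creator` is hypothetical and this is not the spec's algorithm; it only assumes that every bare string is expanded into a Person, as the quoted sentence says.

```python
import json


def expand_creator(value):
    """Hypothetical helper: treat a bare string as shorthand for a Person."""
    if isinstance(value, str):
        # Assumption taken from the quoted spec text: the string becomes
        # the "name" of a Person object, whatever the property is.
        return {"@type": "Person", "name": value}
    # Objects are passed through unchanged.
    return value


manifest = {"publisher": "John Wiley & Sons, Inc."}
expanded = {term: expand_creator(value) for term, value in manifest.items()}
print(json.dumps(expanded, indent=4))
```

Under that assumption the publisher ends up typed as a Person, which is the mismatch described here.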

Maybe you can clarify the intent? 😃 Because I'm clearly tangled up...

iherman commented 6 years ago

@BigBlueHat,

the terminology

In other words, a single string value is a shorthand for a Person object whose name property is set to that string value.

is probably misleading, which I believe is the source of confusion. @mattgarrish may help me out here to find the best formulation, but I guess it could/should say something like:

In other words, a single string value is a shorthand in a Web Publication Manifest for a Person object whose name property is set to that string value.

(Emphasis is mine, we should not necessarily keep it that way in the text.) Maybe we could/should add something like

It is RECOMMENDED to use Person and Organization objects instead of the shorthand, to avoid misunderstandings.

(Again, I'd trust @mattgarrish to find the best formulation.)

Ie,

...but I'm not sure if that's in the WebIDL or the JSON-LD or what...exactly.

it is actually none of the above, it is a modeling choice which has nothing to do with JSON-LD per se, and the canonicalization makes these types of ambiguities disappear by the time it "gets" to an implementation of the WebIDL.

The reason I think we should do this is the usability argument: we should make simple things easy to express in the manifest. In many cases the only information the author wants to convey is simply the name of a person (e.g., "author":"Herman Melville") and forcing the manifest author to create, instead, "author":{"@type":"Person","name":"Herman Melville"} would be considered as a major pain in the back, mainly for those who are not used to such modeling approaches.


The Schema.org situation is interesting. First of all, as you say, the tools do accept data with simple strings. How exactly they manage things internally is unknown to us, but I suspect they do essentially the same: internally they would transform strings into objects, maybe simply "Thing-s" (as shown on their online tool). The information is certainly not considered to be an error, in contrast to, e.g., providing data without putting, say, a "@type":"CreativeWork" (or equivalent) on top.

Where it gets more interesting is that they do have this type of "shortcut" in their own examples. Look, for example, at the JSON-LD representation of the third example of https://schema.org/Book, which includes the line:

"publisher": "Little, Brown, and Company",

instead of using

"publisher" : {
    "@type" : "Organization",
    "name" : "Little, Brown, and Company"
}

which, in our case, would indeed lead to the ambiguity that you refer to; they seem to be even more "lax" than we propose to be... But this also tells us that we SHOULD take this type of shorthand into account in any case.

There may be two possible actions (beyond the editorial ones above):

  1. Leave things as is, emphasizing (possibly even more) that our internal model would lead to a Person in this case
  2. We explicitly accept what schema.org does insofar as we generate a "Thing" in schema.org parlance, ie, for the Herman Melville example a simple string would be a shorthand for, simply, "author":{"name":"Herman Melville"}, ie, without a type. (Type is optional, not required).

I am not very comfortable with (2) but I can live with it if this is the group's decision.

(B.t.w., @HadrienGardeur was arguing for (2) before...)
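
To make the difference between the two options concrete, here is a small sketch of what each would produce for the Herman Melville example. The function names `option_1` and `option_2` are hypothetical labels for the two numbered choices above, not anything defined in the draft.

```python
def option_1(value):
    """Option 1: a bare string is shorthand for a Person object."""
    if isinstance(value, str):
        return {"@type": "Person", "name": value}
    return value


def option_2(value):
    """Option 2: a bare string becomes an untyped object with only a name."""
    if isinstance(value, str):
        # No "@type" is added; the consumer is left to decide what it is.
        return {"name": value}
    return value


shorthand = "Herman Melville"
print(option_1(shorthand))
print(option_2(shorthand))
```

Both functions leave an explicit object untouched, so the choice only affects manifests that use the string shorthand.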

BigBlueHat commented 6 years ago

First, thanks for the explanation. It does make things clearer.

I am not very comfortable with (2) but I can live with it if this is the group's decision.

The "magic" that Schema.org is doing (turning author and publisher into names on Things) does seem sensible...but also a bit out of keeping with the design of JSON-LD (et al)--as in, it picks a specific location (name), but doesn't tell anyone that it's going to do that. It seems very specific to the modeling done in Schema.org where everything is a Thing and consequently everything may have a name.

I suppose we can start with clarifying things, and see where we end up...

BigBlueHat commented 6 years ago

@iherman this seems to be the issue where a similar process was considered (and declined) in JSON-LD: https://github.com/w3c/json-ld-syntax/issues/31. Maybe it would be a job for framing?

There isn't really a way to know if "publisher": "qwepoiqwerj" should be an Organization or a Person, and I'm not sure assuming Person makes sense...at least not for publisher (where the type expectations are opposite of those in author)... Anyhow...still musing on ways to improve the vagueness of the void. 😺

iherman commented 6 years ago

@BigBlueHat I propose closing; these things are now part of the 'canonical manifest' algorithm in the draft. There may be specific issues on the algorithm, but they should be in different issues.

Cc @TzviyaSiegman @wareid @GarthConboy

BigBlueHat commented 6 years ago

@iherman will it be mandatory for the "canonical manifest algorithm" to be run before the manifest can be "understood" as JSON-LD? Ideally, this chunk of JSON-LD could be handed directly to a JSON-LD processor for consumption/parsing without additional pre-processing.

Relatedly, if there is mandatory pre-processing, then it's going to need its own media type.

iherman commented 6 years ago

What is now called (@mattgarrish™) the "authored" manifest is a perfectly valid JSON-LD and even valid schema.org metadata. The "canonical" manifest is the result of some extra, WP-specific mapping that adds/expands data (not in the JSON-LD expansion sense, though). Any canonical manifest is, syntax and content wise, a valid authored manifest, although not vice versa.

(If JSON-LD had some mapping/transformation language, the canonicalization could probably be defined in those terms...)
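
As an illustration only, the relationship between the two forms can be sketched as a plain mapping over the manifest. The names `canonicalize` and `CREATOR_TERMS` are hypothetical, the term list is an illustrative subset, and the Person default is an assumption carried over from the shorthand rule; the actual algorithm is the one in the draft.

```python
# Assumption: an illustrative subset of the terms that accept the shorthand.
CREATOR_TERMS = ("author", "publisher")


def canonicalize(authored):
    """Hypothetical sketch: map an authored manifest to a canonical one."""
    canonical = dict(authored)  # shallow copy, the input is not modified
    for term in CREATOR_TERMS:
        value = canonical.get(term)
        if isinstance(value, str):
            # Assumed default: a bare string stands for a Person.
            canonical[term] = {"@type": "Person", "name": value}
    return canonical


authored = {
    "author": "Herman Melville",
    "publisher": {"@type": "Organization", "name": "Little, Brown, and Company"},
}
canonical = canonicalize(authored)

# A canonical manifest is itself a valid input, and mapping it again
# changes nothing; the authored form is not necessarily canonical.
print(canonicalize(canonical) == canonical)
print(authored == canonical)
```

The mapping happens after JSON parsing and outside JSON-LD processing, which is why the authored manifest can still be handed to a JSON-LD processor as it is.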

wareid commented 5 years ago

As discussed on Feb 4 2019, closing this issue as it's already addressed by the canonical manifest.