Allow literals directly as a body?

w3c / web-annotation

Web Annotation Working Group repository, see README for links to specs

https://w3c.github.io/web-annotation/

Allow literals directly as a body? #13

Closed azaroth42 closed 8 years ago

azaroth42 commented 9 years ago

A request from CSV WG to allow simple literals as a body.

stain commented 9 years ago

I don't think this can be done without introducing a second relation, e.g. oa:hasBodyValue. Otherwise the oa:hasBody property (and the OA model) can't be represented in OWL, which I would have hoped was in scope. (Unless oa:hasBody is made an owl:AnnotationProperty - which would confuse matters!)

The reason is that an owl:ObjectProperty can't also be an owl:DatatypeProperty - and thus a property can't in some cases point to an RDF Literal and in other cases to an RDF Resource.

I don't quite understand why this is too complicated:

{ "target": "http://example.com/doc1",
  "body": { "value":  "The value"} 
}

But if it really is too much for some clients, then perhaps instead:

{ "target": "http://example.com/doc1",
  "bodyValue": "The value" 
}

oa:bodyValue can be made equivalent to the property chain oa:hasBody->rdf:value - and thus both approaches would come out the same in the OWL world.
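A minimal sketch of how a consumer might treat the two JSON shapes above as equivalent, mirroring the proposed property chain. The function name `body_text` is hypothetical, and the keys `body`, `value` and `bodyValue` are taken from the examples in this comment, not from any published context.

```python
def body_text(annotation):
    """Return the literal text of an annotation body, or None if absent.

    Treats the proposed "bodyValue" shortcut as equivalent to
    body -> value, mirroring the property chain described above.
    """
    if "bodyValue" in annotation:
        return annotation["bodyValue"]
    body = annotation.get("body")
    if isinstance(body, dict):
        return body.get("value")
    return None


long_form = {"target": "http://example.com/doc1", "body": {"value": "The value"}}
short_form = {"target": "http://example.com/doc1", "bodyValue": "The value"}

# Both shapes yield the same text, so downstream code needs only one path.
print(body_text(long_form) == body_text(short_form))
```

Either way the consumer ends up with the same string, which is the point of making the two properties equivalent in the model.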

jjett commented 9 years ago

This looks more like an issue with OWL to me. It can't infer the usage on a case by case basis, which perhaps should have been a use case that needed to be addressed.

The workaround is probably to map oa:hasBody to the owl:AnnotationProperty type as you suggest. This punts the effort for figuring out if there is a literal there or an RDF resource there to the consuming application, which sounds fine. The context is simple, either they received a string that is a URI (or a blank node/IRI) or they received a string that isn't.

There are other instances of this workaround, dc:creator comes to mind (along with a lot of other Dublin Core properties). Typing it as owl:AnnotationProperty shouldn't hurt anything. It's just weird (but seemingly harmless) semantics.

tilgovi commented 9 years ago

The context is simple, either they received a string that is a URI (or a blank node/IRI) or they received a string that isn't.

I was with you until this, @jjett. I don't think it's possible to reliably distinguish these things. Not all things starting with letters and a colon are URIs, but not all possible URI schemes can be known or planned for and sometimes I may actually want a literal string which happens to start with a URI.
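To illustrate the ambiguity, here is a small sketch of a deliberately naive "does it have a scheme?" test, built on Python's standard `urllib.parse.urlsplit`. The function name and the sample strings are made up for illustration; nothing like this is proposed in the model.

```python
from urllib.parse import urlsplit


def naive_looks_like_uri(text):
    """Guess "URI or not" from the presence of a scheme. Deliberately naive."""
    return urlsplit(text).scheme != ""


samples = [
    "http://example.com/doc1",   # absolute URI
    "note: remember the milk",   # plain text that happens to start with "word:"
    "comments/body.txt",         # relative URI reference, no scheme at all
]
for text in samples:
    print(repr(text), naive_looks_like_uri(text))
```

The second sample is reported as a URI although it is plain text, and the third is reported as plain text although it could be a relative URI, so the guess fails in both directions.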

stain commented 9 years ago

This kind of ambiguity and short-thinking behind dc:creator, and resulting inconsistent use, is in fact my main motivation for being strongly against this. It basically means I can't involve dc:creator for anything in any ontology except for describing the ontology itself, because I can't know if there is an actual "creator" or just a string literal there.

I have blamed this on DC Elements being made 15 years ago (before RDF and OWL were invented). I would not expect such an ambiguity to make it into a W3C specification in 2014.

owl:AnnotationProperty is not harmless semantics, it is the lack of semantics. I would think semantics relating to the body of an annotation could be quite important - e.g. "Was (parts of) this resource involved in an annotation".

Stian Soiland-Reyes, myGrid team School of Computer Science The University of Manchester http://soiland-reyes.com/stian/work/ http://orcid.org/0000-0001-9842-9718

azaroth42 commented 9 years ago

The JSON-LD context is simple: if the value is a string, it is always a string literal and never a URI. This is why there is a raft of restrictions on when it can be used: to prevent having two separate ways of saying string + format or string + language that look very similar but are in fact very different.

However, people are very likely to misunderstand this and use a string when they mean a URI, especially as this is possible in Open Annotation, and then there's a significant interoperability nightmare.

I agree with Stian that the costs of having a string literal body outweigh the benefits, but that was the consensus of the f2f.

stain commented 9 years ago

No, I don't think we are suggesting to parse the string literal!

In JSON-LD:

Literal as direct literal value:

{ "body": { "@value": "Value" } }

can be written in short-hand as

{ "body": "Value"  }

Both are equivalent to the Turtle:

[ oa:hasBody "Value" ] .

i.e. a string literal "Value".

.. but this short-hand only works if "body" is defined in the @context as:

{ "body": { "@id": "http://www.w3.org/ns/oa#hasBody" } }

(or with a specific RDF literal as @type, e.g. xsd:float.)

I think currently we have the context as:

{ "body": { "@id": "http://www.w3.org/ns/oa#hasBody",
            "@type": "@id" }  
}

This means that if we have a body resource:

{ "body": { "@id": "http://example.com/body.txt" }
}

Then this can be shortened to:

{ "body": "http://example.com/body.txt" }

Both are equivalent to the Turtle:

[ oa:hasBody <http://example.com/body.txt> ] .

i.e. with an RDF Resource (that can have additional properties).

But if the body key in JSON-LD is to be used with both literals and resources - OWL and misdesign issues set aside - this means that we can't have that short-cut for body resources (although we probably still want to keep it for target).
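The effect of `"@type": "@id"` described above can be sketched with a toy classifier. This is a simplified model for illustration only, not the JSON-LD expansion algorithm, and `interpret` is a hypothetical helper.

```python
def interpret(value, term_definition):
    """Classify a JSON value the way the term definition asks for.

    Simplified model only: a dict with "@id" is a resource, a dict with
    "@value" is a literal, and a bare string follows the term's "@type".
    """
    if isinstance(value, dict):
        if "@id" in value:
            return ("resource", value["@id"])
        if "@value" in value:
            return ("literal", value["@value"])
        raise ValueError("unsupported node shape in this sketch")
    if term_definition.get("@type") == "@id":
        return ("resource", value)
    return ("literal", value)


OA_HAS_BODY = "http://www.w3.org/ns/oa#hasBody"
coerced = {"@id": OA_HAS_BODY, "@type": "@id"}  # context with the URI short-cut
uncoerced = {"@id": OA_HAS_BODY}                # context that permits literal bodies

print(interpret("http://example.com/body.txt", coerced))
print(interpret("http://example.com/body.txt", uncoerced))
print(interpret({"@id": "http://example.com/body.txt"}, uncoerced))
```

With the uncoerced definition the bare string becomes a literal, so a body resource has to be written in the long `{"@id": ...}` form.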

On the other hand:

{ "hasBody": { "value": "Value" } }

is straightforward to me, and allows easy upgrading to tags (as discussed today):

{ "hasBody": { "value": "Value",
                      "type": "Tag" }
}

and even to have a hyperlink for the tag - which most websites today have:

{ "hasBody": { "value": "Value",
                      "type": "Tag",
                      "@id": "http://example.com/tags/Value" }
}

Now why should you, as consuming code, not be able to handle these three embedded bodies in the same way, and gracefully render them in a more clever way?
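As a rough sketch of that claim, one function can render all three embedded bodies. The name `render_body` and the output format are invented for illustration; the keys `value`, `type` and `@id` are the ones used in the examples above.

```python
def render_body(body):
    """Render an embedded body dict as one line of text."""
    text = body.get("value", "")
    # Prefix the type when present, e.g. "[Tag] Value".
    label = f"[{body['type']}] {text}" if "type" in body else text
    link = body.get("@id")
    # Append the hyperlink when the body also identifies a resource.
    return f"{label} <{link}>" if link else label


plain = {"value": "Value"}
tag = {"value": "Value", "type": "Tag"}
linked_tag = {"value": "Value", "type": "Tag", "@id": "http://example.com/tags/Value"}

for body in (plain, tag, linked_tag):
    print(render_body(body))
```

The richer forms only add optional keys, so code written for the plain case keeps working when tags or links appear.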

stain commented 9 years ago

If you think people are going to use a string literal to indicate a URI, when they mean the resource/page (and not say the user just typing in a textual comment that looks like a URI) - then that is an even stronger reason to use the value property and force the type as "@id".

It is allowed to re-vote on a decision.

I am afraid I was not able to follow the F2F from Manchester - but even if a decision was made I believe it was not an informed decision that is compatible with existing technologies and specifications.

In fact, just 5 minutes after the F2F meeting was over I saw a question on the #json-ld IRC channel asking whether it was possible to mix-and-match literals and resources, e.g. put additional properties in a literal JSON-LD node using @value. My answer then - as now - was that rdf:value serves exactly this purpose and can be aliased as value in JSON-LD.

azaroth42 commented 9 years ago

Yes. Which is why we have examples like 3.1 and 3.2.9 where there's the otherwise unnecessary {"@id":...} wrapped around the URIs, but not for target (eg 3.2.1 and 3.2.4).

I still believe that allowing literal bodies is more complicated than not allowing them. We could even make it a non-issue by requiring a type to be associated with every body and target, making the resource construction mandatory.

But others disagree.

jjett commented 9 years ago

Agreed. Not possible to distinguish with complete reliability. I think it's pretty likely that most bodies and targets are either going to be URLs or IRIs though. It seems like there are some tests that could be engineered to determine many of the cases one way or the other (resource or literal). Whether or not anyone should do that engineering is a separate matter...

azaroth42 commented 9 years ago

Others think that most will be string literals, and hence the request to allow it in the simplest JSON form. But whether a string is a URI or not is not a decision we need to make. In the current ED model it is always just a string, never a URI.

jjett commented 9 years ago

I understand and agree. This issue with the semantics of owl:AnnotationProperty came up for me in the context of another project here where we're building a collection's data model. For better or worse we have to accommodate a lot of DC predicates (many of which are owl:AnnotationProperties).

The semantics look all wrong to me. Nevertheless, I have been unable to generate any actual data that demonstrates any negative impacts from using predicates that are owl:AnnotationProperties. Intellectually it looks as weird as hell and shouldn't, strictly speaking, work but, it seems to have no effect... o.O

If you have some proof that it demonstrates otherwise it would be a great thing for me to cite (I am looking for a reason to avoid using anything typed as an owl:AnnotationProperty). We might continue discussing this specific topic off-list if you're interested.

stain commented 9 years ago

You will easily end up declaring new Dublin Core properties...

So say you are not careful, and you define a new ObjectProperty :myFormat that specializes dc:format. It seems as if it is working, but basically you have now also declared dc:format as an owl:ObjectProperty. Protege should show this - but most other tools will keep quiet about the mishap.

But what if another ontology does the same, but defines a DatatypeProperty :specialFormat that specializes dc:format, and you happen to (indirectly) import this? Now you get an inconsistent ontology.
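The clash can be sketched with a small check over sub-property declarations. It assumes, as the comment above does, that a specialization forces its kind onto the property it specializes; it does not model a real OWL reasoner, and all names in it are hypothetical.

```python
OBJECT, DATATYPE = "owl:ObjectProperty", "owl:DatatypeProperty"


def inferred_kinds(declared_kinds, sub_property_of):
    """Propagate property kinds from sub-properties up to super-properties."""
    kinds = {prop: {kind} for prop, kind in declared_kinds.items()}
    changed = True
    while changed:  # repeat until no super-property gains a new kind
        changed = False
        for sub, sup in sub_property_of:
            before = len(kinds.setdefault(sup, set()))
            kinds[sup] |= kinds.get(sub, set())
            if len(kinds[sup]) != before:
                changed = True
    return kinds


def conflicts(kinds):
    """Properties that ended up as both object and datatype properties."""
    return sorted(p for p, ks in kinds.items() if {OBJECT, DATATYPE} <= ks)


declared = {":myFormat": OBJECT, ":specialFormat": DATATYPE}
imports = [(":myFormat", "dc:format"), (":specialFormat", "dc:format")]
print(conflicts(inferred_kinds(declared, imports)))
```

With only one of the two specializations imported the list is empty, which is why the problem tends to go unnoticed until two ontologies are combined.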

jjett commented 9 years ago

Agreed, although I'm not convinced that we can't use owl:AnnotationProperty as an alternative to making an additional oa:hasLiteralBody predicate. It simplifies the complexity problem (which also goes away if we simply don't engineer for owl-dl or owl-lite).

I'm not confident that we need to type the model's predicates with regards to resource vs. literal at all. There are plenty of other ambiguities in the model, this basic RDF ambiguity (that a subject or an object may be a literal or a resource) doesn't affect most use cases.

I also think we'll eventually have to allow literal bodies. The minority that want them is too large of a stakeholder group not to accommodate. It could affect uptake if we don't. So the question for me is what is the most efficient way to go about it?

An alternate tack to take would be to punt on this issue and allow communities interested in such things to extend the model themselves (through a sub-type of the oa:hasBody predicate, probably ex:hasLiteralBody, as Stian suggested, except that we let outsiders do the dirty work).

That option doesn't seem too appetizing to me though.

For the record I don't actually like literal bodies but I have made peace with the fact that there are enough people that want them that we need to find a way to accommodate them in the model.

jjett commented 9 years ago

The problem is most other tools... this has implications for the library linked data initiative too, I think, because MODSRDF and similar library metadata standards employ owl:AnnotationProperty in the definitions of many predicates; the problem isn't just limited to Dublin Core. Since most tools ignore this problem, most people don't really see it as a problem (and don't actually encounter it as a problem).

stain commented 9 years ago

We can do it the other way around, of course. The model can have oa:hasLiteralBody (which doesn't need to be an AnnotationProperty, as it will always be a literal), but which is just "body" in the JSON-LD (or even "text" or something?).

We can then call oa:hasBody "bodyResource" or something.

I am mainly concerned about building the wrong model :) We can sacrifice many things for uptake - but I would argue that keeping it consistent is also important.

jjett commented 9 years ago

+1 for consistency. :)

I'm hoping the majority will accept solution 1 in your proposal. Thanks for the additional insights into owl:AnnotationProperty.

tilgovi commented 9 years ago

I'm starting to wonder whether @stain is right and there has just been a lot of misunderstanding and FUD, unintentionally.

I, for one, am very convinced by a context which allows {"body": {"value": "literal"}} as sufficiently simple to be the recommendation.

tilgovi commented 9 years ago

Do we have a simple way to get this into the vocabulary? Is there any utility to having an oa:Resource class or some such thing which has the value property? Then we can have that as the object of the hasBody and hasTarget predicates. In JSON-LD, we could have easy literal bodies and targets with {value} that way, but as strings they would be interpreted as URIs, if I'm not wrong. Of course, I'm an RDF newbie, so I may be totally confused.

iherman commented 9 years ago

A few comments, although I am not sure at this moment (note that it is very early for me, combined with jet lag :-)

1. I am also in the CSV on the Web WG where this issue comes from, and I am strongly in favor of finding some solution to allow the annotation body being a string. The metadata for CSV files (that may have annotations) is authored by not only non-RDF specialists, but non Computer Science people (the CSV metadata is defined to be in JSON and we make it JSON-LD compatible, too). I.e., it must be as simple as possible. The overwhelming majority of annotations will really be just a textual note added to the metadata ("This was produced by tool XYZ", etc), and pushing the {"body" : { "value": "This was produced by tool XYZ"}} would look unnatural, unclear to most of them. It is also in contradiction with what people see when using JSON, where blurring strings and URI-s is a fairly common practice. I think we MUST find a way to allow this; if we don't, we can be sure that there will be LOTS of data out there which will simply include invalid annotation data, because authors will just put literals there.

2. I am a bit cautious about the "not valid OWL" argument. Strictly speaking, this argument is incorrect (I'm sorry @stian). The right argument is "not valid OWL-DL". But let us not equate OWL with OWL-DL. My apologies for the usage of Semantic Web jargon, but the RDF Compatible Semantics of OWL2 has no problem with the same property being an objectProperty and a datatypeProperty; only the Direct Semantics, i.e., DL, has a problem. The question is: is this really a significant issue for this vocabulary? Knowing that, in fact, most of the Linked Datasets and their vocabularies out there are defined without really caring too much about OWL DL (well, about OWL in general, unfortunately), and that very little OWL DL reasoning is done on those data sets, I would not go out of my way to ensure that the OA is really really OWL DL compatible. (Note that the OWL Profile that is the closest to Linked Data is OWL2 RL, whose rule-based processing is oblivious to the datatypeProperty and objectProperty duality.)

3. The only issue that matters to me is very practical: is it easy to differentiate whether a value is meant to be a URI or is just a string. And, knowing the complexity of URI syntax, including relative URI-s, that can be a real issue indeed. Having two different terms in JSON-LD is clearly a solution, but I am afraid that this cannot be done without also modifying the model to clearly allow bodies as literals; the JSON-LD @context mechanism cannot be used, afaik, to "transform" things so that the mapping would produce the {"body" : { "value": "This was produced by tool XYZ"}} type structures. But I may be wrong on that...

Cheers

Ivan

stain commented 9 years ago

I stand corrected on "not being valid OWL" - I was indeed thinking of OWL-DL, and OWL2 RL is much more in line with the requirements for graphs using our annotations. So my "strongly disagree" should be moderated to "disagree".

It is still making it a bit too easy to do the wrong thing, e.g.:

{ "target": "http://example.com/doc",
  "body": "http://example.com/comment-about-doc.txt" }

I somehow doubt that the intention was for the text h-t-t-p-... to be a literal annotation on the target - and would not dream about suggesting to "guess if it is a URI".

Given how the spec is presenting an annotation as binding together a target and body as two sides of the same story, both with the same options for specific resources etc, one would be very tempted to describe them the same way in the syntax. In Turtle this is easy, but in the suggested JSON-LD it is not.

So to avoid that, we then need to remove "@type": "@id" for "target", and always use the long-form:

{"target": {"@id": "http://example.com/doc" }  }

As mentioned, the resources should be typed, so the long-form is likely to be used anyway, and this would avoid having the duality to deal with here when parsing as pure JSON.

This needs to be fixed in the example at http://w3c.github.io/web-annotation/model_fpwd/#simple-textual-bodies in particular.
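
As a rough illustration of the long-form convention suggested above, a pure-JSON consumer could then classify values without guessing. This is only a sketch: the function name `classify` and the result labels are hypothetical, and it assumes the convention that a bare string is always a literal while resources always appear as `{"@id": ...}`.

```python
def classify(value):
    """Classify a body/target value, assuming URIs always use the long form."""
    if isinstance(value, str):
        # Under this convention a bare string is never a URI reference.
        return ("literal", value)
    if isinstance(value, dict) and "@id" in value:
        return ("resource", value["@id"])
    if isinstance(value, dict) and "value" in value:
        # Embedded textual body, as in {"body": {"value": "The value"}}.
        return ("embedded", value["value"])
    raise ValueError(f"unrecognised body/target form: {value!r}")


annotation = {
    "target": {"@id": "http://example.com/doc"},
    "body": "http://example.com/comment-about-doc.txt",
}

print(classify(annotation["target"]))
print(classify(annotation["body"]))
```

Note that the body here is still read as a literal even though it looks like a URL, which is exactly the "too easy to do the wrong thing" case: the convention removes the guessing, not the author's mistake.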

Stian Soiland-Reyes, myGrid team School of Computer Science The University of Manchester http://soiland-reyes.com/stian/work/ http://orcid.org/0000-0001-9842-9718

tilgovi commented 9 years ago

Feels like maybe we're down to a question of what's our recommended context and what language we need in the spec about these issues.

If we have established that it's not a problem for the model to allow literals, then it's just a matter of communication and providing convenient examples.

In JSON-LD it's always possible to specify the type of the node, as has been pointed out.

Setting good examples for others could just be Ivan's suggestion to have two mappings for different body types in the JSON-LD context and I think that would suit me fine. That looks quite like tags, doesn't it?

Other serializations shouldn't have the same issues, if I understand correctly.

azaroth42 commented 9 years ago

Weighing in ...

- I agree that the confusion between "http://.../" in the target and in the body meaning different things is bad, but that is the price of having a punning property. I suggest that we ask stakeholders about it and see if they have an opinion... perhaps they'll agree it's a problem they created in solving another perceived problem.
- I disagree that we need to make it harder for URI targets, just because we made it harder for the less common URI bodies. That would defeat the point of making it easier for the currently more common literal bodies.
- Other serializations won't have the issue as (eg in Turtle) "http://example.org/" the string is different to http://example.org/ the URI ; however as we're promoting JSON-LD as the main serialization, I think we need to come to consensus on the potential impact, and try to mitigate as much as possible.
- We can't use @context to transform the shape of the graph, so we can't have bodyLiteral generate the oa:hasBody - rdf:value construction. However I think we should communicate this as a desirable feature to the RDF Shapes WG.
- We could have a level0 context and a level1+ context. In level 0 you're allowed to have literal bodies, and in 1+ you're not. This would be against the consensus that level N+1 is a superset of the level N requirements. I'm not convinced this is a good idea, but I'm putting it out on the table.

tilgovi commented 9 years ago

@azaroth42 excellent summary.

tilgovi commented 9 years ago

Here's a JSON-LD playground link showing both bodies as different mappings in the context: http://tinyurl.com/m2lrpj6

The JSON-LD expansion algorithm can add the {@value} or {@id} wrapper.

tilgovi commented 9 years ago

I wouldn't even have a problem with some implementation storing literal targets. Really, there's very little we can do to prevent it. As an implementor of some reading system, why shouldn't I just store the annotations as keyed off the document they're in and interpret a literal target as a quote selector on the source implied by the path I took to find this annotation to begin with? Obviously, if I were to then write an exporter to share these annotations with other OA consumers, I would need to pick a URI for the book or file or whatever and construct target resources, but I point it out to say we really can't just wag our fingers at people "doing it wrong", especially when for their domain it may feel so correct and sensible.

What we can do is offer a model that allows for both and a recommended context that disambiguates as compensation for JSON not having typed values. We simply say: that's so great you want to share your annotations so here's how you do it safely.

iherman commented 9 years ago

On 14 Nov 2014, at 08:49 , Rob Sanderson notifications@github.com wrote:

• We can't use @context to transform the shape of the graph, so we can't have bodyLiteral generate the oa:hasBody - rdf:value construction. However I think we should communicate this as a desirable feature to the RDF Shapes WG.

I am not sure the RDF Shapes group will help you. They do not provide transformation tools, so to say. But I know the JSON-LD community has worked on some sort of a transformation tool/algorithm that did not make it in the standard. We may want to reach out to that community (Manu Sporny, Gregg Kellogg), they can tell us more about it.

Ivan

Ivan Herman, W3C Digital Publishing Activity Lead Home: http://www.w3.org/People/Ivan/ mobile: +31-641044153 ORCID ID: http://orcid.org/0000-0003-0782-2704

zoggy commented 9 years ago

I'm not a specialist of JSON-LD, but according to the link provided by @tilgovi, it looks like our JSON-LD context could provide body, bodyText, and bodyURI properties.

bodyText would be used for one literal string content, bodyURI for just one URI content, and body for all other cases. This may improve readability of JSON-LD code, without ambiguity in the expanded JSON-LD.
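
To sketch what such a context might look like, the Python snippet below holds a context fragment as a dict and mimics, by hand, what JSON-LD expansion does to a plain string value of each term. It is a deliberately simplified illustration, not a JSON-LD processor: the function `expand_string_value` is hypothetical, and the exact term definitions are an assumption based on the names proposed in this thread.

```python
OA_HAS_BODY = "http://www.w3.org/ns/oa#hasBody"

# Assumed context fragment: both terms map to the same property,
# but only bodyURI coerces string values to IRIs.
CONTEXT = {
    "bodyText": {"@id": OA_HAS_BODY},
    "bodyURI": {"@id": OA_HAS_BODY, "@type": "@id"},
}


def expand_string_value(term: str, value: str) -> dict:
    """Mimic JSON-LD expansion of a single string value for a known term."""
    definition = CONTEXT[term]
    # "@type": "@id" means the string is an IRI; otherwise it is a literal.
    wrapper = "@id" if definition.get("@type") == "@id" else "@value"
    return {definition["@id"]: [{wrapper: value}]}


print(expand_string_value("bodyText", "This was produced by tool XYZ"))
print(expand_string_value("bodyURI", "http://example.com/comment-about-doc.txt"))
```

Both terms end up on the same property after expansion, so the distinction lives only in the compact JSON that authors write; the underlying property still takes both literals and resources.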

iherman commented 9 years ago

On 14 Nov 2014, at 15:37 , Zoggy notifications@github.com wrote:

I'm not a specialist of JSON-LD, but according to the link provided by @tilgovi, it looks like our JSON-LD context could provide body, bodyText, and bodyURI properties.

Indeed. But is it possible, through that JSON-LD context, to map the JSON-LD part into the same RDF model? Ie, something like

{ "body" : "foo" } -> {"body" : { "value" : "foo" } }

I am not sure this is possible. I would be happy to be proven wrong!

But, if I am right, then either the model must be modified to separate the two cases, or we have to live with the punning of object and datatype properties in practice (which is what Dublin Core does). I can actually live with the latter, but I recognize that it may add to the load of implementers who have to deal with this duality.

Ivan

zoggy commented 9 years ago

Indeed, this does not seem possible. I had some hope playing with "@container" but it cannot be used for this purpose :-(

stain commented 9 years ago

No, you would need to add OWL inferences and property chains (or other kind of inference rules) to do that transition after parsing it to a graph.

In JSON-LD a {} corresponds directly to an RDF resource, so the properties you give are directly related to that resource (although JSON-LD does have @reverse to express relations the other way around, which is fairly nice, but not yet supported in JSON-LD framing)
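
For illustration only, the kind of post-parse rule described above could look like the following Python sketch, which works on plain (subject, predicate, object) tuples rather than a real RDF library. The property `oa:bodyValue` is the shortcut proposed earlier in this thread and should be read as hypothetical here; the function name and the blank-node labelling scheme are likewise assumptions, and this is a simple rewrite rule, not OWL reasoning.

```python
from itertools import count

OA_HAS_BODY = "http://www.w3.org/ns/oa#hasBody"
OA_BODY_VALUE = "http://www.w3.org/ns/oa#bodyValue"  # hypothetical shortcut property
RDF_VALUE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#value"


def expand_body_values(triples):
    """Rewrite each shortcut triple into the hasBody -> rdf:value form."""
    fresh = count()
    result = []
    for subject, predicate, obj in triples:
        if predicate == OA_BODY_VALUE:
            # Mint a blank node for the body; assumes these labels are unused.
            node = f"_:body{next(fresh)}"
            result.append((subject, OA_HAS_BODY, node))
            result.append((node, RDF_VALUE, obj))
        else:
            result.append((subject, predicate, obj))
    return result


graph = [("http://example.com/anno1", OA_BODY_VALUE, "The value")]
for triple in expand_body_values(graph):
    print(triple)
```

The point is that this step runs after the JSON-LD has been parsed into a graph; nothing in the @context itself can introduce the intermediate node.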

iherman commented 9 years ago

Hi Stian,

I was afraid to be right:-)

I think then my favourite approach would be to have two different predicates for the object and the literal cases. That would make the job of implementers easier without bothering users.

One more thing came up during discussion, namely whether these properties should be annotation properties in the OWL sense. I do not see any problem doing that. You did say that this means 'it does not really have semantics'; what it means, for those who are not versed in OWL, is that objects related to these properties are not part of the usual OWL inferencing procedures. I actually believe that is a good thing! I do not really see a use case for running OWL inferencing on the annotation structures themselves. For me, the typical scenario in the RDF+OWL world is when one has complex datasets in some area (eg, biomedical), meaning that the reasoning should be on that core data set. From that point of view annotation is just noise. That is exactly what the annotation properties in OWL are for, if my understanding is correct: to separate the noise from the core data. So, actually, declaring bodyLiteral and bodyObject (just to come up with some bad names...) as being annotation properties in the OWL sense is exactly the right thing to do!

Cheers

Ivan

azaroth42 commented 9 years ago

Resolution from 2015-02-18 telcon was to keep the current model from FPWD and to update the principles to state that inferencing is not a significant design consideration, and thus the punning property is easier than two separate properties. Tagging as defer until such time as an interested party can make the case for two properties without using inference as a rationale.

azaroth42 commented 8 years ago