w3c / web-annotation

Web Annotation Working Group repository, see README for links to specs
https://w3c.github.io/web-annotation/

Serialization of Lists #1

Closed azaroth42 closed 9 years ago

azaroth42 commented 10 years ago

Serialization of oa:List is more difficult than it needs to be as it is both the head node of the list and has other predicates associated with it. This means that typical serialization routines will either fail or generate inconsistent output, as they expect the list head node to be a blank node with no other properties. This situation could be avoided with a slightly different model.

Justification

Proposal

Have an rdf:List as the object of a new property of the oa:List.

{
 "@id": "http://example.org/annos/lists/1",
 "@type": "oa:List",
 "hasList": [ "target1", "target2" ]
}
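As a rough illustration of what the proposed shape means at the triple level, the sketch below expands the `hasList` array into `rdf:first`/`rdf:rest` cells. The helper names (`list_to_triples`, `proposed_model`) and the blank node labels are hypothetical, and plain tuples stand in for a real RDF library. The point is only that the list head ends up as a blank node carrying nothing but `rdf:first` and `rdf:rest`, while `rdf:type` stays on the oa:List itself.

```python
from itertools import count

RDF_NIL = "rdf:nil"


def list_to_triples(members, new_bnode):
    """Return (head, triples) for an rdf:List holding `members` in order."""
    if not members:
        return RDF_NIL, []
    # One blank-node cell per member (labels are illustrative only).
    cells = [new_bnode() for _ in members]
    triples = []
    for i, (cell, member) in enumerate(zip(cells, members)):
        rest = cells[i + 1] if i + 1 < len(cells) else RDF_NIL
        triples.append((cell, "rdf:first", member))
        triples.append((cell, "rdf:rest", rest))
    return cells[0], triples


def proposed_model(list_id, members):
    """Triples for an oa:List that points at a separate rdf:List."""
    counter = count(1)
    head, cell_triples = list_to_triples(members, lambda: f"_:b{next(counter)}")
    return [
        (list_id, "rdf:type", "oa:List"),
        (list_id, "oa:hasList", head),
    ] + cell_triples


if __name__ == "__main__":
    for triple in proposed_model(
        "http://example.org/annos/lists/1", ["target1", "target2"]
    ):
        print(triple)
```

In this shape a serializer that expects a "clean" list head has nothing extra to deal with, which is the behaviour the proposal is after.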

Background

Lists, in general, are needed to enable the following requirements:

tilgovi commented 10 years ago

Let me restate what you've written to clarify for myself.

Rather than being an rdf:List-like class with additional oa:item relationships, the oa:List has an oa:hasList relationship to an rdf:List. This change is beneficial because that rdf:List has a simple encoding in JSON-LD.

Aside: the oa:item relationships are to ease searching by not requiring queries that may optionally traverse the rdf:List, correct?

azaroth42 commented 10 years ago

Yes, you have it exactly correct.

Regarding oa:item, in the proposed solution it would be removed from the model but could be added in locally to make querying easier. (Similarly, some relationship between the annotation and the object of an oa:hasSource on a selector could be added like this, but it is not needed at the interop level.) Otherwise the information would be included twice in the serialization:

{
 "@id": "http://example.org/annos/lists/1",
 "@type": "oa:List",
 "hasList": [ "target1", "target2" ],  // an rdf:List
 "item" : ["target1", "target2" ]  // two identical predicates
}
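One way to read "added in locally" is that a store derives the `item` shortcut from `hasList` when it ingests an annotation and drops it again before exchanging data. The sketch below assumes exactly that. The function names are hypothetical, and plain dicts stand in for parsed JSON-LD.

```python
def add_local_items(oa_list):
    """Return a copy of `oa_list` with a locally derived `item` shortcut."""
    enriched = dict(oa_list)
    # dict.fromkeys removes repeated members while keeping first-seen order;
    # the shortcut is only a membership index, the order lives in hasList.
    enriched["item"] = list(dict.fromkeys(oa_list["hasList"]))
    return enriched


def strip_local_items(oa_list):
    """Return a copy without the local shortcut, for interop serialization."""
    return {key: value for key, value in oa_list.items() if key != "item"}


if __name__ == "__main__":
    original = {
        "@id": "http://example.org/annos/lists/1",
        "@type": "oa:List",
        "hasList": ["target1", "target2"],
    }
    stored = add_local_items(original)
    print(stored["item"])
    print(strip_local_items(stored) == original)
```

With this split, the duplicated predicate never has to appear on the wire.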

azaroth42 commented 10 years ago

Meta -- now that we've decided to use github issues, it's a real issue :)

jjett commented 10 years ago

With regard to the need for an ordered list of multiple bodies: can you clarify what it is about the relationship between the bodies that requires that something of a sequential nature be captured? As I recall, the main use case for this came out of the video annotation examples that the CG and its predecessors were looking at. I'm wondering if it wouldn't be semantically simpler if "bodies" could nest within other "bodies." What would be the drawback if the range of oa:hasBody was broadened to include oa:SpecificResources in addition to oa:Annotation?

azaroth42 commented 10 years ago

Choice, as a list, would be a good example of when multiple bodies could be expressed as an ordered list. Not sure what you mean about nesting, but the range of hasBody is any resource. Did you mean the domain?

jjett commented 10 years ago

Yes, I meant the domain.

Essentially what I mean is that a body that is a specific resource could have another resource as a "body". If that one is also a specific resource then it becomes a simple ordered chain of bodies (like nesting Russian dolls) without going through the indirection gymnastics of also calling it a list.

Where it gets harder is when the bodies are all alternatives to each other, which seems like a presentational rather than a content issue, and may not need to be addressed at the level of the conceptual model (or at the RDF data model layer either). It seems more of an implementation issue for serializers and consumers. The lists were always the most challenging part of the multiplicity portions of the model, and I'm wondering where annotation tools are going to exploit these kinds of data structures (ordered lists) other than at rendering time.

With simple choices it's easy to use some OR logic and ignore the ones you don't want. For consumers that don't like to choose ('choose for me'/'what is the default?') it seems like we could use a community best practice at serialization time and simply put 'default' choices at the top of an unordered list. What I'm getting at is, just because a list is "unordered" doesn't mean that some kind of order has not actually been employed during its creation. Does that make sense?


azaroth42 commented 10 years ago

To me, a long chain of resources is more complex than the same set of resources in a single layer. The advantage of rdf:List is that it maps cleanly to a JSON list, presenting the order to the consumer in a way that it can easily interact with, rather than requiring it to construct the list by following down a potentially forking tree.

There's no top to an unordered list. Either a list is ordered, or it isn't. Anything else is serialization.


Rob Sanderson Technology Collaboration Facilitator Digital Library Systems and Services Stanford, CA 94305

jjett commented 10 years ago

I think you may be conflating lists and sets. All lists have an order, even those that are presented as unordered lists. Only sets are truly unordered.

I'm not certain I agree about rdf:List being less complicated. It seems to me that it will take many more triples to express the same concept, e.g., I can build a graph model that directly expresses the relationships between the Russian dolls or I can punt this bit of effort over to an rdf:List structure. Since the oa context in JSON-LD is not yet set in stone, I'm not sure why we couldn't map the nesting structure to a "list" in JSON. Is there something about moving from RDF to JSON-LD contexts that would prevent this?

It's probably going to be helpful to closely examine the roles these structures are meant to play. It's easy to see when what we want is a choice from a set of options or when we want to annotate some kind of aggregate entity, e.g., the "bothness", "togetherness", "juxtapositionness" of a pair of juxtaposed images. It is much less clear what role(s) ordered sequences play within the model at every level.


azaroth42 commented 10 years ago

Yes, contexts cannot change the shape of the graph. Lists get special processing in the same way that they do in Turtle to avoid exposing the blank nodes and rdf:first, rdf:rest.

A chain of three ordered bodies would be:

{
   "hasBody" : {
      "hasSource" : "eg:body1",
      "hasBody" : {
         "hasSource": "eg:body2",
         "hasBody" : {
            "@id": "eg:body3"
         }
      }
   }
}

A list would be:

{
   "hasBody" : {
      "@type": "oa:List",
      "members":  ["eg:body1", "eg:body2", "eg:body3"]
   }
}
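To make the difference concrete from a consumer's point of view, here is a small sketch that recovers the ordered bodies from each shape. It assumes the chain never forks (each `hasBody` holds a single object) and that intermediate nodes name their resource with `hasSource` while the last one uses `@id`, as in the example above. The function names are hypothetical.

```python
def bodies_from_chain(annotation):
    """Walk nested hasBody links and collect the bodies in order."""
    bodies = []
    node = annotation.get("hasBody")
    while node is not None:
        # Intermediate nodes carry hasSource; the innermost node carries @id.
        bodies.append(node.get("hasSource", node.get("@id")))
        node = node.get("hasBody")
    return bodies


def bodies_from_list(annotation):
    """With the list shape the order is already a plain JSON array."""
    return list(annotation["hasBody"]["members"])


if __name__ == "__main__":
    chain = {
        "hasBody": {
            "hasSource": "eg:body1",
            "hasBody": {
                "hasSource": "eg:body2",
                "hasBody": {"@id": "eg:body3"},
            },
        }
    }
    as_list = {
        "hasBody": {
            "@type": "oa:List",
            "members": ["eg:body1", "eg:body2", "eg:body3"],
        }
    }
    print(bodies_from_chain(chain))
    print(bodies_from_list(as_list))
```

Both calls yield the same three bodies. The chain version needs a traversal and would need extra rules as soon as a node had more than one `hasBody`.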

stain commented 10 years ago

I am a bit reluctant to change the model to suit one particular serialization (as much as I am a fan of JSON-LD) - particularly if this means harder querying in SPARQL. SPARQL 1.1 does not have intuitive support for rdf:Lists, e.g. with your proposal you would have to do something like:

SELECT ?bodyInList WHERE {
  ?ann oa:hasBody ?body .
  ?body oa:hasList ?list . 
  ?list rdf:rest*/rdf:first ?bodyInList .
 }

Meanwhile, the current model allows the shortcut oa:item and needs no property paths:

SELECT ?bodyInList WHERE {
  ?ann oa:hasBody ?body .
  ?body oa:item ?bodyInList .
 }
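For readers less familiar with property paths, the sketch below emulates both lookups over a handful of triples in plain Python. The sample data and helper names are assumptions made for illustration, and tuples stand in for a real triple store. Note that the explicit walk happens to preserve list order, whereas a SPARQL result set does not guarantee any order without an ORDER BY clause.

```python
RDF_NIL = "rdf:nil"


def objects(triples, subject, predicate):
    """All objects of triples matching (subject, predicate, ?o)."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def members_via_list(triples, oa_list):
    """Emulate: ?body oa:hasList ?list . ?list rdf:rest*/rdf:first ?member"""
    members = []
    for cell in objects(triples, oa_list, "oa:hasList"):
        seen = set()  # guards against a malformed, cyclic list
        while cell != RDF_NIL and cell not in seen:
            seen.add(cell)
            members.extend(objects(triples, cell, "rdf:first"))
            rest = objects(triples, cell, "rdf:rest")
            cell = rest[0] if rest else RDF_NIL
    return members


def members_via_item(triples, oa_list):
    """Emulate: ?body oa:item ?member (the shortcut in the current model)"""
    return objects(triples, oa_list, "oa:item")


if __name__ == "__main__":
    # Both styles side by side, purely so the two lookups can be compared.
    data = [
        ("eg:list1", "oa:hasList", "_:c1"),
        ("_:c1", "rdf:first", "target1"),
        ("_:c1", "rdf:rest", "_:c2"),
        ("_:c2", "rdf:first", "target2"),
        ("_:c2", "rdf:rest", RDF_NIL),
        ("eg:list1", "oa:item", "target1"),
        ("eg:list1", "oa:item", "target2"),
    ]
    print(members_via_list(data, "eg:list1"))
    print(members_via_item(data, "eg:list1"))
```

The shortcut is a single lookup, while the list walk costs one hop per member. That is the trade-off being weighed here.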

Different SPARQL engines do however have varying degrees of additional support for RDF lists:

Jena:

SELECT ?bodyInList WHERE {
  ?ann oa:hasBody ?body .
  ?body list:member ?bodyInList ;
            list:index (?index ?bodyInList) .
 }
 ORDER BY ?index 

We can also possibly add OWL property chains to simplify this.

Collection Ontology

As a side note, though not a direct suggestion, I would like to mention the Collection Ontology, which, as described in its paper (you might recognize the author :) ), recognizes the value of this kind of shortcut and of letting the inference layer do the job. It adds some verbosity with a co:Item intermediary, which represents the item within the collection and so can have properties like co:index and co:nextItem. co:nextItem can be used to chain the list:

{ "@id": "http://example.org/annos/lists/1",
  "@type": "co:List",
  "co:firstItem": {
    "co:itemContent": "target1",
    "co:index": 1,
    "co:nextItem": {
      "co:itemContent": "target2",
      "co:index": 2
    }
  }
}

Alternatively, to avoid deep nesting:

{ "@id": "http://example.org/annos/lists/1",
  "@type": "co:List",
  "co:item": [
     {  "co:itemContent": "target1",
        "co:index": 1 },
     {  "co:itemContent": "target2",
        "co:index": 2 }
  ]
}

Both allow queries like:

SELECT ?bodyInList WHERE {
  :ann oa:hasBody ?body .
  ?body co:element ?bodyInList .
 }

(through OWL reasoning)

but also:

SELECT ?bodyInList ?index WHERE {
  :ann oa:hasBody ?body .
  ?body co:item ?item .
  ?item co:itemContent ?bodyInList;
    co:index ?index .
 }
 ORDER BY ?index

iherman commented 10 years ago

Hi Stian,

I am not sure I agree with your comment. The current model would force any serialization to be more complicated, not only JSON-LD, simply because it requires the additional oa:item relationships explicitly. Either we have to reveal the plumbing of lists (like in the current spec, but which we do not want) or we have to repeat ourselves by adding explicit oa:item relationships to the bodies, ie, being overly verbose by repeating the object references. I think that, essentially, dropping the oa:item makes sense in this respect, even if, say, Turtle is used (which also has a syntax for lists).

As for the SPARQL examples below: what is wrong with using property paths? They are part of the basic SPARQL 1.1 query language; actually, one of the use cases for adding them was to give a proper way of querying lists like the ones we have here. A few years ago, when SPARQL 1.1 was not yet around, adding oa:item made a lot of sense to make the graph queryable, but this is no longer an issue imho. (AFAIK, Jena added list management prior to the SPARQL 1.1 era exactly for this problem.)

Cheers

Ivan

P.S. B.t.w., relying on OWL could lead to issues. Although the SPARQL1.1 entailment regime is indeed defined, it is not part of the 'core' of SPARQL1.1, ie, conformant SPARQL 1.1 implementations are not required to implement it. I have not kept up with the current status of SPARQL implementations these days, but I am afraid only a few of them implement the entailment regime even for the simplest OWL Profile (ie, OWL-RL). Which is sad, but that is the way it is:-(



Ivan Herman, W3C Digital Publishing Activity Lead Home: http://www.w3.org/People/Ivan/ mobile: +31-641044153 GPG: 0x343F1A3D WebID: http://www.ivan-herman.net/foaf#me

stain commented 10 years ago

Good point, @iherman. Then I think I agree in general with the proposed changes. :-)

azaroth42 commented 9 years ago

Resolved in FPWD (cleaning issues)