w3c / web-annotation

Web Annotation Working Group repository; see the README for links to the specs
https://w3c.github.io/web-annotation/

Annotation Lists #50

Closed azaroth42 closed 8 years ago

azaroth42 commented 9 years ago

Several downstream systems need lists of annotations, including EPUB [1] and IIIF [2]. For search, we need a list of the annotations in the result set produced by applying the query to the set of annotations. Other use cases that have been expressed are user-constructed "playlists" of annotations, curated distribution lists of annotations, and, more generally, optimizing annotation retrieval to avoid thousands of individual HTTP calls, one per annotation.

As an initial proposal, we could use the Activity Streams OrderedCollection class [3], which seems to fulfill the requirements (currently implicit, still to be written down):

{
  "@context": ["http://www.w3.org/ns/activitystreams", "http://www.w3.org/ns/oa"],
  "@type": "OrderedCollection",
  "totalItems": 10,
  "itemsPerPage": 1,
  "next": "http://example.org/foo?page=2",
  "self": "http://example.org/foo?page=1",
  "startIndex": 0,
  "orderedItems": [
    {
      "@type": "Annotation",
      "motivation": "commenting",
      "body": {"value": "I like this!"},
      "target": "http://www.cnn.com/"
    }
  ]
}

This would be consistent with a (to-be-proposed) use of AS2.0 for notifications about annotation activity.

[1] http://www.idpf.org/epub/oa/#h.48f1o3s9o9hf
[2] http://iiif.io/api/presentation/2.0/#other-content-resources
[3] http://www.w3.org/TR/activitystreams-core/#collections

tilgovi commented 9 years ago

:heart_eyes:

edsu commented 9 years ago

What @tilgovi said :-)

iherman commented 9 years ago

Replying by email to Rob Sanderson's post of 07 Jul 2015, 18:03:

I would like to understand what each of the properties in this proposal means in this context. But first of all, I would like to understand what functionality we actually need before adopting a specification created elsewhere…

For example, the model in EPUB [1] is much simpler and clearer. Why do we need more than that? (Let us set aside the complexities of expressing lists in RDF; a JSON-LD or Turtle representation hides that complexity anyway.)

We could also consider ORE [4]; I must admit I no longer remember all the details, but it could make things simpler. (Using a specification developed in another WG has its advantages, but I would prefer to decide on technical merit.)

Ivan

[4] http://www.openarchives.org/ore/1.0/vocabulary



Ivan Herman, W3C Digital Publishing Activity Lead Home: http://www.w3.org/People/Ivan/ mobile: +31-641044153 ORCID ID: http://orcid.org/0000-0003-0782-2704

tilgovi commented 9 years ago

I'd be curious to know how this compares to any work going on with paging within LDP. The first thing that leapt out at me is that this is the minimum viable paging I am used to seeing from a typical HTTP API: total, count, offset. Here it adds next and self, which improve navigability because clients need not construct URLs from the offset and count.
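
A minimal Python sketch of that "total, count, offset plus links" shape. It assumes a hypothetical endpoint at http://example.org/foo taking `offset` and `count` query parameters (the original example used `?page=N` instead); the property names follow the draft AS 2.0 example above.

```python
from urllib.parse import urlencode


def make_page(items, offset, count, base="http://example.org/foo"):
    """Build one page of a collection response (hypothetical shape).

    total/count/offset are carried as totalItems/itemsPerPage/startIndex,
    and explicit self/next links spare clients from building URLs.
    The query parameter names are assumptions for illustration.
    """
    total = len(items)
    page = {
        "@type": "OrderedCollection",
        "totalItems": total,
        "itemsPerPage": count,
        "startIndex": offset,
        "self": f"{base}?{urlencode({'offset': offset, 'count': count})}",
        "orderedItems": items[offset:offset + count],
    }
    # Only advertise a next page if there are items beyond this one.
    if offset + count < total:
        page["next"] = f"{base}?{urlencode({'offset': offset + count, 'count': count})}"
    return page
```

Clients that just follow `next` never need to know how the server encodes the offset.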

tilgovi commented 9 years ago

It may be overkill for EPUB but it's spot on for servers, IMO.

azaroth42 commented 9 years ago

I had hoped that the new AS Collection model and LDP Paging would integrate easily, but implementation work has turned up various issues.

Not least of which is that ldp:contains is not an ordered list, even if it looks like one in JSON-LD. Note the disclaimer in 7.2.1 [1]:

This says nothing about the ordering of members within any single page.

So if you have 100 annotations on a page, in LDP they're all at the same rank. By contrast, as:OrderedCollection allows an RDF List as the object of as:items, thus preserving in-page order.
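
To make the difference concrete, a small Python sketch using plain tuples as triples (no RDF library; prefixed names such as `as:items` are written as bare strings for brevity, and the `_:bN` blank-node labels are illustrative). A set of `ldp:contains` triples carries no order, whereas an RDF List built from `rdf:first`/`rdf:rest` can be walked back into the original sequence.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def contains_triples(container, members):
    """ldp:contains style: one triple per member, so no order survives."""
    return {(container, "ldp:contains", m) for m in members}


def rdf_list_triples(subject, predicate, members):
    """Encode members as an RDF List (a chain of rdf:first / rdf:rest)."""
    nodes = [f"_:b{i}" for i in range(len(members))]
    triples = {(subject, predicate, nodes[0] if nodes else RDF + "nil")}
    for i, (node, member) in enumerate(zip(nodes, members)):
        rest = nodes[i + 1] if i + 1 < len(nodes) else RDF + "nil"
        triples.add((node, RDF + "first", member))
        triples.add((node, RDF + "rest", rest))
    return triples


def read_rdf_list(triples, subject, predicate):
    """Walk the rdf:first / rdf:rest chain back into a Python list."""
    index = {(s, p): o for s, p, o in triples}
    node = index[(subject, predicate)]
    out = []
    while node != RDF + "nil":
        out.append(index[(node, RDF + "first")])
        node = index[(node, RDF + "rest")]
    return out
```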

So ... my proposal is to drop support for LDP Paging in the protocol, and instead use AS Collections / Pages.

[1] http://www.w3.org/TR/ldp-paging/#ldpc-informative

tilgovi commented 9 years ago

I'm not deterred by that note. It sounds like the spec authors have done their best to not overly constrain implementations with potentially difficult requirements.

To me, a server whose sort order is not stable across paginated requests is unfortunate, but also not entirely unreasonable.

In practice, such things are nearly indistinguishable, to clients, from cases where items are being inserted or deleted concurrently. Simply having an offset and a limit does not guarantee that paginating over the whole collection will return each item exactly once. Doing that requires stable collection snapshots that persist for the duration of client sessions and other such complicated stuff.
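
A toy Python illustration of that point, assuming plain offset/limit paging over an in-memory list that changes between two requests: an insertion makes the client see an item twice, and a deletion makes it skip one.

```python
def fetch_page(collection, offset, limit):
    """Offset/limit paging over the collection as it is at request time."""
    return collection[offset:offset + limit]


# Insertion between requests: "a4" is returned on both pages.
annotations = ["a5", "a4", "a3", "a2", "a1"]  # newest first
first = fetch_page(annotations, 0, 2)         # ["a5", "a4"]
annotations.insert(0, "a6")                   # a new annotation arrives
second = fetch_page(annotations, 2, 2)        # ["a4", "a3"]

# Deletion between requests: "a3" is never returned.
annotations = ["a5", "a4", "a3", "a2", "a1"]
first_del = fetch_page(annotations, 0, 2)     # ["a5", "a4"]
annotations.remove("a5")                      # an annotation is deleted
second_del = fetch_page(annotations, 2, 2)    # ["a2", "a1"]
```

Avoiding both effects requires something like a snapshot of the collection that stays fixed for the whole traversal.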

So, to me, the LDP language is just being realistic and avoiding certain burdens of scale that many find untenable. In practice, it's a very reasonable paging behavior.

azaroth42 commented 9 years ago

No, it's because ldp:contains is the relationship between container and contained-item directly with no rdf:List (or other) involved. So when you ask for the response in turtle, there really is no order at all in the page.

tilgovi commented 9 years ago

"In cases where ordering is important, an LDP Paging server ensures that all the members on any single page have the proper sort order with relation to all members on any next and previous pages."

So, if there's an ordering, the spec requires servers to honor it when paginating. Can we discuss use cases? When is it critical that a specific page return an ordered list? Are there any such cases where clients couldn't determine that order themselves if they have reason not to trust the order returned in the serialization?

azaroth42 commented 9 years ago

Right, but only at the page level. The items on a single page can be in any order; the guarantee is only that they all sort after the items on previous pages and before the items on next pages. So if your page size is 1000 and there's only one page, you're out of luck.
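
A Python sketch of what that guarantee does and does not constrain, assuming each page is a list and `key` extracts the sort value: only adjacent pages are compared, so any permutation within a page passes, and a single page passes trivially.

```python
def satisfies_ldp_page_ordering(pages, key):
    """Check only the cross-page guarantee quoted above (sketch).

    Every member of page i must sort at or before every member of
    page i+1. Nothing is checked inside a page, because LDP Paging
    makes no promise about in-page order.
    """
    for current, nxt in zip(pages, pages[1:]):
        if current and nxt and max(map(key, current)) > min(map(key, nxt)):
            return False
    return True
```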

As for client side sorting being impossible, how about:

tilgovi commented 9 years ago

I think the last two are compelling. I actually started refuting the first, including a caveat about secret sauce, and then I read the rest of your response and found we were thinking the same thing.

Still, I'm having a hard time figuring out whether or how to include this. It appeals to me that the spec need not require ordering, but it would be nice if we could recommend a particular way to signal an ordering when one does exist.

azaroth42 commented 9 years ago

I'm thinking at the moment:

So something like:

GET http://example.org/annos/

{
    "@id": "http://example.org/annos/",
    "@type": ["OrderedCollection", "Container"],
    "label": "My Big Collection",
    "totalItems": 42023,
    "contains": ["anno3", "anno2", "anno4", "anno1", "anno5"],
    "first": "http://example.org/annos/?p=0",
    "last": "http://example.org/annos/?p=236"
}

GET http://example.org/annos/?p=0

{
    "@id": "http://example.org/annos/?p=0",
    "@type": "OrderedCollectionPage",
    "partOf": "http://example.org/annos/",
    "next": "http://example.org/annos/?p=1",
    "orderedItems": [
        {
            "@id": "http://example.org/annos/anno1",
            "@type": "Annotation",
            "target": "..."
        },
        "..."
    ]
}
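
A sketch of how a client might walk such a collection in Python: read `first` from the collection, then follow `next` from page to page, yielding `orderedItems` in order. Network access is replaced by a hypothetical in-memory `RESPONSES` table (three annotations on two pages rather than the 42023 above) so the example runs as-is; a real client would issue HTTP GETs instead.

```python
# Hypothetical stand-in for HTTP GET responses, keyed by URL.
RESPONSES = {
    "http://example.org/annos/": {
        "@id": "http://example.org/annos/",
        "@type": ["OrderedCollection", "Container"],
        "totalItems": 3,
        "first": "http://example.org/annos/?p=0",
        "last": "http://example.org/annos/?p=1",
    },
    "http://example.org/annos/?p=0": {
        "@type": "OrderedCollectionPage",
        "partOf": "http://example.org/annos/",
        "next": "http://example.org/annos/?p=1",
        "orderedItems": [{"@id": "http://example.org/annos/anno1"},
                         {"@id": "http://example.org/annos/anno2"}],
    },
    "http://example.org/annos/?p=1": {
        "@type": "OrderedCollectionPage",
        "partOf": "http://example.org/annos/",
        "orderedItems": [{"@id": "http://example.org/annos/anno3"}],
    },
}


def get(url):
    return RESPONSES[url]


def iter_annotations(collection_url, fetch=get):
    """Yield annotations in order by following first -> next links."""
    page_url = fetch(collection_url).get("first")
    while page_url:
        page = fetch(page_url)
        yield from page.get("orderedItems", [])
        page_url = page.get("next")
```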

azaroth42 commented 9 years ago

Tracking: The current protocol and this issue would be affected if https://github.com/jasnell/w3c-socialwg-activitystreams/issues/221 is accepted to remove paging from the AS model.

elf-pavlik commented 9 years ago

Given the following response:

GET http://example.org/annos/?p=0

{
    "@id": "http://example.org/annos/?p=0",
    "@type": "OrderedCollectionPage",
    "partOf": "http://example.org/annos/",
    "next": "http://example.org/annos/?p=1",
    "orderedItems": [
        {
            "@id": "http://example.org/annos/anno1",
            "@type": "Annotation",
            "target": "..."
        },
        "..."
    ]
}

How can a client infer that http://example.org/annos/anno1 has any kind of relation to http://example.org/annos/? Does it depend on fetching http://example.org/annos/ and that response listing all of the items, as in:

"contains": ["anno3", "anno2", "anno4", "anno1", "anno5", ...]

I haven't noticed any inference rules based on as:partOf in AS 2.0.

elf-pavlik commented 9 years ago

Following up on my question from https://github.com/jasnell/w3c-socialwg-activitystreams/issues/221#issuecomment-151576958

Suppose we separate out the concern of HTTP access to the dataset, for instance by providing a single file for download, linked with void:dataDump, and thereby get rid of all the API-specific terms from the LDP namespace used in the Web Annotation examples. Would we still need the AS 2.0 paging mechanism, even if we can access the whole dataset directly from device memory?

BigBlueHat commented 9 years ago

@elf-pavlik I don't see ordering, collections, or even paging as API-specific features, and I dearly want Collection-style stuff spec'd somewhere or other.

In my current use case, I want to add "static" annotations to a filesystem (think Jekyll sites on GitHub Pages, for example), and would like to include paged collections of them in "blog order" (newest first), as I would with RSS and Atom feeds. If a single collection file grows unwieldy, I would (reasonably) want to paginate it and/or break it into related collections (by month, week, etc.).
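
One way the static-site case could look, as a rough Python sketch: annotations (assumed to carry an ISO 8601 `created` timestamp without a timezone suffix) are grouped into one `OrderedCollectionPage` per month, newest first, with AS 2.0 `next`/`prev` links between pages. The base URL and the `YYYY-MM.json` file naming are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def monthly_collections(annotations, base="https://example.github.io/annos"):
    """Split annotations into one page per month, in "blog order"."""
    by_month = defaultdict(list)
    for anno in annotations:
        month = datetime.fromisoformat(anno["created"]).strftime("%Y-%m")
        by_month[month].append(anno)
    months = sorted(by_month, reverse=True)  # newest month first
    pages = []
    for i, month in enumerate(months):
        page = {
            "@id": f"{base}/{month}.json",
            "@type": "OrderedCollectionPage",
            "partOf": f"{base}/index.json",
            # Newest annotation first within each month, too.
            "orderedItems": sorted(
                by_month[month],
                key=lambda a: datetime.fromisoformat(a["created"]),
                reverse=True,
            ),
        }
        if i + 1 < len(months):
            page["next"] = f"{base}/{months[i + 1]}.json"
        if i > 0:
            page["prev"] = f"{base}/{months[i - 1]}.json"
        pages.append(page)
    return pages
```

Each page could then be written out as its own static JSON file.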

Expressing all of that should be possible, is not specific to API usage, and would sure be handy to have defined somewhere. :smile:

Right now, it seems that ActivityStreams 2.0 OrderedCollection and OrderedCollectionPage do come the closest to the above use case requirements.

Given @tilgovi's scalability points (and the likelihood of an annotation appearing on more than one page when using offset-based or similar paging), we should plan to accommodate that overlap, and the possibility that ordering MAY be ignored if the client is not interested, or plans to re-sort on some other priority (page-position order vs. date order).
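
A minimal Python sketch of accommodating that on the client, assuming each annotation has an `@id`: pages are flattened, an annotation seen on an earlier page is dropped from later ones (keeping page-position order), and a caller may pass its own `sort_key`, such as a date, to re-sort.

```python
def merge_pages(pages, sort_key=None):
    """Flatten pages, keeping only the first occurrence of each "@id"."""
    seen = set()
    merged = []
    for page in pages:
        for anno in page.get("orderedItems", []):
            if anno["@id"] not in seen:
                seen.add(anno["@id"])
                merged.append(anno)
    # Optionally ignore the server's order and apply the client's own.
    if sort_key is not None:
        merged.sort(key=sort_key)
    return merged
```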

A fun problem to be sure. :smiley:

elf-pavlik commented 9 years ago

@BigBlueHat that sounds in direction of https://www.w3.org/Social/track/issues/24

Does each annotation have a logical relationship to the whole big collection, with paging used just to send smaller chunks over the network? Or does each annotation have an important logical relationship to a particular page, not directly to the whole collection, so that even with the whole dataset loaded in memory you still want to preserve this exact page structure?

BigBlueHat commented 9 years ago

@elf-pavlik yeah. It's not at all dissimilar. :smiley:

You can think of it a bit like "rolling log files." The importance of the page they are on only depends on how you're doing paging. Granted, semantically--in the log file case--the pages are less pages than sub-collections...however, other than directionality between the pages there's already minimal difference between a Collection and a Page (both in general and in the AS2 vocab specifically).

Even if the whole annotation collection were loaded into memory, you may still want them paginated for display--even if you're merely paginating on the total length of the collection divided by # of items per page.
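
For that display-only case nothing is needed from the data model; in Python it amounts to a ceiling division and slicing (a sketch, with `per_page` treated as a UI setting):

```python
import math


def display_pages(items, per_page):
    """Split an in-memory list into ceil(len / per_page) display pages."""
    count = math.ceil(len(items) / per_page)
    return [items[i * per_page:(i + 1) * per_page] for i in range(count)]
```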

Does that explain the value of paging in a static site style use case?

elf-pavlik commented 9 years ago

Even if the whole annotation collection were loaded into memory, you may still want them paginated for display--even if you're merely paginating on the total length of the collection divided by # of items per page.

I try to keep a distinction between the data model, the API, and the UI.

As I understand it, each Annotation has a logical relation with a List of annotations (in this case ordered), and pages have no purpose other than splitting this list into small chunks when it is accessed over the network: http://w3c.github.io/web-annotation/protocol/wd/paging.html#multiple-ordered-responses-with-annotations

I think this question may help clarify it:

If we add more annotations to the list, can they move between pages? Or does an annotation have a permanent relationship with a page that can only change through an intentional operation ("Move this annotation from this page to that page"), and never as a side effect of adding annotations to the whole list? In that case, one should never insert annotations directly into the list, but always interact only with pages and use the List as a collection of pages, not a collection of annotations! Pages then become the collections of annotations.

BigBlueHat commented 9 years ago

@elf-pavlik good points, and very good question for re-framing the discussion!

It's sort of what I was after when I was referring to Pages "merely" as Collections with some siblings and a position among them--such that "next" and "previous" could be used for finding the related siblings.

In LDP paging the relationships are a bit clearer (maybe). There, a "page" has a greater-than / less-than relationship with other pages, but the items within it do not themselves have an order, as contains is unordered. Instead, you're supposed to provide additional ordering constructs specific to your domain:

It is up to the domain model and server to determine the appropriate predicate to indicate the resource’s order within a page (or globally), and up to the client receiving this representation to use that order in whatever way is appropriate to meet its needs, for example to sort the data prior to presentation on a user interface.
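
A Python sketch of the client side of that arrangement: the unordered `contains` members of a page are sorted by a domain-specific predicate. `ex:rank` is a hypothetical property standing in for whatever the domain model defines; members lacking it are placed last in their original relative order (Python's sort is stable).

```python
def order_page_members(page, order_predicate="ex:rank"):
    """Sort a page's unordered contains members by a domain predicate."""
    members = page.get("contains", [])
    return sorted(
        members,
        # False sorts before True, so members with the predicate come first.
        key=lambda m: (order_predicate not in m, m.get(order_predicate, 0)),
    )
```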

@azaroth42 iirc, there was some reason we felt that ordering done this way (outside the in-page structure) was insufficient, and that knowing the position of the annotation within the wider collection, without providing additional statements, was important... but I honestly don't recall what that was now. Maybe you do? :smile:

Schema.org also lacks the notion of "pages" per se, and instead uses ItemList, in which a ListItem can reference a next and previous sibling as well as its position within the list, such that you could build a list of lists. :turtle:'s all the way down. :smile_cat:
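
For comparison, a Python sketch that builds a schema.org `ItemList` in JSON-LD, where each `ListItem` carries `position` (1-based here) plus `previousItem`/`nextItem` links to its siblings. The `#itemN` identifiers are hypothetical; since a `ListItem`'s `item` can be any Thing, including another `ItemList`, the same structure could be nested to get a list of lists.

```python
def to_item_list(urls, list_id):
    """Build a schema.org ItemList whose ListItems know their siblings."""
    ids = [f"{list_id}#item{i}" for i in range(1, len(urls) + 1)]
    elements = []
    for i, (item_id, url) in enumerate(zip(ids, urls)):
        element = {"@id": item_id, "@type": "ListItem",
                   "position": i + 1, "item": url}
        if i > 0:
            element["previousItem"] = ids[i - 1]
        if i + 1 < len(ids):
            element["nextItem"] = ids[i + 1]
        elements.append(element)
    return {
        "@context": "https://schema.org",
        "@id": list_id,
        "@type": "ItemList",
        "numberOfItems": len(urls),
        "itemListElement": elements,
    }
```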

azaroth42 commented 8 years ago

Dependency on #92. It doesn't make sense to me to have both oa:List and as:OrderedCollection in the model when the structure is identical and the semantics so close as to make no difference.

iherman commented 8 years ago

Closed by resolution 'we close issues #50, #92, #145 with the principle that, whenever we can, we use ordered list only. Exceptions should be subjects of specific issues.' at telco 2016-02-12.

See: http://www.w3.org/2016/02/12-annotation-irc#T16-54-02

elf-pavlik commented 8 years ago

Just to double check my interpretation of as:OrderedCollection and as:OrderedCollectionPage which I understand Web Annotations will use.

https://www.w3.org/TR/activitystreams-core/#collections

Each page uses as:partOf to reference the collection that was 'broken into pages'. Each page also uses as:items to reference each item/member of that particular OrderedCollectionPage. In that case, an item/member DOES NOT have a relationship with the whole OrderedCollection that can be expressed with a single predicate/property defined in the AS 2.0 Vocabulary (or its owl:inverseOf). Expressing the relationship between an instance of as:PagedCollection and each item/member seems to require combining two properties: (the owl:inverseOf of) as:partOf and as:items. In that case, it might make sense to define an owl:propertyChainAxiom. In other words, an as:(Ordered)Collection has either items/members or pages; when it has pages, it has no items/members directly, and only each page has items/members directly.
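
The chain described here could be sketched in Python over plain triples: for every `(page, as:partOf, collection)` and `(page, as:items, anno)`, derive `(collection, ex:hasMember, anno)`. This is roughly what a property chain over the inverse of `as:partOf` followed by `as:items` would entail. `ex:hasMember` is a hypothetical property used only for illustration, and `as:items` is flattened to one triple per item rather than an RDF List.

```python
def infer_collection_members(triples, derived="ex:hasMember"):
    """Apply  inverse(as:partOf) o as:items  =>  derived  (sketch)."""
    part_of = [(s, o) for s, p, o in triples if p == "as:partOf"]
    items = {}
    for s, p, o in triples:
        if p == "as:items":
            items.setdefault(s, []).append(o)
    # collection <-partOf- page -items-> anno  gives  collection -derived-> anno
    return {(coll, derived, anno)
            for page, coll in part_of
            for anno in items.get(page, [])}
```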

azaroth42 commented 8 years ago

@elf-pavlik Yes. I'm working on writing it up for the annotation model, vocab and protocol this week. And, to avoid multiple possible responses, the consensus was that we would only use the paged model, as annotation collections are more likely to be very large than very small.