solid / specification

Solid Technical Reports
https://solidproject.org/TR/

Resource paging #230

Open csarven opened 3 years ago

csarven commented 3 years ago

Use cases:

Existing specifications:

WIP proposals:

TallTed commented 3 years ago

SeeAlso: Scrollable Cursors, which manifest in ODBC, JDBC, and other SQL-style protocols/standards

acoburn commented 3 years ago

Possibly worth noting that the W3C Web Annotations Protocol makes use of Activity Streams paging for container representations

csarven commented 3 years ago

Possibly also worth noting that ActivityPub makes use of AS2's paging for Collections: https://www.w3.org/TR/activitypub/#collections .

It is a good bonus if we can ensure that Solid servers can conform to the server parts of WAP and AP.


Some time ago I implemented AS2's Collections Paging in https://github.com/csarven/mayktso (server) and https://git.dokie.li/ (application). IIRC, I found AS2's paging easier to wrap my mind around and to implement. LDP Paging's use of 303 may have been one of the things that I didn't want to get into either. But I did like LDP Paging's use of HTTP headers for navigating page sequences over AS2's in-data approach. I also found AS2's OrderedCollections quite useful, something that LDP with ldp:contains didn't offer out of the box.


For example, the following container (served from mayktso) has pagination:

curl -H'Accept: text/turtle' https://linkedresearch.org/inbox/linkedresearch.org/cloud/

The representation in Turtle includes a bit of LDP (eg. ldp:BasicContainer) and also uses the AS2 vocab (eg. as:first) because the resource is intended to paginate.

Using profile negotiation:

curl -H'Accept: application/ld+json; profile="https://www.w3.org/ns/activitystreams"' https://linkedresearch.org/inbox/linkedresearch.org/cloud/

The representation in JSON-LD uses AS2 + Collection paging (without LDP vocab).
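The two curl calls above differ only in the Accept header. A minimal Python sketch of building the same request follows; the helper name build_container_request and its as2 flag are made up for illustration, and nothing is sent over the network here.

```python
from urllib.request import Request

AS2_PROFILE = "https://www.w3.org/ns/activitystreams"
CONTAINER = "https://linkedresearch.org/inbox/linkedresearch.org/cloud/"


def build_container_request(url: str, as2: bool = False) -> Request:
    """Build (but do not send) a GET request for a container.

    as2=False asks for Turtle; as2=True asks for JSON-LD with the
    AS2 profile, mirroring the two curl commands.
    """
    if as2:
        accept = f'application/ld+json; profile="{AS2_PROFILE}"'
    else:
        accept = "text/turtle"
    return Request(url, headers={"Accept": accept})


turtle_req = build_container_request(CONTAINER)
as2_req = build_container_request(CONTAINER, as2=True)
print(turtle_req.get_header("Accept"))
print(as2_req.get_header("Accept"))
```

Passing either request to urllib.request.urlopen would perform the actual fetch.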


A proper diff between LDP and AS2's paging would be possible, but much has already been discussed in https://github.com/w3c/web-annotation and https://github.com/w3c/activitystreams.

What features do we need and which specs would be a good fit?


At the moment, a Container representation (without pagination) could look something like this (assuming that agent can Read the contained resources):

/foo/
# server-managed
  a ldp:BasicContainer ;
  ldp:contains /foo/bar , /foo/baz ;

# user-managed
  dcterms:title "foo" .

# server-managed
/foo/bar
  a ldp:Resource ;
  posix:size 12 ;
  posix:mtime 1612699698.963 .

With pagination, the Container representation could look like this:

/foo/
# server-managed
 a ldp:Container , as:Collection ;
 as:first /foo/?p=1 ;

# user-managed
  dcterms:title "foo" .

Paged resource:

/foo/?p=1
# server-managed
  a as:CollectionPage ;
  as:items /foo/bar , /foo/baz .

# server-managed
/foo/bar
  a ldp:Resource , as:Object ;
  posix:size 12 ;
  posix:mtime 1612699698.963 ;
  dcterms:creator https://csarven.ca/#i ;
  dcterms:title "foo bar" .

There may be finer details that I'm skipping right now, but what introducing pagination does is move the server-managed triples (like containment statements; whether it is ldp:contains or as:items is beside the point right now) into their own resource, e.g. /foo/?p=1.

I'd expect paged resources eg. /foo/?p=1 (as:CollectionPage) to be only server-managed. There is no strong use case that I can think of right now that would need the user to update paged resources. So, it should at most have Read permissions for agent. Item order should be managed by the server - I don't see why user needs to touch that in individual pages.

Having said that, while we move the containment triples out of the Container representation and into its own resource, we introduce other server-managed information as per pagination (eg. as:first, as:last, as:totalItems etc) into the Container.
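To make the client side of this concrete, here is a rough Python sketch of walking from the Container through its pages. It assumes the representations have already been parsed into plain dicts; the PAGES table, the second page /foo/?p=2, the item /foo/qux, and the use of as:next (shortened to "next") are illustrative assumptions, not something any server is confirmed to return.

```python
from typing import Callable, Dict, Iterator

# Hypothetical, already-parsed representations keyed by URL.
PAGES: Dict[str, dict] = {
    "/foo/": {"first": "/foo/?p=1"},
    "/foo/?p=1": {"items": ["/foo/bar", "/foo/baz"], "next": "/foo/?p=2"},
    "/foo/?p=2": {"items": ["/foo/qux"]},
}


def iter_items(container: str, fetch: Callable[[str], dict]) -> Iterator[str]:
    """Yield contained items by following as:first, then as:next."""
    page_url = fetch(container).get("first")
    seen = set()  # guard against a server that links pages in a loop
    while page_url is not None and page_url not in seen:
        seen.add(page_url)
        page = fetch(page_url)
        yield from page.get("items", [])
        page_url = page.get("next")


all_items = list(iter_items("/foo/", PAGES.__getitem__))
print(all_items)
```

The client only ever reads the pages, which fits the idea that paged resources are server-managed.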

So, with AS's paging, the Container is still not completely freed up, as it holds both server-managed and user-managed content. Contrasting this with LDP's paging, the HTTP headers to navigate the pages are preferable in that the server controls them but can leave the body entirely up to the user. I just don't like LDP's 303 (but I understand why they did it that way). Is there a way out? Would it make sense to consider using AS2's Paging approach in the Link header instead?
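As a thought experiment only, the Link header idea could look like the Python sketch below. It assumes the ?p=N page URLs from the example and uses the registered first, last, prev and next link relations in place of as:first, as:last, as:prev and as:next; whether the AS2 terms themselves should appear as relation types is an open question, and paging_link_header is a made-up helper.

```python
def paging_link_header(container: str, page: int, last_page: int) -> str:
    """Return a Link header value that carries paging outside the body."""

    def page_url(n: int) -> str:
        return f"{container}?p={n}"

    links = {"first": page_url(1), "last": page_url(last_page)}
    if page > 1:
        links["prev"] = page_url(page - 1)
    if page < last_page:
        links["next"] = page_url(page + 1)
    return ", ".join(f'<{url}>; rel="{rel}"' for rel, url in links.items())


header = paging_link_header("/foo/", page=2, last_page=3)
print(header)
```

With this, the body of /foo/ would stay entirely user-managed and the paging controls would live only in the response headers.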

damooo commented 3 years ago

Has any consensus been reached on a pagination model? It seems essential when we have large LDP-RSs.

kjetilk commented 3 years ago

Another way of looking at this, and I'm not saying we should, as there are two quite mature ways already, is to consider an approach based on materialized iterators and RFC7233 Ranges.

One key thing that both AS and LDP Paging do is to establish a sort order, which is necessary since otherwise, you might get overlapping pages, which would wreak havoc.

However, as RDF consists of a set of statements and is thus unordered, this does not sit that well. Also, the graph may change between requests for pages.

That's why I started thinking in terms of a materialized iterator instead. It is a very loose idea at the moment, but you could imagine that upon the first range request, an iterator is materialized in no particular sort order, and its state is identified by a certain ETag. The server would then need to keep this state for a while in case further range requests are submitted (which gives predictable pages at the cost that materialization may have become stale). We'd need a new triples range unit to go with it.

Very loose idea, lots of stuff that would need to be worked out, perhaps the materialization would need its own URI, perhaps we could use cache headers to control the lifespan of the materialization, perhaps If-Match could be used to validate the materialization using the same etag. Stuff like that. Just wanted to dump my brain. :-)
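A very rough Python sketch of that idea, under heavy assumptions: the triples range unit does not exist today, the MaterializedView class is invented for illustration, and the status codes are only a guess at how Range, If-Match and the ETag might interact.

```python
import hashlib
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


class MaterializedView:
    """Snapshot of a graph, frozen in an arbitrary order on first request."""

    def __init__(self, graph: Set[Triple]) -> None:
        # Copy the set into a list: arbitrary order, but stable from now on.
        self.triples: List[Triple] = list(graph)
        digest = hashlib.sha256(repr(sorted(self.triples)).encode()).hexdigest()
        self.etag = f'"{digest[:16]}"'

    def range(self, start: int, end: int, if_match: str):
        """Serve triples start..end (inclusive) if the ETag still matches."""
        total = len(self.triples)
        if if_match != self.etag:
            return 412, [], {}  # client is paging a different materialization
        if start >= total:
            return 416, [], {"Content-Range": f"triples */{total}"}
        end = min(end, total - 1)
        headers: Dict[str, str] = {
            "ETag": self.etag,
            "Content-Range": f"triples {start}-{end}/{total}",  # hypothetical unit
        }
        return 206, self.triples[start:end + 1], headers


graph = {("/foo/", "ldp:contains", "/foo/bar"),
         ("/foo/", "ldp:contains", "/foo/baz"),
         ("/foo/", "dcterms:title", "foo")}
view = MaterializedView(graph)
status1, chunk1, _ = view.range(0, 1, view.etag)
graph.add(("/foo/", "ldp:contains", "/foo/new"))  # live graph changes
status2, chunk2, headers2 = view.range(2, 9, view.etag)
```

The second range still comes from the frozen snapshot, so pages do not overlap, at the cost of not seeing the newly added triple.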

acoburn commented 3 years ago

A while ago, I spent some time thinking through a similar set of issues as @kjetilk and came to a remarkably similar conclusion. Effectively, if a given page-able request is materialized in a request-scoped context, and the client can then page through that context, you can address the RDF-specific issues with range requests. There are clearly considerations for the lifespan of that view and how it is addressed, but those seem like tractable problems.

pietercolpaert commented 3 years ago

As an alternative to LDP/AS2/Hydra pagination, we’ve designed the TREE hypermedia specification at https://w3id.org/tree/specification. It allows a server to qualify a relation from one page to one or multiple next pages (and therefore create a search tree, hence the name). We’re working on compatibility with LDP, Shape Trees, AS2 and Hydra so you do not need to overthrow your current collection design to make it work.

If resource paging is added in Solid, I think TREE should certainly be considered in order to (i) not be constrained to one-dimensional pagination, (ii) have a specification that brings together any kind of collection design used within the Solid ecosystem, and (iii) allow clients to understand what the “next page” precisely means and to know whether they should still request that page at all.

Example

/foo/
# server-managed
 a ldp:Container ;
 tree:view /foo/?p=1 ;

# user-managed
  dcterms:title "foo" .

Paged resource:


/foo/?p=1
# server-managed
  a as:CollectionPage ;
  as:items /foo/bar , /foo/baz ;
## This relation links to a next page of items that were modified later in time than first of January 2021
  tree:relation [ 
     a tree:GreaterThanRelation ;
     tree:node </foo/?p=2> ;
     tree:path dcterms:modified;
     tree:value "2021-01-01"^^xsd:date 
  ] .

# server-managed
/foo/bar
  a ldp:Resource , as:Object ;
  posix:size 12 ;
  posix:mtime 1612699698.963 ;
  dcterms:creator https://csarven.ca/#i ;
  dcterms:title "foo bar" .
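
To illustrate point (iii), here is a small Python sketch of how a client might decide whether to request /foo/?p=2 at all. It assumes the relation has been parsed into a plain dict and that the client only cares about items modified up to some date; should_follow and the dict keys are invented for this example.

```python
from datetime import date
from typing import Optional

# The tree:relation from the page above, as a hypothetical parsed dict.
relation = {
    "type": "tree:GreaterThanRelation",
    "node": "/foo/?p=2",
    "path": "dcterms:modified",
    "value": date(2021, 1, 1),
}


def should_follow(rel: dict, wanted_until: Optional[date]) -> bool:
    """Return False when the target node cannot hold anything of interest.

    wanted_until is the latest dcterms:modified date the client cares
    about; None means no upper bound.
    """
    if rel["type"] != "tree:GreaterThanRelation":
        return True  # unknown relation type: cannot prune, so follow it
    # Everything behind the relation is strictly later than rel["value"].
    return wanted_until is None or wanted_until > rel["value"]


print(should_follow(relation, date(2020, 12, 31)))
print(should_follow(relation, date(2021, 6, 1)))
```

A client looking only for items from 2020 can skip the second page entirely, while one interested in mid-2021 still has to fetch it.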

michielbdejong commented 20 hours ago

See also how TrinPod does paging.

Are there any Solid storage implementations that support paging, apart from TrinPod?