Paginated atom feed - Githubissues

pschirmacher commented 9 years ago

This would be nice to have: http://tools.ietf.org/html/rfc5005#section-3. It would allow feed consumers to access old entries (e.g. after downtime or similar).

aheusingfeld commented 9 years ago

Hm, couldn't that already be done with the prev link we implemented with #116?

mvitz commented 9 years ago

No. #116 added links for individual updates. This will add links for the complete feed. Since I think we all agree to support this, I will take care of this issue.

pschirmacher commented 9 years ago

One thing that might have to be considered is how to handle new feed entries while a client is browsing through the paginated feed. Example:

There are 30 statuses in the database (IDs from large/latest to small/oldest): [30 .. 1]
A client accesses the first page of the feed and is presented with the latest 25 entries: [30 .. 6]
A new status is added to the database: [31 .. 1]
The client follows the 'previous' link. Which statuses is he going to get? [5 .. 1] or [6 .. 1]?

Another example:

Again 30 statuses in database: [30 .. 1]
A client accesses the first page of the feed: [30 .. 6]
100 new statuses are added to the database: [130 .. 1]
Will the 'previous' link now point to [105 .. 81] or to [5 .. 1]?
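
Both examples can be reproduced with a small sketch. It assumes the 'previous' link is resolved by offset (skip the N newest entries), which is only one possible implementation and not necessarily what statuses does; `page_by_offset` is a hypothetical helper, not existing code.

```python
def page_by_offset(ids, offset, count):
    """Return `count` ids, newest first, skipping the `offset` newest ones.

    Hypothetical helper: models a 'previous' link of the form ?offset=25.
    """
    newest_first = sorted(ids, reverse=True)
    return newest_first[offset:offset + count]


# First example: 30 statuses, the client reads the first page.
statuses = list(range(1, 31))
first_page = page_by_offset(statuses, 0, 25)      # [30 .. 6]

# One new status arrives before the client follows the link.
statuses.append(31)
second_page = page_by_offset(statuses, 25, 25)    # [6 .. 1], entry 6 repeats

# Second example: 100 new statuses arrive instead.
busy = list(range(1, 131))
busy_second_page = page_by_offset(busy, 25, 25)   # [105 .. 81]
```

With a plain offset the client either sees an entry twice or lands in the middle of the new entries, so the link needs to carry more than a position.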

mvitz commented 9 years ago

I had this discussion previously. Is there any other solution than passing the first (or last item) processed in the current chunk? With this information the server could solve some of your problematic use cases.

pschirmacher commented 9 years ago

passing the first (or last item) processed in the current chunk

I'm not sure I understand what you mean by that. Do you mean adding more information to the 'previous' link so that the server can figure out "from where to start"? Example:

30 statuses in database
GET /feed

feed with entries 30 to 6 and 'previous' link /feed?from=5?

tloist commented 9 years ago

@pschirmacher That's a ConcurrentModificationException, you've described there.

A quick search through the HTTP status codes lets me assume that this would be a 409 - Conflict. This would require that the server knows that the resource was changed in the meantime, and it makes browsing through paginated feeds more or less a pain, so I would hope that the resource is read far more often than edited... (which may be wrong).

mvitz commented 9 years ago

@pschirmacher: Yes that was the idea I had.

In general the problem described here is hard to solve without taking a snapshot of the data for a complete client session. And this is something nobody wants to do (at least in this case).

aheusingfeld commented 9 years ago

That's a ConcurrentModificationException

@tloist It's not! That would only happen if the server held the state of the list/cursor the client is currently iterating over. I seriously hope that none of us would implement it this way.

feed with entries 30 to 6 and 'previous' link /feed?from=5

@pschirmacher That makes perfect sense to me. Optionally we should add the number/count of items to fetch. In this case the server could simply get the list of all the "updates", find the item with id=5 and return the following count of "updates".
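
A minimal sketch of that lookup, assuming IDs grow over time and the list of updates fits in memory. `page_from` and its `count` default are hypothetical names. It compares with `<=` instead of searching for the exact item, so it still works if id=5 has been deleted in the meantime.

```python
def page_from(ids, from_id, count=25):
    """Return up to `count` ids, newest first, starting at `from_id`.

    If `from_id` no longer exists, the page starts at the next older id.
    """
    older_or_equal = [i for i in sorted(ids, reverse=True) if i <= from_id]
    return older_or_equal[:count]


# 130 statuses in the database; the client follows /feed?from=5.
statuses = list(range(1, 131))
page = page_from(statuses, from_id=5)          # [5, 4, 3, 2, 1]

# The same link still resolves after status 5 was deleted.
statuses.remove(5)
page_after_delete = page_from(statuses, from_id=5)   # [4, 3, 2, 1]
```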

mvitz commented 9 years ago

Only problem with this approach is calculating the last link because the from parameter points to the newest item in the current chunk.

mvitz commented 9 years ago

Hm the RFC states:

Paged feeds are lossy; that is, it is not possible to guarantee that
clients will be able to reconstruct the contents of the logical feed
at a particular time.  Entries may be added or changed as the pages
of the feed are accessed, without the client becoming aware of them.

Therefore, clients SHOULD NOT present paged feeds as coherent or
complete, or make assumptions to that effect.

So at least the simplest possible solution would satisfy the RFC...

tloist commented 9 years ago

@aheusingfeld That's the same scenario - that's what I meant. And my suggestion involved that the client remembers the state of the resource it previously accessed.

Actually @mvitz introduced the idea of a snapshot.

But in this case I don't get the problem because the list of feeds is an append-only list. So if you reference them relative to the beginning, the same index always points to the same element. That's basically what you do when you create a URL like /feed?from=5.

@mvitz The RFC assumes that the resource has arbitrary modifications (e.g. deletion of a feed) which is not (is it?) the case here.

aheusingfeld commented 9 years ago

Only problem with this approach is calculating the last link

Not a problem if the "rel=next" points to a url containing before=<first-post-on-current-page>, is it?

mvitz commented 9 years ago

@tloist You can delete your own updates as long as there is no reply to it and the RFC assumes that entries are created and deleted while some client is browsing the paginated atom feed. This can happen with statuses.

@aheusingfeld If there are 20 entries [20..1] and we are currently viewing 6-10, the following links must be generated:
rel=first: No problem, it points to the latest one, which has no pagination information.
rel=next: No problem, the 5 entries before 6 -> [5..1].
rel=prev: We could introduce something like the 5 entries after 10 -> [11..15].
rel=last: We need to know the first entry ID (1) or the last ID of the last page (6). I think this would be complicated.

tloist commented 9 years ago

Okay, say, I want to get all the hot new entries beginning from where I left off (e.g. I know the last entry I read had ID: 4711).

The discussion here tells me that I can't use the paginated feed (which is the default), right? So how do I do this instead?

pschirmacher commented 9 years ago

@tloist

ConcurrentModificationException This exception may be thrown by methods that have detected concurrent modification of an object when such modification is not permissible.

I think it should be permissible here.

409 The request could not be completed due to a conflict with the current state of the resource.

I don't think that fits here. The client might send an ETag to indicate what status of the resource it's expecting, and if the status of the resource changed, the server might respond with 412. I don't think this is a good approach here, though, because there might be too many status updates.

pschirmacher commented 9 years ago

Maybe deleting entries from the feed can be avoided? E.g. by doing a soft delete in the database and just removing the content from the feed entry? Another option might be making the events actually immutable and publishing StatusUpdated or StatusDeleted events. But I guess we don't want to do that here.
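
The soft-delete variant could look roughly like this. It is only a sketch: the `Status` class, the `deleted` flag and `feed_entry` are made-up names for illustration and are not taken from the statuses code base.

```python
from dataclasses import dataclass


@dataclass
class Status:
    id: int
    text: str
    deleted: bool = False   # hypothetical soft-delete flag


def soft_delete(status):
    """Mark the status as deleted instead of removing the row."""
    status.deleted = True


def feed_entry(status):
    """Render a feed entry; a deleted status keeps its id but loses its content."""
    content = "" if status.deleted else status.text
    return {"id": status.id, "content": content}


status = Status(id=7, text="hello")
soft_delete(status)
entry = feed_entry(status)   # the entry stays in the feed, without content
```

Because the entry never disappears, the IDs used in pagination links stay stable while a client pages through the feed.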

mvitz commented 9 years ago

Hm. In my mind every deleted entry (even if done with soft deletion) should not appear in the feed.

aheusingfeld commented 9 years ago

Ok, let's get the semantics clear first of all. The hard part is that "rel=prev" and "rel=next" have the semantics of e.g. the browser back button and not of time. Which is because we "travel back in time" when paging through the entries.

But my point is that when the server generates the page, it always knows the ids of the 20 entries it returns, plus the next older entry and the next younger entry! Therefore the client doesn't need to guess!

For the following I assume that the default count of entries in a list is 20:

rel=first -> get youngest entry

Returns a list of entries starting with the latest, freshest db-entry and the 20 entries which happened before that latest entry. Example: /updates?count=20

rel=next -> get older entries

Returns a list of entries starting with the db-entry which comes directly after the oldest entry in the current list and the 19(!) entries which happened before that entry. Example: /updates?before=180&count=20 (where 180 is the id of the first entry on the "next page")

rel=prev -> get younger entries

Returns a list of entries ending with the db-entry which comes directly before the youngest entry in the current list and the 19(!) entries which happened after that entry. Example: /updates?after=200&count=20 (where 200 is the id of the last entry on the "prev page")

rel=last -> get oldest entry

Returns a list of entries starting with the db-entry which comes directly after the entry with id=1 in the current list and the 19 entries which happened before that entry. Example: /updates?after=1&count=20

Does that make sense? With this solution we have a very simple id >= $reference or id <= $reference for the database query.
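
To make the four links concrete, here is a rough sketch of the scheme. It assumes `before` and `after` are inclusive bounds (matching the id <= / id >= queries), that IDs grow over time, and that the ids fit in memory. `select_page` and `page_links` are hypothetical names, and `after=0` for rel=last is just one possible lower bound.

```python
def select_page(ids, before=None, after=None, count=20):
    """Return one page of ids, newest first.

    before=X: the `count` newest ids with id <= X (older entries).
    after=X:  the `count` oldest ids with id >= X (younger entries).
    neither:  the `count` newest ids overall.
    """
    if after is not None:
        chunk = sorted(i for i in ids if i >= after)[:count]
        return chunk[::-1]
    candidates = sorted(ids, reverse=True)
    if before is not None:
        candidates = [i for i in candidates if i <= before]
    return candidates[:count]


def page_links(ids, page, count=20):
    """Build a rel -> URL mapping from the ids adjacent to `page`."""
    links = {
        "first": f"/updates?count={count}",
        "last": f"/updates?after=0&count={count}",
    }
    if not page:
        return links
    older = [i for i in ids if i < page[-1]]
    younger = [i for i in ids if i > page[0]]
    if older:      # next older entry = first entry of the "next page"
        links["next"] = f"/updates?before={max(older)}&count={count}"
    if younger:    # next younger entry = last entry of the "prev page"
        links["prev"] = f"/updates?after={min(younger)}&count={count}"
    return links


ids = list(range(1, 201))
page = select_page(ids, before=180)   # ids 180 down to 161
links = page_links(ids, page)
```

The server never has to remember anything about the client: every link names a concrete id, so new entries arriving in between do not shift the pages that were already linked.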

mvitz commented 9 years ago

:+1: Only thing I would not agree with is that rel=last starts with id=1 but we can agree saying rel=last starts with ID of first entry ;-)

aheusingfeld commented 9 years ago

@mvitz It doesn't matter which id it is or whether that id exists - it just needs to be the smallest id in the db as our db query is id >= 1!

mvitz commented 9 years ago

@aheusingfeld OK! >= 0 in case of statuses ;-)

innoq / statuses

Paginated atom feed #147
