nostr-protocol / nips

Nostr Implementation Possibilities
Paging with REQ filters #620

Open mstoecklein opened 1 year ago

mstoecklein commented 1 year ago

It should be possible to create a REQ filter that can be used to paginate notes independently of the timestamp. I suggest an offset filter in conjunction with the limit filter:

Example

{
  "search": "...",
  "limit": 10,
  "offset": 10 // page 2
}
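
As a rough sketch of how a client might build such a filter: `offset` is the field proposed in this issue and is not part of NIP-01, and `pageFilter` is a hypothetical helper name. The only logic is the mapping from a 1-based page number to an offset.

```typescript
// Filter shape for the proposal; `offset` is the proposed (non-standard) field.
interface PagedFilter {
  search?: string;
  limit: number;
  offset: number;
}

// Hypothetical helper: page 1 starts at offset 0, page 2 at `perPage`, and so on.
function pageFilter(search: string, page: number, perPage: number): PagedFilter {
  return { search, limit: perPage, offset: (page - 1) * perPage };
}

// A REQ message is a JSON array: ["REQ", <subscription id>, <filter>].
const req = JSON.stringify(["REQ", "page-2", pageFilter("...", 2, 10)]);
```

With `page = 2` and `perPage = 10` this yields the same `limit` and `offset` values as the example above.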

fiatjaf commented 1 year ago

These proposals have been extensively discussed here in the past.

ryzizub commented 1 year ago

I think @hoytech is working on something

hoytech commented 1 year ago

I have thought a little bit before about cursors and offsets, but I'm not currently working on anything. For something like this I'd want it to be specified as a NIP before implementing it, and I'm not sure there's an appetite for that at this time. Paging by timestamp seems good enough for most clients.

fiatjaf commented 1 year ago

Clients should either store everything locally so they can paginate or they should accept the chaos and that they will not always get everything in the most perfect pristine absolute order.

ryzizub commented 1 year ago

> Clients should either store everything locally so they can paginate or they should accept the chaos and that they will not always get everything in the most perfect pristine absolute order.

They should do that, and many clients are actually doing so right now. However, when communicating with multiple relays and requesting the same set of data, it would be excellent to have the ability to specify the subset of data that you already possess.

Timestamp pagination is good but not enough. As @hoytech says, that's definitely a topic for another NIP.

alexgleason commented 1 year ago

Having good pagination sounds like it would fix all my problems with the home feed rn.

IngwiePhoenix commented 1 year ago

Been wondering how best to do pagination for a while. Would honestly love a formal specification for this.

alexgleason commented 1 year ago

The way I'm currently doing it is with since, until, and limit filters.

  1. Decide how many items you want per page, e.g. 20.
  2. Make a REQ with "limit": 20. Get the events, then sort them descending by timestamp.
  3. The until value for your next filter is the created_at timestamp of the oldest event in your collection.
  4. Repeat.
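
The steps above can be sketched roughly as follows. This is not the linked Ditto code: `Query` is a hypothetical stand-in for "send a REQ and collect events until EOSE", kept synchronous here so the sketch stays short, and `fetchPage` is an illustrative name.

```typescript
// Only the fields this sketch needs.
interface NostrEvent { id: string; created_at: number; }
interface Filter { since?: number; until?: number; limit?: number; }

// Hypothetical: a real client would do this asynchronously over a WebSocket.
type Query = (filter: Filter) => NostrEvent[];

function fetchPage(
  query: Query,
  filter: Filter,
): { page: NostrEvent[]; next: Filter | null } {
  // Step 2: fetch, then sort newest first (copying so the input is not mutated).
  const page = [...query(filter)].sort((a, b) => b.created_at - a.created_at);
  if (page.length === 0) return { page, next: null }; // nothing left to page through

  // Step 3: the oldest event's created_at becomes `until` for the next filter.
  const oldest = page[page.length - 1];
  return { page, next: { ...filter, until: oldest.created_at } };
}
```

Step 4 is then a loop that calls `fetchPage` again with `next` until it comes back `null`. If the relay treats `until` as inclusive, the boundary event will show up again on the next page, so deduplicating by `id` is probably needed.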

I'm doing a bunch of crazy Mastodon stuff in this repo which might not make sense, but nevertheless my pagination code is here: https://gitlab.com/soapbox-pub/ditto/-/blob/7c2de9b2cf72a61efdac1c8f19418b377308612d/src/utils/web.ts#L66-106

And then here's how I use it to create a paginated home feed: https://gitlab.com/soapbox-pub/ditto/-/blob/7c2de9b2cf72a61efdac1c8f19418b377308612d/src/controllers/api/timelines.ts#L12-17

It's still not perfect. I sometimes end up with gaps. I'm planning to fix that by querying a few more events than I really need and chopping off the extras between steps 2 and 3.
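
One possible reading of that plan, as a sketch: request `pageSize + EXTRA` events, then sort, deduplicate, and keep only `pageSize` before computing the next `until`. `EXTRA`, its value, and `trimPage` are assumptions made for illustration and do not come from the linked code.

```typescript
interface NostrEvent { id: string; created_at: number; }

// Assumed over-fetch margin; the REQ would use "limit": pageSize + EXTRA.
const EXTRA = 5;

function trimPage(events: NostrEvent[], pageSize: number): NostrEvent[] {
  // Newest first; ties on created_at are broken by id so the order is stable.
  const sorted = [...events].sort(
    (a, b) => b.created_at - a.created_at || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0),
  );
  // Drop events that were returned more than once.
  const seen = new Set<string>();
  const unique = sorted.filter(e => {
    if (seen.has(e.id)) return false;
    seen.add(e.id);
    return true;
  });
  // Chop off the extras.
  return unique.slice(0, pageSize);
}
```

The `until` value for the next request would then come from the last event that survives the trim, not from the last event the relay returned.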

alexgleason commented 1 year ago

Using an offset filter doesn't make sense because new events could come in between the first call and the second. With offset being independent of your application state, you'll end up getting the same events over and over.
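
A small in-memory illustration of that problem. It assumes a hypothetical `offset` that skips the N newest matching events, which is one plausible reading of the proposal; no relay is involved.

```typescript
// The feed as a relay might see it, newest first.
function offsetPage<T>(feed: T[], offset: number, limit: number): T[] {
  return feed.slice(offset, offset + limit);
}

let feed = ["e5", "e4", "e3", "e2", "e1"];
const page1 = offsetPage(feed, 0, 2);

// Two new events arrive before the client asks for page 2.
feed = ["e7", "e6", ...feed];
const page2 = offsetPage(feed, 2, 2);

// page1 and page2 now hold the same two events: the offset shifted with the feed.
console.log(page1, page2);
```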

Paginating by timestamp makes the most sense, but since the timestamp has low precision (seconds rather than milliseconds), you might need to subtract 1 from it while paginating.
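
A minimal sketch of the "subtract 1" idea, with `nextUntil` as an illustrative name. It has a trade-off: the boundary events are not fetched again, but any events sharing that same second that did not fit in the page are skipped.

```typescript
// Returns the `until` value for the next page, or undefined when the page was empty.
function nextUntil(page: { created_at: number }[]): number | undefined {
  if (page.length === 0) return undefined;
  const oldest = Math.min(...page.map(e => e.created_at));
  // Step back one second so the next page starts strictly before `oldest`.
  return oldest - 1;
}
```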

alexgleason commented 1 year ago

Something I find myself wanting a lot is an "event ID + timestamp" identifier for pagination. The dumb way would be like ${lpad(event.created_at, '0', 11)}${event.id}, eg:

01693595340a2c2fe7c59c0d83d8a2ecd13f911550621177f389170a65c0cbe9ce4cb62de19

The first 11 characters are the timestamp, so just chop them off to get the event ID.

Benefits:

I kind of want this to be a type of nip19 ID, but I don't think it's possible to preserve the "sorting alphabetically to sort chronologically" feature with the way nip19 encoding works.
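
The "timestamp + event ID" key described above could look roughly like this. `toCursor` and `fromCursor` are hypothetical names, and the pad width is an assumption: the sample key has an 11-character timestamp prefix in front of the 64-character event ID, so the sketch defaults to 11.

```typescript
interface NostrEvent { id: string; created_at: number; }

// Zero-pad the timestamp to a fixed width, then append the event ID.
function toCursor(event: NostrEvent, width = 11): string {
  return String(event.created_at).padStart(width, "0") + event.id;
}

// Split the key back into its two parts at the fixed width.
function fromCursor(cursor: string, width = 11): NostrEvent {
  return { created_at: Number(cursor.slice(0, width)), id: cursor.slice(width) };
}
```

Because the timestamp part has a fixed width, comparing two keys as plain strings orders them by `created_at` first and by event ID second.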

Semisol commented 1 year ago

Here's an easy way to do pagination:

  1. If you just started, do an initial query with no time constraints and a limit.
  2. When you need more items, create another query with until being the smallest timestamp you saw from that relay for that query plus one.
  3. If your minimum timestamp doesn't change between two queries, you have two options:
    • Try with a larger limit.
    • Decrement your minimum timestamp by one to skip all events with that timestamp.
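
A rough sketch of those steps as a pure function that takes the previous state plus the page just received and returns the next filter. `Cursor`, `advance`, the default limit of 20, and doubling as the way to pick "a larger limit" are all assumptions for illustration.

```typescript
interface NostrEvent { id: string; created_at: number; }
interface Filter { until?: number; limit?: number; }
// Per-relay, per-query state: the filter to send next and the smallest timestamp seen.
interface Cursor { filter: Filter; minSeen?: number; }

function advance(
  cursor: Cursor,
  page: NostrEvent[],
  onStall: "grow" | "skip" = "grow",
): Cursor {
  if (page.length === 0) return cursor; // nothing new; the caller should stop

  const min = Math.min(...page.map(e => e.created_at), cursor.minSeen ?? Infinity);

  if (min !== cursor.minSeen) {
    // Step 2: progress was made, so `until` is the smallest timestamp plus one.
    return { minSeen: min, filter: { ...cursor.filter, until: min + 1 } };
  }

  // Step 3: the minimum did not move between two queries.
  if (onStall === "grow") {
    const limit = cursor.filter.limit ?? 20; // assumed default
    return { minSeen: min, filter: { ...cursor.filter, limit: limit * 2 } };
  }
  // "skip": decrement the minimum by one, then apply the plus-one rule again.
  return { minSeen: min - 1, filter: { ...cursor.filter, until: min } };
}
```

Step 1 corresponds to starting with `{ filter: { limit: n } }` and no `minSeen`. Pages overlap at the boundary, so results should be deduplicated by `id`, and how much the "skip" branch actually skips depends on whether the relay treats `until` as inclusive.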

staab commented 1 year ago

> If you just started, do an initial query with no time constraints and a limit.

This doesn't work because some relay implementations (strfry?) don't return events in order.

fiatjaf commented 1 year ago

The first version of strfry didn't, but it was a bug. Now it does. Some relays are still running old versions though.