ponder-sh / ponder

A backend framework for crypto apps
https://ponder.sh
MIT License
614 stars 95 forks source link

[Feature] Pagination for large data sets #329

Closed Willyham closed 8 months ago

Willyham commented 1 year ago

We are building a system which indexes deposit events on the beacon chain from specific contracts. This will easily grow into the 10-50k+ events range. Ponder has a limit of 5000 for the skip parameter.

Is there any way to paginate through this data? Forgive the ignorance about the underlying system but this doesn't seem like too much of a stretch for the underlying postgres database if we could query it in batches of 1000.

0xOlias commented 1 year ago

This is very reasonable (the limit/offset pagination was inherited from subgraphs, but no reason keep it around). The purpose of the 1000/5000 validation is to discourage using limit/offset pagination for large result sets, which can be very bad for performance.

We should probably support cursor pagination for this purpose, which would be more performant. Would that work for your use case?

I think it'd be a relatively light lift to add cursor pagination to the findMany store method that backs the plural GraphQL fields.

Willyham commented 1 year ago

Cursor pagination would be perfect. We have hacked our way around the issue with a crude implementation of that from the client side:

query GetFoos($upToBlock: Int!, $previousCursor: Int!, $perBatch: Int!) {
  foos(first: $perBatch, where: { block_lte: $upToBlock, block_gte: $previousCursor }, orderBy: "block", orderDirection: "asc") {
    id
    block
    foo
  }
}
0xOlias commented 1 year ago

@eliobricenov and I chatted through a design for this. We would need to introduce a breaking change to replace the current first/skip pagination with cursor-based pagination.

Before

{
  transferEvents(skip: 10, first: 5) {
    id
    amount
  }
}

After

{
  transferEvents(after: "ACX4YS=", limit: 5) {
    items {
      id
      amount
    }
    before
    after
  }
}

In the new design, all top-level plural GraphQL types will return a Page object rather than the raw list. The Page object will look like:

type TransferEventPage {
  before: String
  after: String
  totalCount: Int! # Maybe include this?
  items: [TransferEvent!]!
}

where the items field includes the original/raw list that Ponder currently returns.

Details

For cursor pagination to work, the underlying dataset must have a consistent sort order enforced. We can do this by relying on the unique id field that all Ponder entities already have.

If the user does not specify an orderBy/orderDirection, a default of orderBy: "id", orderDirection: "asc" will be applied (this is already the case).

If the user specifies an orderBy/orderDirection, it's possible for that field to contain duplicates, which violates the consistent sort order requirement. To resolve, we automatically include a fallback/tiebreaker orderBy: "id".

shrugs commented 1 year ago

looks great and standard to me! afaik only benefit of offset-based pagination is that users can jump directly to a specific page, but i'm struggling to see that as a hard requirement within a crypto frontend.

could allow backwards compat by adding a different top level field name but that's less satisfying. worth polling the telegram group to see if anyone wouldn't be able to rewrite their queries to use cursor-based pagination?

Destiner commented 11 months ago

Note that there is an alternative, backward-compatible way to implement cursor-based pagination, as noted in TheGraph docs:

query manyTokens($lastID: String) {
  tokens(first: 1000, where: { id_gt: $lastID }) {
    id
    owner
  }
}

I believe this is already possible in Ponder if you define the id as Int or BigInt. Would be great to also have support for _gt, _gte, etc suffixes for String.