Support scalable pagination

Arachnid commented 5 years ago

Presently, it's possible to query entities using a where clause, but this uses offsets from start or end, which likely won't scale well if paging over a large dataset. It'd be good to use the graphql connection pattern, or something similar, where result sets return an opaque cursor that can be passed in on subsequent calls to pick up where the previous query left off.

leoyvens commented 5 years ago

Setting offsets is not the most convenient API, and we need good pagination support; connections seem like a good model to follow.

> offsets from start or end, which likely won't scale well if paging over a large dataset

Could you elaborate on what's the issue you're envisioning here?

Arachnid commented 5 years ago

> Could you elaborate on what's the issue you're envisioning here?

In most database systems, a query like SELECT * FROM table LIMIT x OFFSET y involves the database internally iterating over and discarding the first y results. This results in the cost of paginating over a large dataset being O(n^2) instead of O(n). Using cursors, in contrast, doesn't suffer from this issue.
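To make the cost difference concrete, here is a minimal sketch that models a table as a sorted in-memory array and counts how many rows each strategy touches. Everything in it (`table`, `offsetPage`, `cursorPage`, the `scanned` counter) is a hypothetical toy model, not graph-node or database code, and the index seek is approximated by a binary search whose cost is ignored.

```typescript
// Toy model: a table sorted by primary key. `scanned` counts rows touched.
interface Row { id: number }

const table: Row[] = Array.from({ length: 1000 }, (_, i) => ({ id: i + 1 }));

// OFFSET-style page: walks from the start and discards `offset` rows first.
function offsetPage(offset: number, limit: number): { rows: Row[]; scanned: number } {
  const end = Math.min(offset + limit, table.length);
  return { rows: table.slice(offset, end), scanned: end };
}

// Cursor-style page: seeks to the first id greater than `afterId`
// (binary search stands in for an index lookup), then reads `limit` rows.
function cursorPage(afterId: number, limit: number): { rows: Row[]; scanned: number } {
  let lo = 0;
  let hi = table.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (table[mid].id <= afterId) lo = mid + 1;
    else hi = mid;
  }
  const rows = table.slice(lo, lo + limit);
  return { rows, scanned: rows.length };
}

// Page through the whole table with increasing offsets.
function totalScannedWithOffset(limit: number): number {
  let total = 0;
  for (let offset = 0; offset < table.length; offset += limit) {
    total += offsetPage(offset, limit).scanned;
  }
  return total;
}

// Page through the whole table by passing the last seen id as the cursor.
function totalScannedWithCursor(limit: number): number {
  let total = 0;
  let afterId = 0;
  for (;;) {
    const { rows, scanned } = cursorPage(afterId, limit);
    total += scanned;
    if (rows.length < limit) break;
    afterId = rows[rows.length - 1].id;
  }
  return total;
}

console.log(totalScannedWithOffset(100), totalScannedWithCursor(100)); // 5500 1000
```

With 1000 rows and pages of 100, the offset walk touches 100 + 200 + ... + 1000 = 5500 rows, while the cursor walk touches each row once.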

leoyvens commented 5 years ago

@Arachnid I see. Though that seems to be more a concern of implementation than of graphql interface. We could do a good implementation of graphql offsets that doesn't use OFFSET, and it's also possible to do a bad implementation of cursors that does use OFFSET on the DB.

Arachnid commented 5 years ago

True, but I don't think it's possible (at least without low level DB support) to do a good implementation that uses offsets - so better to fix the API early.

-Nick

fubhy commented 5 years ago

I wrote a little utility hook that takes care of automatically scraping the endpoint for more results (using skip & limit parameters) until it's exhausted:

import { useQuery } from '@apollo/react-hooks';
import { useRef, useEffect } from 'react';
import { DocumentNode } from 'graphql';

// The second query is optional; when omitted, fetchMore falls back to the first query.
type QueryPair = [DocumentNode, DocumentNode?];
type ProceedOrNotFn = (result: any, expected: number) => boolean;

export function useScrapingQuery([query, more]: QueryPair, proceed: ProceedOrNotFn, props?: any) {
  // props is optional, so guard every access to props.variables.
  const variables = (props && props.variables) || {};
  const limit = variables.limit || 100;
  const skip = useRef(variables.skip || 0);
  const result = useQuery(query, {
    ...props,
    variables: {
      ...variables,
      limit,
      // Pass the numeric offset, not the ref object itself.
      skip: skip.current,
    },
  });

  useEffect(() => {
    if (!!result.loading || !!result.error || !proceed(result.data, skip.current + limit)) {
      return;
    }

    result.fetchMore({
      query: more,
      variables: {
        ...result.variables,
        skip: skip.current + limit,
      },
      updateQuery: (previous, options) => {
        skip.current = skip.current + limit;

        const moreResult = options.fetchMoreResult;
        const output = Object.keys(moreResult).reduce(
          (carry, current) => ({
            ...carry,
            [current]: carry[current].concat(moreResult[current] || []),
          }),
          previous,
        );

        return output;
      },
    });
  }, [result, skip.current]);

  return result;
}

Basically, you pass a query tuple: the first query is mandatory, and the second is optional and provides a custom query for the "fetch more" logic (e.g. if the first query has other, non-paginated fields in it).

Example:

import gql from 'graphql-tag';

export const FundOverviewQuery = gql`
  query FundOverviewQuery($limit: Int!) {
    funds(orderBy: name, first: $limit) {
      id
      name
      gav
      grossSharePrice
      isShutdown
      creationTime
    }

    nonPaginatedQueryField(orderBy: timestamp) {
      ...
    }
  }
`;

export const FundOverviewContinueQuery = gql`
  query FundOverviewContinueQuery($limit: Int!, $skip: Int!) {
    funds(orderBy: name, first: $limit, skip: $skip) {
      id
      name
      gav
      grossSharePrice
      isShutdown
      creationTime
    }
  }
`;

It uses the "limit" and "skip" query variables. The hook automatically adds these by default.

Additionally, you need to provide a callback that checks if more needs to be fetched after each cycle.

Full usage example:

const FundList: React.FunctionComponent<FundListProps> = props => {
  const proceed = (current: any, expected: number) => {
    if (current.funds && current.funds.length === expected) {
      return true;
    }

    return false;
  };

  const result = useScrapingQuery([FundOverviewQuery, FundOverviewContinueQuery], proceed, {
    ssr: false,
  });

  return <div>{...}</div>; // Render the full fund list (keeps adding more items until the resource is exhausted).
};

softwaredev927 commented 2 years ago

I have the same problem in our project. If we didn't use a where clause, we could simply store the total count in the schema and use that, but we use complex where clauses, and it's impossible to store the count of items matched by every possible query.

I'd like to request a feature along the following lines.

// Assume I have an entity like this:
type Token @entity {
  ID: String!
  price: BigInt!
}

// Then we can query like this:
query {
  tokens(where: { price_gt: "30" }) {
    ID
  }
}

// In this case, could we use something like this?
query {
  countOf: tokens(where: { price_gt: "30" }) {
    count
  }
  tokens(where: { price_gt: "30" }, first: 1000) {
    ID
  }
}
// If we use a special alias like "countOf", could you return one entity that has a count field?

I don't think this feature would be too difficult for your dev team to add. If you don't have time, I can work with you on it. Thanks

dotansimha commented 2 years ago

Just adding my thoughts here.

Today, pagination is implemented on the root of every Query type, and returns a ListType of an entity.

We can implement cursor-based pagination (see the spec here: https://relay.dev/graphql/connections.htm). It's supported in all popular clients and makes pagination super easy and robust: because it's cursor-based, it's easier to get a reliable response than with skip.

We can expose a Connection type on the root Query without changing the existing fields; the new field can coexist with the current API without breaking changes.

Here's an example:

type Query {
  purpose(id: ID): Purpose!
  purposes(filter: PurposeFilter): [Purpose!]!
  purposeConnection(filter: PurposeFilter, paginate: PaginationFilter): PurposeConnection!
}

input PaginationFilter {
  before: String
  after: String
  first: Int
  last: Int
}

type Purpose { ... }

type PageInfo {
  hasNextPage: Boolean!
  hasPreviousPage: Boolean!
  startCursor: String
  endCursor: String
}

type PurposeEdge {
  node: Purpose
  cursor: String!
}

type PurposeConnection {
  pageInfo: PageInfo!
  edges: [PurposeEdge]!
}
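On the client side, consuming a connection like the one above could look roughly like the following sketch. It is not tied to any particular GraphQL client: `FetchPage`, `collectAll`, and `fakeFetch` are hypothetical names, the in-memory `purposes` array stands in for a server, and using the item index as the cursor is an assumption made only for the demo (real cursors should be treated as opaque).

```typescript
// Shapes mirroring the PageInfo, edge, and connection types in the schema.
interface PageInfo { hasNextPage: boolean; endCursor: string | null }
interface Edge<T> { node: T; cursor: string }
interface Connection<T> { pageInfo: PageInfo; edges: Edge<T>[] }

// Hypothetical transport: resolves one page for the given `first` and `after`.
type FetchPage<T> = (first: number, after: string | null) => Promise<Connection<T>>;

// Follow endCursor until the server reports there is no next page.
async function collectAll<T>(fetchPage: FetchPage<T>, first = 100): Promise<T[]> {
  const nodes: T[] = [];
  let after: string | null = null;
  for (;;) {
    const page: Connection<T> = await fetchPage(first, after);
    for (const edge of page.edges) nodes.push(edge.node);
    if (!page.pageInfo.hasNextPage || page.pageInfo.endCursor === null) break;
    after = page.pageInfo.endCursor;
  }
  return nodes;
}

// In-memory stand-in for a server; the cursor is just the item index here.
const purposes = ["a", "b", "c", "d", "e"];
const fakeFetch: FetchPage<string> = async (first, after) => {
  const start = after === null ? 0 : Number(after) + 1;
  const slice = purposes.slice(start, start + first);
  const edges = slice.map((node, i) => ({ node, cursor: String(start + i) }));
  return {
    edges,
    pageInfo: {
      hasNextPage: start + slice.length < purposes.length,
      endCursor: edges.length > 0 ? edges[edges.length - 1].cursor : null,
    },
  };
};

collectAll(fakeFetch, 2).then((all) => console.log(all)); // ["a", "b", "c", "d", "e"]
```

The loop never computes an offset; it only hands back whatever cursor the previous page returned, which is what lets the server choose an efficient way to resume.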

montanaflynn commented 2 years ago

Having count aggregation would be very useful for pagination and for displaying information in UIs. For example, when filtering with where, you could also include a count aggregate with the same conditions and then have a page UI something like:

Found 49 tokens

(show first 10 tokens)

[1] [2] [3] [4] [5]
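If such a count were available, the page links could be derived with a couple of small helpers. This is only a sketch: `pageCount` and `skipFor` are hypothetical names, and the count is assumed to come from whatever aggregate the API would eventually expose. For 49 matches at 10 per page it gives 5 pages.

```typescript
// Number of pages needed to show `total` items at `perPage` items each.
function pageCount(total: number, perPage: number): number {
  if (perPage <= 0) throw new RangeError("perPage must be positive");
  return Math.ceil(total / perPage);
}

// `skip` value for a 1-based page number, for use with skip/first style queries.
function skipFor(page: number, perPage: number): number {
  return (page - 1) * perPage;
}

const pages = Array.from({ length: pageCount(49, 10) }, (_, i) => i + 1);
console.log(pages); // [1, 2, 3, 4, 5]
```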

0xJem commented 1 year ago

Is this still being worked on? Pagination with lots of historical data is a huge pain, and applying offsets really, really doesn't scale.

eldimious commented 6 months ago

I think it would be useful to add counter / cursor as pagination. Any idea if this feature will be supported?

graphprotocol / graph-node

Support scalable pagination #613