panoratech / Panora

Add an integration catalog to your SaaS product in minutes
https://docs.panora.dev
Apache License 2.0
383 stars 76 forks

feat: Cursor Based Pagination #430

Closed rflihxyz closed 4 weeks ago

rflihxyz commented 1 month ago

For now, our preferred approach is cursor-based pagination, because the underlying data changes frequently. Other approaches, such as offset-based pagination, would frequently return duplicate entries when records are added or removed between page requests.
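To make the duplicate-entry concern concrete, here is a minimal sketch. It assumes records are listed newest-first by a numeric id; the helper names `offset_page` and `cursor_page` are hypothetical and not part of Panora.

```python
from typing import List, Optional


def offset_page(rows: List[int], offset: int, limit: int) -> List[int]:
    """Offset pagination over ids sorted newest-first (descending)."""
    ordered = sorted(rows, reverse=True)
    return ordered[offset:offset + limit]


def cursor_page(rows: List[int], after_id: Optional[int], limit: int) -> List[int]:
    """Cursor (keyset) pagination: return ids strictly below the cursor."""
    ordered = sorted(rows, reverse=True)
    if after_id is not None:
        ordered = [r for r in ordered if r < after_id]
    return ordered[:limit]


rows = [1, 2, 3, 4, 5, 6]

# The client fetches page 1 with both strategies.
offset_first = offset_page(rows, offset=0, limit=3)
cursor_first = cursor_page(rows, after_id=None, limit=3)

# A new record arrives before the client asks for page 2.
rows.append(7)

# Offset pagination shifts by one position, so id 4 is returned a second time.
offset_second = offset_page(rows, offset=3, limit=3)

# Cursor pagination resumes after the last id the client saw, so nothing repeats.
cursor_second = cursor_page(rows, after_id=cursor_first[-1], limit=3)

print(offset_first, offset_second)  # [6, 5, 4] [4, 3, 2]
print(cursor_first, cursor_second)  # [6, 5, 4] [3, 2, 1]
```

The same shift happens in reverse when records are deleted between requests: offset pagination then skips an entry instead of repeating one.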

Scope: All unified endpoints

Open to community suggestions :)

rajeshj11 commented 1 month ago

@rflihxyz Below are the reasons why I picked keyset-based (cursor-based) pagination. Please refer to this link: https://docs.gitlab.com/ee/api/rest/index.html#pagination

- Offset-based pagination. The default method and available on all endpoints except, in GitLab 16.5 and later, the users endpoint.
- Keyset-based pagination. Added to selected endpoints but being [progressively rolled out](https://gitlab.com/groups/gitlab-org/-/epics/2039).

Note: **For large collections, you should use keyset pagination (when available) instead of offset pagination, for performance reasons.**

mit-27 commented 1 month ago

I think @rflihxyz is talking about the pagination that will be implemented in the services that handle the unified API endpoints. When we create a connection with any provider, it triggers the initial sync. The sync uses the third-party provider's own pagination (each provider may handle pagination differently) to fetch a certain amount of data and store it in Panora's DB. The services that handle the unified endpoints then use cursor pagination to fetch data from Panora's DB.
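As a rough illustration of that last step, the sketch below pages through rows already stored locally using an opaque cursor. Everything here is an assumption for illustration only: the `contacts` table, the `list_contacts` function, the base64-encoded JSON cursor, and the use of SQLite stand in for whatever schema and database the real services use.

```python
import base64
import json
import sqlite3
from typing import Optional


def encode_cursor(last_id: int) -> str:
    """Wrap the last seen id in an opaque, URL-safe token."""
    return base64.urlsafe_b64encode(json.dumps({"id": last_id}).encode()).decode()


def decode_cursor(cursor: str) -> int:
    """Recover the last seen id from a token made by encode_cursor."""
    return int(json.loads(base64.urlsafe_b64decode(cursor.encode()))["id"])


def list_contacts(db: sqlite3.Connection, limit: int = 2,
                  cursor: Optional[str] = None) -> dict:
    """Hypothetical unified-endpoint handler reading from the local DB."""
    last_id = decode_cursor(cursor) if cursor else 0
    # Fetch one extra row to learn whether another page exists.
    rows = db.execute(
        "SELECT id, name FROM contacts WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, limit + 1),
    ).fetchall()
    page = rows[:limit]
    has_more = len(rows) > limit
    return {
        "data": [{"id": r[0], "name": r[1]} for r in page],
        "next_cursor": encode_cursor(page[-1][0]) if has_more else None,
    }


# Stand-in for data already synced from a provider.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT)")
db.executemany("INSERT INTO contacts (name) VALUES (?)",
               [("Ada",), ("Grace",), ("Linus",)])

first = list_contacts(db, limit=2)
second = list_contacts(db, limit=2, cursor=first["next_cursor"])
```

Because the query filters on `id > ?` rather than skipping rows, the database can use the primary-key index to jump straight to the next page, which is the performance argument quoted from the GitLab docs earlier in the thread.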

It is also possible that the user has a large amount of data in the connected provider. For that reason, only a certain amount of data is stored in Panora's DB after the initial sync. The remaining data can be fetched on demand through the unified API endpoint.
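One way to picture the capped initial sync followed by on-demand fetching is the sketch below. It is only an assumption about how this could work: `fake_provider`, `sync`, the `max_records` cap, and the idea of saving the provider's cursor for later are all hypothetical and not taken from Panora's code.

```python
from typing import Callable, List, Optional, Tuple

# A provider page fetcher takes (cursor, page_size) and returns (records, next_cursor).
FetchPage = Callable[[Optional[str], int], Tuple[List[dict], Optional[str]]]


def fake_provider(records: List[dict]) -> FetchPage:
    """Stand-in for a third-party API; its cursor is a stringified index."""
    def fetch(cursor: Optional[str], page_size: int):
        start = int(cursor) if cursor else 0
        chunk = records[start:start + page_size]
        nxt = start + page_size
        return chunk, (str(nxt) if nxt < len(records) else None)
    return fetch


def sync(fetch_page: FetchPage, store: List[dict], resume_cursor: Optional[str],
         max_records: int, page_size: int = 2) -> Optional[str]:
    """Pull pages until at least max_records are stored or the provider is exhausted.

    Returns the provider cursor to resume from later, or None when nothing is left.
    """
    cursor = resume_cursor
    synced = 0
    while synced < max_records:
        records, cursor = fetch_page(cursor, page_size)
        store.extend(records)  # stands in for writing to the local DB
        synced += len(records)
        if cursor is None:
            break
    return cursor


provider = fake_provider([{"id": i} for i in range(5)])
local_db: List[dict] = []

# Initial sync: stop after roughly four records and remember where we stopped.
saved_cursor = sync(provider, local_db, resume_cursor=None, max_records=4)
count_after_initial = len(local_db)

# Later, an on-demand request continues from the saved provider cursor.
saved_cursor = sync(provider, local_db, resume_cursor=saved_cursor, max_records=4)
```

In this sketch a returned cursor of `None` means the provider has no more data, so the caller should not start another sync with it; a real implementation would likely track that state explicitly per connection.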