ukaea / fair-mast

A data management system for Tokamak data
MIT License

Fix slow pagination issue #1

Closed: samueljackson92 closed this 5 months ago

samueljackson92 commented 6 months ago

When accessing the list of signals (https://mastapp.site/json/signals), which contains a very large number of records, it is noticeably slow to load. I have traced the source of this slow loading to how we are currently doing pagination.

Currently, we are paginating using offset pagination where we have to count the number of items in the table for every query. The slow line of code is line 177 in the snippet below:

https://github.com/ukaea/fair-mast/blob/ea97ab5249ab0f0ac59bdc42ae59d6a766e53a54/src/api/crud.py#L173-L179

I think the correct solution is to switch to cursor-based pagination. See the difference here. For most tables we should be able to use the UUID as the cursor. For shots we might use the shot_id, although for consistency it might be nicer to give shots UUIDs as well.
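To make the difference concrete, here is a minimal sketch that contrasts the two approaches against an in-memory SQLite table. It is not the code in `crud.py`: the `signals` schema, the helper names (`make_db`, `offset_page`, `cursor_page`) and the generated UUIDs are all assumptions for illustration only.

```python
import sqlite3
import uuid
from typing import List, Optional, Tuple

Row = Tuple[str, str]


def make_db(n: int = 25) -> sqlite3.Connection:
    """Build a hypothetical signals table with n rows keyed by UUID."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE signals (uuid TEXT PRIMARY KEY, name TEXT)")
    rows = [
        (str(uuid.uuid5(uuid.NAMESPACE_OID, f"signal-{i}")), f"signal-{i}")
        for i in range(n)
    ]
    conn.executemany("INSERT INTO signals VALUES (?, ?)", rows)
    return conn


def offset_page(conn: sqlite3.Connection, page: int, per_page: int) -> Tuple[List[Row], int]:
    """Offset pagination: every request also counts the whole table."""
    total = conn.execute("SELECT COUNT(*) FROM signals").fetchone()[0]
    rows = conn.execute(
        "SELECT uuid, name FROM signals ORDER BY uuid LIMIT ? OFFSET ?",
        (per_page, (page - 1) * per_page),
    ).fetchall()
    return rows, total


def cursor_page(
    conn: sqlite3.Connection, cursor: Optional[str], per_page: int
) -> Tuple[List[Row], Optional[str]]:
    """Cursor pagination: seek past the last UUID seen, with no count."""
    if cursor is None:
        rows = conn.execute(
            "SELECT uuid, name FROM signals ORDER BY uuid LIMIT ?", (per_page,)
        ).fetchall()
    else:
        rows = conn.execute(
            "SELECT uuid, name FROM signals WHERE uuid > ? ORDER BY uuid LIMIT ?",
            (cursor, per_page),
        ).fetchall()
    # A full page may have more rows after it; a short page is the last one.
    next_cursor = rows[-1][0] if len(rows) == per_page else None
    return rows, next_cursor


def walk_all(conn: sqlite3.Connection, per_page: int) -> List[str]:
    """Follow next_cursor until it runs out and collect every UUID."""
    seen: List[str] = []
    rows, cursor = cursor_page(conn, None, per_page)
    seen.extend(r[0] for r in rows)
    while cursor is not None:
        rows, cursor = cursor_page(conn, cursor, per_page)
        seen.extend(r[0] for r in rows)
    return seen
```

The cursor query relies on the primary key index to find its starting point, so it does not need the total row count that the offset version computes on every request. The trade-off is that clients can only step through pages in order rather than jump to an arbitrary page number.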

jameshod5 commented 6 months ago

A variation of cursor pagination is in progress on the james/cursor_pagination branch. A new 'cursor' parameter has replaced the 'page' parameter. It retrieves the next cursor for the header and can retrieve the previous cursor as well. The edge case where the previous cursor does not exist (i.e. the first set of results) is handled.

The next stage is to add tests and cover any other edge cases.
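As a rough illustration of the next/previous cursor logic and the first-page edge case, here is a minimal sketch over a sorted list of keys. It is not the code on the branch: the `Page` class, the `paginate` function and the split of the cursor into `after` and `before` arguments are assumptions made for this example.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Page:
    items: List[str]
    next_cursor: Optional[str]      # pass as `after` to get the following page
    previous_cursor: Optional[str]  # pass as `before` to get the preceding page


def paginate(
    keys: Sequence[str],
    per_page: int,
    after: Optional[str] = None,
    before: Optional[str] = None,
) -> Page:
    """Return one page from `keys`, which must be sorted and unique."""
    if before is not None:
        # Walking backwards: take the per_page keys just before the cursor.
        end = bisect_left(keys, before)
        start = max(end - per_page, 0)
    else:
        # Walking forwards: start just after the cursor, or at the beginning.
        start = 0 if after is None else bisect_right(keys, after)
        end = min(start + per_page, len(keys))
    items = list(keys[start:end])
    return Page(
        items=items,
        # No next cursor once the page reaches the end of the data.
        next_cursor=items[-1] if items and end < len(keys) else None,
        # No previous cursor on the first set of results.
        previous_cursor=items[0] if items and start > 0 else None,
    )
```

With keys `["a", "b", "c", "d", "e"]` and `per_page=2`, the first page has no previous cursor, the last page has no next cursor, and passing a page's `previous_cursor` as `before` returns the page that preceded it.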

jameshod5 commented 6 months ago

@samueljackson92 since aggregate query results might not include a UUID or shot_id, is it worth keeping offset pagination for aggregate queries?

samueljackson92 commented 6 months ago

Good question. I think we should be as consistent as we can. I think maybe the best solution would be to return a dynamic cursor UUID generated from whatever key was used for the groupby. You can do something like:

import typing as t
import uuid

def get_uuid(groupby: t.Any) -> str:
    return str(uuid.uuid5(uuid.NAMESPACE_OID, str(groupby)))  # same key, same UUID
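As a sketch of how this might be applied, the example below attaches a derived UUID to each row of a made-up aggregate result so that the rows can be ordered and paged by cursor like any other table. The `with_cursor_ids` helper, the `campaign` key and the sample rows are hypothetical and not taken from the repository.

```python
import typing as t
import uuid


def get_uuid(groupby: t.Any) -> str:
    # uuid5 is a name-based hash, so the result is deterministic per key.
    return str(uuid.uuid5(uuid.NAMESPACE_OID, str(groupby)))


def with_cursor_ids(
    rows: t.List[t.Dict[str, t.Any]], key: str
) -> t.List[t.Dict[str, t.Any]]:
    """Tag each aggregate row with a UUID derived from its groupby value,
    then sort by that UUID so a cursor can walk the rows in a stable order."""
    tagged = [{**row, "uuid": get_uuid(row[key])} for row in rows]
    return sorted(tagged, key=lambda row: row["uuid"])


# Hypothetical aggregate result: number of shots per campaign.
aggregate_rows = [
    {"campaign": "M5", "count": 120},
    {"campaign": "M6", "count": 340},
    {"campaign": "M7", "count": 95},
]

tagged_rows = with_cursor_ids(aggregate_rows, "campaign")
```

Because the UUID is a hash of the key, the resulting order is arbitrary but repeatable between requests, which is what a cursor needs. One thing to watch is that the key is passed through `str()`, so values such as `1` and `"1"` map to the same UUID.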