uriyyo / fastapi-pagination

FastAPI pagination 📖
https://uriyyo-fastapi-pagination.netlify.app/
MIT License
1.1k stars 126 forks source link

`total` returning wrong number of values #1187

Closed chichi13 closed 1 week ago

chichi13 commented 3 weeks ago

I use FastAPI and SQLAlchemy for my project. I have an endpoint that requires a lot of joins with a WHERE:

@router.get(
    "/{entity_id}/related_entities",
    status_code=status.HTTP_200_OK,
    response_model=Page[EntityWithEvents],
    description="Get all related entities for a specific entity",
)
async def get_specific_entity_related_entities(
    entity_id: int,
    db: Annotated[AsyncSession, Depends(get_db)],
) -> AbstractPage:
    return await paginate(
        db,
        select(Entity)
        .where(Entity.id == entity_id)
        .options(
            selectinload(Entity.related_entities).joinedload(RelatedEntity.category),
            selectinload(Entity.related_entities).joinedload(RelatedEntity.type),
            selectinload(Entity.related_entities).joinedload(RelatedEntity.language),
            selectinload(Entity.related_entities).joinedload(RelatedEntity.account),
        )
    )

The total result should be ~48 in my case. However, since I'm filtering on a specific entity_id, fastapi_pagination returns a total of 1, a single page despite a size of 20:

image

I should point out that the result of the main query is fine. Is there a solution to my problem?

uriyyo commented 3 weeks ago

Hi @chichi13,

It's correct behavior.

It should be possible to override total value, it will look smth like this:

from fastapi_pagination import Page, paginate
from fastapi_pagination.customization import CustomizedPage, UseIncludeTotal
from fastapi_pagination.ext.sqlalchemy import create_count_query

@router.get(
    "/{entity_id}/related_entities",
    status_code=status.HTTP_200_OK,
    response_model=CustomizedPage[Page[EntityWithEvents], UseIncludeTotal(False)],
    description="Get all related entities for a specific entity",
)
async def get_specific_entity_related_entities(
        entity_id: int,
        db: Annotated[AsyncSession, Depends(get_db)],
) -> AbstractPage:
    query = (
        select(Entity)
        .options(
            selectinload(Entity.related_entities).joinedload(RelatedEntity.category),
            selectinload(Entity.related_entities).joinedload(RelatedEntity.type),
            selectinload(Entity.related_entities).joinedload(RelatedEntity.language),
            selectinload(Entity.related_entities).joinedload(RelatedEntity.account),
        )
    )

    page = await paginate(db, query.where(Entity.id == entity_id))
    page.total = await db.scalar(create_count_query(query))

    return page
chichi13 commented 3 weeks ago

Yes, I was sure it wasn't a bug and that there was a solution when I saw the source code, but I couldn't figure out how to do it.

Thanks for the solution, but I have the impression that it doesn't work. Let me explain, I used this code:

from fastapi_pagination import Page, paginate
from fastapi_pagination.customization import CustomizedPage, UseIncludeTotal
from fastapi_pagination.ext.sqlalchemy import create_count_query

@router.get(
    "/{entity_id}/related_entities",
    status_code=status.HTTP_200_OK,
    response_model=CustomizedPage[Page[EntityWithEvents], UseIncludeTotal(False)],
    description="Get all related entities for a specific entity",
)
async def get_specific_entity_related_entities(
        entity_id: int,
        db: Annotated[AsyncSession, Depends(get_db)],
) -> AbstractPage:
    query = (
        select(Entity)
        .options(
            selectinload(Entity.related_entities).joinedload(RelatedEntity.category),
            selectinload(Entity.related_entities).joinedload(RelatedEntity.type),
            selectinload(Entity.related_entities).joinedload(RelatedEntity.language),
            selectinload(Entity.related_entities).joinedload(RelatedEntity.account),
        )
    )

    page = await paginate(db, query.where(Entity.id == entity_id))
    page.total = await db.scalar(create_count_query(query))
    page.pages = page.total // page.size + 1

    return page

I added page.pages to avoid having a null, everything is now displayed correctly. However, in the main query, it doesn't do any LIMIT / OFFSET and displays all results regardless of my size parameter:

image

If I try to go page 2:

image

uriyyo commented 3 weeks ago

Could you please log what queries sqlachemy is trying to execute? It should make things more clear

chichi13 commented 3 weeks ago

Looking at the queries, I see where the problem is, but I don't know how to solve it. There's also a problem with count(*) after all.

INFO: 2024-06-07 18:00:09,552 - base - _execute_context - SELECT entity.name, entity.is_verified, entity.is_active, entity.user_id, entity.follower_count, entity.id, entity.created_at, entity.updated_at 
FROM entity 
WHERE entity.id = $1::INTEGER 
LIMIT $2::INTEGER OFFSET $3::INTEGER
2024-06-07 18:00:09,553 INFO sqlalchemy.engine.Engine [cached since 3003s ago] (1, 5, 0)
INFO: 2024-06-07 18:00:09,579 - base - _execute_context - SELECT record.entity_id AS record_entity_id, record.title AS record_title, record.description AS record_description, record.start_at AS record_start_at, record.end_at AS record_end_at, record.processed_at AS record_processed_at, record.follower_count AS record_follower_count, record.processed_by_user_id AS record_processed_by_user_id, record.proposed_by_user_id AS record_proposed_by_user_id, record.user_id AS record_user_id, record.account_id AS record_account_id, record.item_id AS record_item_id, record.category_id AS record_category_id, record.language_id AS record_language_id, record.id AS record_id, record.created_at AS record_created_at, record.updated_at AS record_updated_at, account_1.platform AS account_1_platform, account_1.title AS account_1_title, account_1.entity_id AS account_1_entity_id, account_1.id AS account_1_id, account_1.created_at AS account_1_created_at, account_1.updated_at AS account_1_updated_at, item_1.title AS item_1_title, item_1.thumbnail_url AS item_1_thumbnail_url, item_1.id AS item_1_id, item_1.created_at AS item_1_created_at, item_1.updated_at AS item_1_updated_at, category_1.title AS category_1_title, category_1.id AS category_1_id, category_1.created_at AS category_1_created_at, category_1.updated_at AS category_1_updated_at, lang_1.id AS lang_1_id, lang_1.formal_name AS lang_1_formal_name, lang_1.native_name AS lang_1_native_name 
FROM record LEFT OUTER JOIN account AS account_1 ON account_1.id = record.account_id LEFT OUTER JOIN item AS item_1 ON item_1.id = record.item_id LEFT OUTER JOIN category AS category_1 ON category_1.id = record.category_id LEFT OUTER JOIN lang AS lang_1 ON lang_1.id = record.language_id 
WHERE record.entity_id IN ($1::INTEGER)
2024-06-07 18:00:09,580 INFO sqlalchemy.engine.Engine [cached since 3003s ago] (1,)

And the count(*) should normally return 31 (instead of 61 which is the global number of entity) with all the selectinload (keys total):

INFO: 2024-06-07 18:00:09,657 - base - _execute_context - SELECT count(*) AS count_1 
FROM (SELECT entity.title AS title, entity.is_verified AS is_verified, entity.is_active AS is_active, entity.user_id AS user_id, entity.follower_count AS follower_count, entity.id AS id, entity.created_at AS created_at, entity.updated_at AS updated_at 
FROM entity) AS anon_1
chichi13 commented 3 weeks ago

For the count create_count_query has the options(noload("*")) which overwrites all the options I've set. So I've done something like this:

@router.get(
    "/{entity_id}/related_entities",
    status_code=status.HTTP_200_OK,
    response_model=CustomizedPage[Page[EntityWithEvents], UseIncludeTotal(False)],
    description="Get all related entities for a specific entity",
)
async def get_specific_entity_related_entities(
    entity_id: int,
    db: Annotated[AsyncSession, Depends(get_db)],
) -> AbstractPage:
    query = (
        select(Entity)
        .options(
            selectinload(Entity.events).joinedload(RelatedEntity.category),
            selectinload(Entity.events).joinedload(RelatedEntity.type),
            selectinload(Entity.events).joinedload(RelatedEntity.language),
            selectinload(Entity.events).joinedload(RelatedEntity.account),
        )
        .where(Entity.id == entity_id)  # noqa
    )

    page = await paginate(db, query)
    events_count_query = (
        select(func.count(RelatedEntity.id))
        .select_from(Entity)
        .join(Entity.events)
        .where(Entity.id == entity_id)  # noqa
    )

    # Get the total count
    page.total = await db.scalar(events_count_query)
    page.pages = (page.total // page.size) + 1 if page.size else 1

    return page

But I still have the issue with the items without LIMIT / OFFSET

chichi13 commented 3 weeks ago

Seems like the options parameter is the cause of this:

    query = (
        select(Entity)
        .options(
            selectinload(Entity.events).joinedload(RelatedEntity.category),
            selectinload(Entity.events).joinedload(RelatedEntity.type),
            selectinload(Entity.events).joinedload(RelatedEntity.language),
            selectinload(Entity.events).joinedload(RelatedEntity.account),
        )
        .where(Entity.id == entity_id)  # noqa
    )

SELECT * FROM entity WHERE entity_id.id = $1::INTEGER LIMIT $2::INTEGER OFFSET $3::INTEGER --> The main SELECT is done before the others selectinload (that's the objective of selectinload).

    query = (
        select(Entity)
        .options(
            joinedload(Entity.events).selectinload(RelatedEntity.category),
            joinedload(Entity.events).selectinload(RelatedEntity.type),
            joinedload(Entity.events).selectinload(RelatedEntity.language),
            joinedload(Entity.events).selectinload(RelatedEntity.account),
        )
        .where(Entity.id == entity_id)  # noqa
    )

The SQL Query is this one:

Paginated query: SELECT anon_1.title, anon_1.is_verified, anon_1.is_active, anon_1.user_id, anon_1.follower_count, anon_1.id, anon_1.created_at, anon_1.updated_at, related_1.title AS title_1, related_1.description, related_1.start_at, related_1.end_at, related_1.processed_at, related_1.follower_count AS follower_count_1, related_1.processed_by_user_id, related_1.proposed_by_user_id, related_1.user_id AS user_id_1, related_1.entity_id, related_1.account_id, related_1.item_id, related_1.category_id, related_1.language_id, related_1.id AS id_1, related_1.created_at AS created_at_1, related_1.updated_at AS updated_at_1 
FROM (SELECT entity.title AS title, entity.is_verified AS is_verified, entity.is_active AS is_active, entity.user_id AS user_id, entity.follower_count AS follower_count, entity.id AS id, entity.created_at AS created_at, entity.updated_at AS updated_at 
FROM entity 
WHERE entity.id = :id_2
 LIMIT :param_1 OFFSET :param_2) AS anon_1 LEFT OUTER JOIN related AS related_1 ON anon_1.id = related_1.entity_id
2024-06-08 00:17:27,228 INFO sqlalchemy.engine.Engine SELECT count(related.id) AS count_1 
FROM entity JOIN related ON entity.id = related.entity_id 
WHERE entity.id = $1::INTEGER

The LIMIT / OFFSET are before the joins...

The problem is the same if I try this:

    query = (
        select(Entity)
        .options(
            joinedload(Entity.events).joinedload(RelatedEntity.category),
            joinedload(Entity.events).joinedload(RelatedEntity.type),
            joinedload(Entity.events).joinedload(RelatedEntity.language),
            joinedload(Entity.events).joinedload(RelatedEntity.account),
        )
        .where(Entity.id == entity_id)  # noqa
    )

So I think I've pointed out the problem, but I have no idea how to solve it, any idea @uriyyo ?

chichi13 commented 3 weeks ago

I've created the SQL paginated query with the create_paginate_query function from this file: https://github.com/uriyyo/fastapi-pagination/blob/48c085384c3bcdf7468fc58800d849b1eeab4da3/fastapi_pagination/ext/sqlalchemy.py#L92

uriyyo commented 3 weeks ago

@chichi13 Okay, now I guess I understand your problem. So you want to paginate not Entity but RelatedEntity instead? Is it correct?

chichi13 commented 3 weeks ago

@uriyyo yes exactly, but I need some information in the Entity as well, that's why I need to do some joins

uriyyo commented 3 weeks ago

@chichi13 Hmmm, interesting case. I will try to come up with some solution for you, but it's tricky task

chichi13 commented 3 weeks ago

So to be clear:

select(Entity)
.where(Entity.id == entity_id)

In the Entity I need some informations about the Entity

    events_count_query = (
        select(func.count(RelatedEntity.id))
        .select_from(Entity)
        .join(Entity.events)
        .where(Entity.id == entity_id)  # noqa
    )
    query = (
        select(Entity)
        .options(
            selectinload(Entity.events).joinedload(RelatedEntity.category),
            selectinload(Entity.events).joinedload(RelatedEntity.type),
            selectinload(Entity.events).joinedload(RelatedEntity.language),
            selectinload(Entity.events).joinedload(RelatedEntity.account),
        )
        .where(Entity.id == entity_id)  # noqa
    )

So if I want to query page 1, size 5 it will show the Entity information I need and 5 Entity.events

Maybe it's clearer like this?

chichi13 commented 2 weeks ago

Do you have any idea when you'll be able to work on it? Just to know

uriyyo commented 2 weeks ago

I hope I will be able to do it this weekend. Sorry, I had pretty busy week 😢

moumoutte commented 2 weeks ago

My 2cts, If you want to paginate the RelatedEntity, you should query into this model.


paginate(session, select(RelatedEntity).where(RelatedEntity.entity_id==<entity_id>))

However, you can try to use the unique=False argument available into the paginate method.

uriyyo commented 2 weeks ago

@chichi13 I agree with @moumoutte. It's more natural way of how pagination should work.

chichi13 commented 1 week ago

@uriyyo @moumoutte I ended up doing things differently, changing the pydantic scheme to do as you advised. So I paginated RelatedEntity.

uriyyo commented 1 week ago

@chichi13 Is this issue still relevant?

chichi13 commented 1 week ago

I'll close it :) Thanks!