phalcon / cphalcon

High performance, full-stack PHP framework delivered as a C extension.
https://phalcon.io
BSD 3-Clause "New" or "Revised" License
10.79k stars 1.97k forks source link

[BUG]: Discrepancy Between Total Items in Paginator and Direct Query Using Phalcon's QueryBuilder #16581

Open indigo7333 opened 5 months ago

indigo7333 commented 5 months ago

When using Phalcon's modelsManager to construct complex queries involving multiple table joins and conditional filters, there appears to be a mismatch between the total number of items reported by directly querying the database versus using Phalcon's paginator. This discrepancy could potentially affect data presentation and user navigation in applications relying on accurate pagination.

use Phalcon\Mvc\Model\Query\Builder as QueryBuilder;

// Setup the Query Builder $builder = $this->modelsManager->createBuilder() ->columns('DISTINCT MainTable.id') ->from(['MainTable' => 'MainEntity']) ->where("MainTable.owner_id = :owner_id: AND is_active = 1", ['owner_id' => $owner_id]) ->innerJoin('RelatedEntity1', 'MainTable.owner_id = RelatedEntity1.id') ->innerJoin('RelatedEntity2', 'RelatedEntity2.id = MainTable.related_id') ->leftJoin('OptionalEntity1', 'OptionalEntity1.related_id = RelatedEntity2.id') ->leftJoin('OptionalEntity2', 'OptionalEntity2.related_id = RelatedEntity2.id') ->leftJoin('OptionalEntity3', 'OptionalEntity3.main_id = MainTable.id') ->orderBy('MainTable.id DESC');

// Apply filter if present $filter = $this->request->get('search', 'string'); if (!empty($filter)) { $conditions = "MainTable.id = :filterId: OR OptionalEntity3.custom_field LIKE :filterDomain: OR OptionalEntity1.name LIKE :filterName:"; $builder->andWhere($conditions, [ 'filterId' => $filter, 'filterDomain' => '%' . $filter . '%', 'filterName' => '%' . $filter . '%', ]); }

// Initialize Paginator $paginator = new QueryBuilder([ 'builder' => $builder, 'limit' => 13, 'page' => $currentPage ]);

// Direct total count $totalItems = $builder->getQuery()->execute()->count();

// Paginator total count $page = $paginator->paginate(); $paginatorTotalItems = $page->getTotalItems();

// Output for debugging echo "Direct count: $totalItems
"; echo "Paginator count: $paginatorTotalItems
";

Details Version: 5.6.1-2+0~20240213.7+debian10~1.gbp9d5d41 php 8.1

indigo7333 commented 5 months ago

The problem appears to be with the distinct(true) . with distinct(false) the are no issues with pagination numbers.

noone-silent commented 5 months ago

Could you post both SQL Statements here? The one from the builder and the one you wrote?

s-ohnishi commented 5 months ago

@indigo7333 The reason you're not getting the results you expect is probably because you're using DISTINCT. Looking at the source (QueryBuilder.zep), it seems that the difference in judgment is whether you have GROUP BY and HAVING, and if you do not have either, it does not matter whether the query contains "DISTINCT" Instead, "COUNT(*)" will be applied.

I don't really understand whether this judgment is appropriate. I don't understand the logic exactly, but it may be possible to avoid this by using GROUP BY which generally has the same effect as DISTINCT. __SELECT DISTINCT MainTable.id ~~~ __SELECT MainTable.id ~~~ GROUP BY MainTable.id

indigo7333 commented 5 months ago

@s-ohnishi @noone-silent yes, the problem appears when using DISINCT, but that is not correct. It creates a lot of confusion if the pagination doesn't work in that case. I had to come up with a workaround. DISTINCT must be used and not COUNT(*) if distinct parameter is provided.

noone-silent commented 5 months ago

@s-ohnishi DISTINCT and GROUP BY are not the same. DISTINCT just removes duplicates, while GROUP BY allows you to use aggregate functions like SUM, that have a completely different outcome.

I agree with @indigo7333 that the query builder should not just blindly replace all fields with COUNT(*). For example, there are a lot of different possibilities on a SELECT query: https://dev.mysql.com/doc/refman/8.0/en/select.html

Otherwise it should clearly state this in the documentation OR even better, throw an exception if one of this fields is used and the Builder does not support it.

SELECT
    [ALL | DISTINCT | DISTINCTROW ]
    [HIGH_PRIORITY]
    [STRAIGHT_JOIN]
    [SQL_SMALL_RESULT] [SQL_BIG_RESULT] [SQL_BUFFER_RESULT]
    [SQL_NO_CACHE] [SQL_CALC_FOUND_ROWS]
    select_expr [, select_expr] ...

And this is MySQL only.

s-ohnishi commented 5 months ago

@noone-silent I am aware that DISTINCT and GROUP BY are different things from the original purpose. However, in this case, we want to avoid duplicate rows by using JOIN, so we have shown a workaround for that. This is not a suggestion for "general" use (though I have not tested it to confirm).

If you think about it in "general" terms, I have been taught that instead of using DISTINCT to suppress the duplicate rows that have grown up with JOIN, it is better to use EXISTS to extract only the necessary rows. (I have not tried whether subqueries can be used with Phalcon's Builder)

I think it would be desirable to be able to support all grammars, but since the expressions differ depending on the database, I think it's unreasonable to hope for perfection. I think it is acceptable to be limited to some extent, especially when using Builder.

indigo7333 commented 5 months ago

@s-ohnishi I agree that EXISTS is faster, but I was perfectly fine with DISTINCT in my case, and it's not fun getting inconsistent results in PhalconPHP, and many developers might have similar issue.

s-ohnishi commented 5 months ago

@indigo7333 It's not about which is faster, I just thought it might be an alternative if you're having trouble "right now". (Modifications around the database are slow and updates are not very frequent) If the example SQL is just for explanation, it probably won't be easily replaced. If it's a production SQL, I don't understand why it "must" be DISTINCT, so feel free to do whatever you like.