Closed michael-e closed 11 years ago
I just noticed that the total-entries
count is wrong as well if there are "duplicates".
I tracked this down to the fetchByPage
function returning the wrong count for $total_rows
:
$total_rows = Symphony::Database()->fetchCol('total_rows', 'SELECT FOUND_ROWS() AS `total_rows`');
Here we go.
I found two problems, the first one being only a small one.
The preg_replace
does nothing
// Add SQL_CALC_FOUND_ROWS to the first SELECT.
$sql = preg_replace('/^SELECT `e`.id/', 'SELECT SQL_CALC_FOUND_ROWS `e`.id', $sql, 1);
It simply doesn't work because there is whitespace before the SELECT
statement. So it should rather be:
// Add SQL_CALC_FOUND_ROWS to the first SELECT.
$sql = preg_replace('/SELECT `e`.id/', 'SELECT SQL_CALC_FOUND_ROWS `e`.id', $sql, 1);
Fortunately, FOUND_ROWS()
falls back to use the most recent successful SELECT statement (see http://dev.mysql.com/doc/refman/5.6/en/information-functions.html#function_found-rows), so we never noticed this issue. Nevertheless I suggest to fix it.
UNION ALL
?
UNION ALL
does not remove duplicate rows, so the count must be wrong
If I change:
$sql = implode(" UNION ALL ", $this->data['sql']);
to:
$sql = implode(" UNION DISTINCT ", $this->data['sql']);
the count works properly.
It seems to me that the "uniqueness" of entries has been handled by Symphony itself (or maybe by some other extension code). I don't see any side effects of forcing the uniqueness by using UNION DISTINCT
, but I admit that I may not see the big picture.
So can you please check this suggestion and think about it?
Hey @michael-e! Thanks for debugging this.
^
will cause all selects to be counted in the final UNION'd query, which could be undesirable if extensions use sub queries. I think we can trim $sql
or adjust how it it is built to avoid this whitespace so the regular expression will work.UNION DISTINCT
. I would like to check what currently happens now if you union two datasources from the same section that should return the same entry twice. I would want it to only return the entry once, but I have a feeling the same entry would actually appear in the ?debug
twice..Removing the ^ will cause all selects to be counted in the final UNION'd query, which could be undesirable if extensions use sub queries.
Good idea. If that doesn't work, we can still match the leading whitespace in the regex using \s
.
I would like to check what currently happens now if you union two datasources from the same section that should return the same entry twice.
That was my use case, wasn't it? As I said, UNION DISTINCT
actually forces uniqueness, but UNION ALL
does not. So using UNION ALL
you will get double entries, which will then be filtered out by a later process. Using UNION DISTINCT
there are no duplicates (and so the entry list and the count are working properly).
Right, it's the latter process that I'm interested in (curiosity's sake), but sure, UNION DISTINCT
it up baby.
Ah, I see. No, actually in these cases (using UNION ALL
) entries do not appear twice in the output. The duplicates get lost somewhere — which is why the wrong number of entries is returned in the end. I have no idea where the duplicates are eliminated.
Should I send a pull request for the UNION
thingie? It occurs only once, so you might just fix it manually, if you like!
I was just about to log this one as well.
I logged an issue recently (last 6 months) on Symphony to ask for multiple sort by's in the datasource editor, and it was for this exact reason.
We have a project that sorts two datasources on different columns, and then Union's them to get one list of the two sorts, only problem is duplicates are removed. Setting either original datasource to not paginate (returning all entries) and then paginating in the Union DS still returns the incorrect count.
So, UNION DISTINCT
seemed to fix the issue I had enough to close it for the client, but I'm still sure there's something wrong here... It just doesn't feel right, like black magic or something...
Symphony simply doesn't output duplicate entries, that's why a count including duplicates must be wrong. (Pagination alike.) No black magic here. :-)
(I don't see any reason why Symphony should output duplicates, BTW.)
I am using the extension with Symphony 2.2.5, so I checked out the last compatible commit (8c64a0805758baebb8b0c141f775c1cf5e53ee12).
As far as I see, there has been no fix for my problem in the meantime.
I am using two datasources pulling from the same section. If entries occur in both datasources, the output XML will not contain duplicates (which is right, of course), but it will contain the wrong number of entries – the number is reduced by the number of duplicates.
So if I say "25 per page", it may as well be only 17. Funny: If the last entry on page 1 is a "duplicate" (i.e. will be output by both datasources), it will also occur as first entry on page 2.
I hope that this is something very simple to debug...