symphonists / uniondatasource

A union datasources allows you to combine multiple datasources to output a single datasource for the primary purpose of a unified pagination.
13 stars 11 forks source link

Pagination returns wrong number of entries if duplicates exist #23

Closed michael-e closed 11 years ago

michael-e commented 12 years ago

I am using the extension with Symphony 2.2.5, so I checked out the last compatible commit (8c64a0805758baebb8b0c141f775c1cf5e53ee12).

As far as I see, there has been no fix for my problem in the meantime.

I am using two datasources pulling from the same section. If entries occur in both datasources, the output XML will not contain duplicates (which is right, of course), but it will contain the wrong number of entries – the number is reduced by the number of duplicates.

So if I say "25 per page", it may as well be only 17. Funny: If the last entry on page 1 is a "duplicate" (i.e. will be output by both datasources), it will also occur as first entry on page 2.

I hope that this is something very simple to debug...

michael-e commented 12 years ago

I just noticed that the total-entries count is wrong as well if there are "duplicates".

michael-e commented 12 years ago

I tracked this down to the fetchByPage function returning the wrong count for $total_rows:

$total_rows = Symphony::Database()->fetchCol('total_rows', 'SELECT FOUND_ROWS() AS `total_rows`');
michael-e commented 12 years ago

Here we go.

I found two problems, the first one being only a small one.

The preg_replace does nothing

// Add SQL_CALC_FOUND_ROWS to the first SELECT.
$sql = preg_replace('/^SELECT `e`.id/', 'SELECT SQL_CALC_FOUND_ROWS `e`.id', $sql, 1);

It simply doesn't work because there is whitespace before the SELECT statement. So it should rather be:

// Add SQL_CALC_FOUND_ROWS to the first SELECT.
$sql = preg_replace('/SELECT `e`.id/', 'SELECT SQL_CALC_FOUND_ROWS `e`.id', $sql, 1);

Fortunately, FOUND_ROWS() falls back to use the most recent successful SELECT statement (see http://dev.mysql.com/doc/refman/5.6/en/information-functions.html#function_found-rows), so we never noticed this issue. Nevertheless I suggest to fix it.

UNION ALL?

UNION ALL does not remove duplicate rows, so the count must be wrong

If I change:

$sql = implode(" UNION ALL ", $this->data['sql']);

to:

$sql = implode(" UNION DISTINCT ", $this->data['sql']);

the count works properly.

It seems to me that the "uniqueness" of entries has been handled by Symphony itself (or maybe by some other extension code). I don't see any side effects of forcing the uniqueness by using UNION DISTINCT, but I admit that I may not see the big picture.

So can you please check this suggestion and think about it?

brendo commented 12 years ago

Hey @michael-e! Thanks for debugging this.

michael-e commented 12 years ago

Removing the ^ will cause all selects to be counted in the final UNION'd query, which could be undesirable if extensions use sub queries.

Good idea. If that doesn't work, we can still match the leading whitespace in the regex using \s.

I would like to check what currently happens now if you union two datasources from the same section that should return the same entry twice.

That was my use case, wasn't it? As I said, UNION DISTINCT actually forces uniqueness, but UNION ALL does not. So using UNION ALLyou will get double entries, which will then be filtered out by a later process. Using UNION DISTINCT there are no duplicates (and so the entry list and the count are working properly).

brendo commented 12 years ago

Right, it's the latter process that I'm interested in (curiosity's sake), but sure, UNION DISTINCT it up baby.

michael-e commented 12 years ago

Ah, I see. No, actually in these cases (using UNION ALL) entries do not appear twice in the output. The duplicates get lost somewhere — which is why the wrong number of entries is returned in the end. I have no idea where the duplicates are eliminated.

Should I send a pull request for the UNION thingie? It occurs only once, so you might just fix it manually, if you like!

designermonkey commented 12 years ago

I was just about to log this one as well.

I logged an issue recently (last 6 months) on Symphony to ask for multiple sort by's in the datasource editor, and it was for this exact reason.

We have a project that sorts two datasources on different columns, and then Union's them to get one list of the two sorts, only problem is duplicates are removed. Setting either original datasource to not paginate (returning all entries) and then paginating in the Union DS still returns the incorrect count.

designermonkey commented 12 years ago

So, UNION DISTINCT seemed to fix the issue I had enough to close it for the client, but I'm still sure there's something wrong here... It just doesn't feel right, like black magic or something...

michael-e commented 12 years ago

Symphony simply doesn't output duplicate entries, that's why a count including duplicates must be wrong. (Pagination alike.) No black magic here. :-)

(I don't see any reason why Symphony should output duplicates, BTW.)