Closed michaelrog closed 6 years ago
PRs welcome!
@michaelrog do you have a multisite setup by chance or do you use multiple different fields to relate your elements instead of just a single one?
Results are consistently incorrect regardless of single/multiple sites or single/multiple fields. (I tested variants of all the above.)
I even reproduced the bug using a DB dump of nystudio107.com from @khalwat. 😕
Where's the PR @michaelrog ? 🥇
@michaelrog would you mind sending me this dump? Because I was not able to reproduce it. I created 100 entries and 8 tags and linked those randomly but every test was successful
It would be much easier to have a good test case
@Anubarak — It's @khalwat's site, so I'll defer to him to share the db dump... But the results when I ran my test template with my own site were the same, and I suspect it would be likewise with your site: Notably, for each result, the number of purported "tags in common" is greater than the total number of tags assigned to either the original source element or the related one. (Actually comparing the IDs between the source element and each related element reveals the true count of shared tags.)
@khalwat — when I get back from vacation, and finish porting my Craft 3 plugins, I'll try to rewrite your SQL. For now I'll stick to just complaining about it. <3
I think he's right @Anubarak -- for instance, check out this article here:
https://nystudio107.com/blog/using-vuejs-2-0-with-craft-cms
The tags are:
#craftcms #vuejs #frontend
...yet the most "similar" articles it finds are:
...which really aren't similar at all. They are just the 3 most recent articles. It's using this code:
{% set limitCriteria = craft.entries.limit(3) %}
{% set similarEntriesByTags = craft.similar.find({ element: entry, context: entry.blogTags, criteria: limitCriteria }) %}
Basically, it's asking for the 3 most recent entries that are related to this blog's tags. The resulting SQL is:
'SELECT `elements`.`id`, `elements_sites`.`siteId`, COUNT(*) as count
FROM (SELECT `elements`.`id` AS `elementsId`, `elements_sites`.`id` AS `elementsSitesId`, `content`.`id` AS `contentId`
FROM `craft_elements` `elements`
INNER JOIN `craft_entries` `entries` ON `entries`.`id` = `elements`.`id`
INNER JOIN `craft_elements_sites` `elements_sites` ON `elements_sites`.`elementId` = `elements`.`id`
INNER JOIN `craft_content` `content` ON `content`.`elementId` = `elements`.`id`
LEFT JOIN `craft_structureelements` `structureelements` ON `structureelements`.`elementId` = `elements`.`id`
LEFT JOIN `craft_relations` ON elements.id=`craft_relations`.sourceId
WHERE (`elements_sites`.`siteId`=\'1\') AND ((elements.id != \'315\') AND (`craft_relations`.`targetId` IN (\'79\', \'313\', \'95\'))) AND (`content`.`siteId`=\'1\') AND (`elements`.`archived`=FALSE) AND (((`elements`.`enabled`=TRUE) AND (`elements_sites`.`enabled`=TRUE)) AND (`entries`.`postDate` <= \'2018-08-04 13:22:03\') AND ((`entries`.`expiryDate` IS NULL) OR (`entries`.`expiryDate` > \'2018-08-04 13:22:03\'))) AND (`elements_sites`.`enabled`=TRUE)
GROUP BY `elements`.`id`
ORDER BY `structureelements`.`lft`, `entries`.`postDate` DESC
LIMIT 3) `subquery`
INNER JOIN `craft_entries` `entries` ON `entries`.`id` = `subquery`.`elementsId`
INNER JOIN `craft_elements` `elements` ON `elements`.`id` = `subquery`.`elementsId`
INNER JOIN `craft_elements_sites` `elements_sites` ON `elements_sites`.`id` = `subquery`.`elementsSitesId`
INNER JOIN `craft_content` `content` ON `content`.`id` = `subquery`.`contentId`
LEFT JOIN `craft_structureelements` `structureelements` ON `structureelements`.`elementId` = `subquery`.`elementsId`
LEFT JOIN `craft_relations` ON elements.id=`craft_relations`.sourceId
GROUP BY `craft_relations`.`sourceId`
ORDER BY `count` DESC'
So interestingly, the count
that is returned is what I think is wrong here. If I change the code to:
{% set limitCriteria = craft.entries.limit(100) %}
{% set similarEntriesByTags = craft.similar.find({ element: entry, context: entry.blogTags, criteria: limitCriteria }) %}
I end up with an array of entries, all of which have a count
of 5 (??), except for the last item in the array, which has a count
of 4.
The part I don't understand is why we're ending up with all of the entries in the found set to begin with. It's clearly doing:
WHERE (`elements_sites`.`siteId`=\'1\') AND ((elements.id != \'315\') AND (`craft_relations`.`targetId` IN (\'79\', \'313\', \'95\'))) AND (`content`.`siteId`=\'1\') AND (`elements`.`archived`=FALSE) AND (((`elements`.`enabled`=TRUE) AND (`elements_sites`.`enabled`=TRUE)) AND (`entries`.`postDate` <= \'2018-08-04 13:22:03\') AND ((`entries`.`expiryDate` IS NULL) OR (`entries`.`expiryDate` > \'2018-08-04 13:22:03\'))) AND (`elements_sites`.`enabled`=TRUE)
...to limit what it finds to elements that have relations to just:
(
craft_relations.
targetIdIN (\'79\', \'313\', \'95\')))
Which are:
79
- craft\elements\Tag
- craftcms
313
- craft\elements\Tag
- vuejs
95
- craft\elements\Tag
- frontend
...yet it's returning elements that do not have craft_relations
to these targetId
s at all....
So here's a thought... @Anubarak added in:
$query->subQuery->groupBy('elements.id');
To group all of the results that have the same id
together, but it almost feels like the COUNT
is then becoming cumulative
It's unclear to me why it'd be capped at 5
but I think this might be what is going on...
So I think it's actually the opposite that is happening. What seems to be going on is that for whatever reason, elements that have 1 thing in common end up with a count of 5
. Elements that have 2 things in common end up with a count of 10
, elements with 3 things in common end up with a count of 15
, and so on.
The subquery:
$query->subQuery->groupBy('elements.id');
...causes them to be grouped such that the count
is always the lowest (first?), so they are all the same in terms of ranking.
So it looks to me like the WHERE
part of the query is actually correct, and it is returning only elements that have the appropriate relations.
The incorrect part is currently how we're grouping the results together, which is not using the correct (highest) count.
This is related to: https://github.com/nystudio107/craft-similar/issues/1
...in that if we could move the limit criteria to the end of the query, this would solve the issue.
Fixed in https://github.com/nystudio107/craft-similar/commit/066948c78090a49280f3b658c538fa1716f20bc1 thanks to @Anubarak
The
find
method does not product correct results.Here's my test setup:
The results of this test, in my case, are something like:
The
count
values are not correct.(In addition to being incorrect in their own faces, they also produce an incorrect order of the results, which is especially problematic if a
limit
is applied in the criteria.)