Closed dfabulich closed 6 months ago
This might be a regression from Search by TUID #254 (commit ebd1eecf95a86ed95cb92c6e5f6bea0fa68634f8), which added `games.id` to the list of columns to match.
The columns must be identical to the list in the fulltext index, or it won't get used. See Full-Text Restrictions:
> The MATCH() column list must match exactly the column list in some FULLTEXT index definition for the table, unless this MATCH() is IN BOOLEAN MODE on a MyISAM table. For MyISAM tables, boolean-mode searches can be done on nonindexed columns, although they are likely to be slow.
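The restriction quoted above can be reproduced on a scratch table. This is only a sketch: `ft_demo` and its index `ft_title_author` are hypothetical names, not part of the IFDB schema, and the statements are meant to be run one at a time, since the third one is expected to fail on InnoDB.

```sql
-- Hypothetical scratch table for illustration only (not the IFDB schema).
create table ft_demo (
    id varchar(32) primary key,
    title varchar(255),
    author varchar(255),
    fulltext key ft_title_author (title, author)
) engine=InnoDB;

-- Column list is identical to the index definition, so the
-- fulltext index is a candidate for this query.
explain select id from ft_demo
where match (title, author) against ('Zork' in boolean mode);

-- Column list differs from the index (id was added). On InnoDB this is
-- rejected with error 1191, "Can't find FULLTEXT index matching the
-- column list". On MyISAM in boolean mode it runs, but without the index.
select id from ft_demo
where match (title, author, id) against ('Zork' in boolean mode);

-- Clean up the scratch table.
drop table ft_demo;
```

Running `show index from games where Index_type = 'FULLTEXT';` against the real database would show which column lists the existing fulltext indexes actually cover.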
~~Does anyone actually use the "search by TUID" functionality, or was it just added to fix the popup for searching games? The popup's query can be changed to search for "X tuid:X" if the only search term X is a 16-char alphanumeric string.~~
EDIT: The suggestion won't work, since it ANDs the tuid search with the X.
Also, as mentioned in https://github.com/iftechfoundation/ifdb-suggestion-tracker/issues/300#issuecomment-1353480657, the instruction above the pop-up is wrong, since you can't search by IFID without using an `ifid` prefix. Whatever the solution is, it would need to be edited.
Several options:
You're right, but that's not the entire problem. Starting with the `game-ratings-materialized-view` branch, I reverted https://github.com/iftechfoundation/ifdb/commit/ebd1eecf95a86ed95cb92c6e5f6bea0fa68634f8 and tried a simple search for Zork: http://localhost:8080/search?searchbar=Zork
It ran this query:
```sql
select
    games.id as id,
    games.title as title,
    games.author as author,
    games.tags as tags,
    games.moddate as moddate,
    games.system as devsys,
    if (time(games.published) = '00:00:00',
        date_format(games.published, '%Y'),
        date_format(games.published, '%M %e, %Y'))
        as pubfmt,
    if (time(games.published) = '00:00:00',
        date_format(games.published, '%Y'),
        date_format(games.published, '%Y-%m-%d'))
        as published,
    date_format(games.published, '%Y') as pubyear,
    (games.coverart is not null) as hasart,
    avgRating as avgrating,
    numRatingsInAvg as ratingcnt,
    stdDevRating as ratingdev,
    numRatingsTotal,
    numMemberReviews,
    games.sort_title as sort_title,
    games.sort_author as sort_author,
    ifnull(games.published, '9999-12-31') as sort_pub,
    games.flags
    , match (title, author, `desc`, tags) against ('Zork' in boolean mode) as relevance
from
    games
    left join gameRatingsSandbox0_mv on games.id = gameid
where
    ( (match (title, author, `desc`, tags) against ('Zork' in boolean mode) or (title like 'Zork' or ( title like '%Zork%'))))
order by
    if(title = 'Zork',0,1),if(title like 'Zork%',0,1),if((title like 'Zork' or ( title like '%Zork%')),0,1),relevance desc
limit 0, 20;
```
When I `ANALYZE` that query, it says that it considered using the fulltext indexes, but decided against it.
```
+------+-------------+------------------------+--------+---------------+---------+---------+---------------+-------+----------+----------+------------+----------------------------------------------+
| id   | select_type | table                  | type   | possible_keys | key     | key_len | ref           | rows  | r_rows   | filtered | r_filtered | Extra                                        |
+------+-------------+------------------------+--------+---------------+---------+---------+---------------+-------+----------+----------+------------+----------------------------------------------+
|    1 | SIMPLE      | games                  | ALL    | title_2,title | NULL    | NULL    | NULL          | 13007 | 13007.00 |   100.00 |       0.52 | Using where; Using temporary; Using filesort |
|    1 | SIMPLE      | gameRatingsSandbox0_mv | eq_ref | PRIMARY       | PRIMARY | 34      | ifdb.games.id |     1 |     0.91 |   100.00 |     100.00 |                                              |
+------+-------------+------------------------+--------+---------------+---------+---------+---------------+-------+----------+----------+------------+----------------------------------------------+
```
I believe this is happening because of the `LIKE` parts of the `WHERE` clause. When I go to `searchutil.php` and look at line 446:

```php
$expr = "($matchMode ($matchExpr or $likeExpr))";
```

… and replace that line with:

```php
$expr = "($matchMode ($matchExpr))";
```

It then generates a query without the `LIKE` clauses:
```sql
select
    games.id as id,
    games.title as title,
    games.author as author,
    games.tags as tags,
    games.moddate as moddate,
    games.system as devsys,
    if (time(games.published) = '00:00:00',
        date_format(games.published, '%Y'),
        date_format(games.published, '%M %e, %Y'))
        as pubfmt,
    if (time(games.published) = '00:00:00',
        date_format(games.published, '%Y'),
        date_format(games.published, '%Y-%m-%d'))
        as published,
    date_format(games.published, '%Y') as pubyear,
    (games.coverart is not null) as hasart,
    avgRating as avgrating,
    numRatingsInAvg as ratingcnt,
    stdDevRating as ratingdev,
    numRatingsTotal,
    numMemberReviews,
    games.sort_title as sort_title,
    games.sort_author as sort_author,
    ifnull(games.published, '9999-12-31') as sort_pub,
    games.flags
    , match (title, author, `desc`, tags) against ('Zork' in boolean mode) as relevance
from
    games
    left join gameRatingsSandbox0_mv on games.id = gameid
where
    ( (match (title, author, `desc`, tags) against ('Zork' in boolean mode)))
order by
    if(title = 'Zork',0,1),if(title like 'Zork%',0,1),if((title like 'Zork' or ( title like '%Zork%')),0,1),relevance desc
limit 0, 20;
```
… and that query is using the fulltext index when I `ANALYZE` it.
```
+------+-------------+------------------------+----------+---------------+---------+---------+---------------+------+--------+----------+------------+----------------------------------------------+
| id   | select_type | table                  | type     | possible_keys | key     | key_len | ref           | rows | r_rows | filtered | r_filtered | Extra                                        |
+------+-------------+------------------------+----------+---------------+---------+---------+---------------+------+--------+----------+------------+----------------------------------------------+
|    1 | SIMPLE      | games                  | fulltext | title         | title   | 0       |               |    1 |  68.00 |   100.00 |     100.00 | Using where; Using temporary; Using filesort |
|    1 | SIMPLE      | gameRatingsSandbox0_mv | eq_ref   | PRIMARY       | PRIMARY | 34      | ifdb.games.id |    1 |   0.91 |   100.00 |     100.00 |                                              |
+------+-------------+------------------------+----------+---------------+---------+---------+---------------+------+--------+----------+------------+----------------------------------------------+
```
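To check whether the `LIKE` branch alone is what pushes the optimizer to a full scan, the comparison can be stripped down to three small queries. This is a diagnostic sketch, assuming the `games` table and the fulltext index on (title, author, `desc`, tags) that the generated queries rely on. `ANALYZE SELECT` is MariaDB syntax; plain `EXPLAIN` should show the same access type without the `r_*` columns.

```sql
-- 1. Fulltext predicate on its own.
analyze select id from games
where match (title, author, `desc`, tags) against ('Zork' in boolean mode);

-- 2. The same predicate OR-ed with a leading-wildcard LIKE.
--    A LIKE pattern that starts with % cannot use an ordinary index,
--    so every row has to be checked to satisfy the OR.
analyze select id from games
where match (title, author, `desc`, tags) against ('Zork' in boolean mode)
   or title like '%Zork%';

-- 3. The LIKE on its own, as a baseline for the scan cost.
analyze select id from games
where title like '%Zork%';
```

If the explanation holds, the `type` column should read `fulltext` for the first query and `ALL` for the other two.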
What's really confusing is that I expected removing the `LIKE` clause to change the results, but, to my surprise, it didn't change anything.
I thought that maybe a search for `ounterfeit Monkey` (without the C) would return different results, but no: Counterfeit Monkey still came back right at the top, and the results seem identical to what we'd get with the `LIKE` clause included.
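One possible explanation, not confirmed here: in boolean mode, words with no operator are optional, so a two-word search can match on `Monkey` alone, and fulltext matching works on whole words rather than substrings. The sketch below assumes the search text reaches `against()` unmodified, which I haven't verified in `searchutil.php`.

```sql
-- No operators: either word may match, so "Monkey" alone is enough.
select title from games
where match (title, author, `desc`, tags)
      against ('ounterfeit Monkey' in boolean mode)
limit 20;

-- Both words required: "ounterfeit" is not an indexed word in
-- "Counterfeit Monkey", so that title should no longer match.
select title from games
where match (title, author, `desc`, tags)
      against ('+ounterfeit +Monkey' in boolean mode)
limit 20;

-- Substring match, for comparison: LIKE does find "Counterfeit".
select title from games
where title like '%ounterfeit%'
limit 20;
```

If that is what's going on, a single-word search for `ounterfeit` would be a better test of whether dropping the `LIKE` clause changes the results.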
I… don't really understand this yet, but I'll put together a PR and try to investigate it later.
I'm not sure what we could do about this… Some kinda full-text search index?