iftechfoundation / ifdb-suggestion-tracker

Bugs and feature requests for a future IFDB update
10 stars 0 forks source link

Slow query: Search by author does a full scan of `games` table #406

Closed dfabulich closed 6 months ago

dfabulich commented 9 months ago

I'm not sure what we could do about this… Some kinda full-text search index?

# User@Host: ifdb[ifdb] @ localhost [127.0.0.1]
# Thread_id: 6598006  Schema: ifdb  QC_hit: No
# Query_time: 20.375421  Lock_time: 0.000872  Rows_sent: 3  Rows_examined: 227876
# Rows_affected: 0  Bytes_sent: 2325
# Tmp_tables: 5  Tmp_disk_tables: 1  Tmp_table_sizes: 7013020
# Full_scan: Yes  Full_join: No  Tmp_table: Yes  Tmp_table_on_disk: Yes
# Filesort: Yes  Filesort_on_disk: No  Merge_passes: 1  Priority_queue: No
#
# explain: id   select_type table   type    possible_keys   key key_len ref rows    r_rows  filtered    r_filtered  Extra
# explain: 1    PRIMARY games   ALL NULL    NULL    NULL    NULL    12809   12809.00    100.00  0.02    Using where; Using temporary; Using filesort
# explain: 1    PRIMARY <derived4>  ref key0    key0    34  ifdb.games.id   10  1.00    100.00  100.00  
# explain: 4    DERIVED <derived5>  ALL NULL    NULL    NULL    NULL    68110   22599.00    100.00  100.00  Using temporary; Using filesort
# explain: 5    DERIVED reviews ALL NULL    NULL    NULL    NULL    68110   68110.00    100.00  93.12   Using where; Using temporary; Using filesort
# explain: 5    DERIVED users   eq_ref  PRIMARY PRIMARY 34  ifdb.reviews.userid 1   1.00    100.00  100.00  Using where
#
SET timestamp=1696915111;
select sql_calc_found_rows
              games.id as id,
                       games.title as title,
                       games.author as author,
                       games.tags as tags,
                       games.moddate as moddate,
                       games.system as devsys,
                       if (time(games.published) = '00:00:00',
                           date_format(games.published, '%Y'),
                           date_format(games.published, '%M %e, %Y'))
                         as pubfmt,
                       if (time(games.published) = '00:00:00',
                           date_format(games.published, '%Y'),
                           date_format(games.published, '%Y-%m-%d'))
                         as published,
                       date_format(games.published, '%Y') as pubyear,
                       (games.coverart is not null) as hasart,
                       avgRating as avgrating,
                       numRatingsInAvg as ratingcnt,
                       stdDevRating as ratingdev,
                       numRatingsTotal,
                       numMemberReviews,
                       games.sort_title as sort_title,
                       games.sort_author as sort_author,
                       ifnull(games.published, '9999-12-31') as sort_pub,
                       games.flags

            from
              games
                      left join gameRatingsSandbox0 on games.id = gameid
            where
              ( author like '%Michael Townsend%'  or author rlike '[[:<:]]M.*[[:space:]]+Townsend[[:>:]]')

            group by games.id

            order by
              starsort desc,
              sort_title
            limit 0, 20;
salty-horse commented 6 months ago

Two options:

  1. Create a new fulltext index just for the author.
  2. Re-use the existing (title, author, desc, tags) fulltext index to search more than we need, but still filter the results for the LIKE to run on less records. (Assuming the execution plan selects the fulltext index.)

Here's the second approach. Explain says it's using the index, but I didn't measure if it's actually faster: https://github.com/iftechfoundation/ifdb/compare/main...salty-horse:ifdb:author_search_optimization?expand=1

dfabulich commented 6 months ago

Even though we have a new fulltext index for the author column, the query still isn't using it, because it still has the RLIKE clause.

select sql_calc_found_rows
    games.id as id,
    games.title as title,
    games.author as author,
    games.tags as tags,
    games.moddate as moddate,
    games.system as devsys,
    if (time(games.published) = '00:00:00',
        date_format(games.published, '%Y'),
        date_format(games.published, '%M %e, %Y'))
        as pubfmt,
    if (time(games.published) = '00:00:00',
        date_format(games.published, '%Y'),
        date_format(games.published, '%Y-%m-%d'))
        as published,
    date_format(games.published, '%Y') as pubyear,
    (games.coverart is not null) as hasart,
    avgRating as avgrating,
    numRatingsInAvg as ratingcnt,
    stdDevRating as ratingdev,
    numRatingsTotal,
    numMemberReviews,
    games.sort_title as sort_title,
    games.sort_author as sort_author,
    ifnull(games.published, '9999-12-31') as sort_pub,
    games.flags

from
    games
                left join gameRatingsSandbox0_mv on games.id = gameid

where
    ( match (author) against ('Gray Manichean')  or author rlike '[[:<:]]G.*[[:space:]]+Manichean[[:>:]]')

order by
    starsort desc

limit 0, 20;

Analyze says it's doing a full table scan:

+------+-------------+------------------------+--------+---------------+---------+---------+---------------+-------+----------+----------+------------+----------------------------------------------+
| id   | select_type | table                  | type   | possible_keys | key     | key_len | ref           | rows  | r_rows   | filtered | r_filtered | Extra                                        |
+------+-------------+------------------------+--------+---------------+---------+---------+---------------+-------+----------+----------+------------+----------------------------------------------+
|    1 | SIMPLE      | games                  | ALL    | NULL          | NULL    | NULL    | NULL          | 12987 | 12987.00 |   100.00 |       0.12 | Using where; Using temporary; Using filesort |
|    1 | SIMPLE      | gameRatingsSandbox0_mv | eq_ref | PRIMARY       | PRIMARY | 34      | ifdb.games.id | 1     | 0.56     |   100.00 |     100.00 |                                              |
+------+-------------+------------------------+--------+---------------+---------+---------+---------------+-------+----------+----------+------------+----------------------------------------------+

But ever since we rolled out our other changes, this query hasn't show up in the slow query log, so it may not be such a big deal any more.

Furthermore, https://github.com/iftechfoundation/ifdb/pull/277 appears to have caused https://github.com/iftechfoundation/ifdb-suggestion-tracker/issues/436 so I'm going to roll it back.