middlebury / coursecatalog

This Course-Catalog is a project to develop a web front-end for searching and browsing course information stored in Banner.
https://github.com/middlebury/coursecatalog/wiki
GNU General Public License v3.0
9 stars 2 forks source link

FULLTEXT searches cause excessively slow queries #27

Closed adamfranco closed 5 years ago

adamfranco commented 5 years ago

When using the Keyword field on the search form a query is built that includes a MATCH (SSBSECT_fulltext) AGAINST ('<search keyword>' IN BOOLEAN MODE) AS relevence,. While this is valid syntax, these queries are taking up to 15-30 seconds in FULLTEXT initialization state and heavy use can subsequently impact database load and overall system performance.

Example query searching for biology: https://catalog.middlebury.edu/offerings/search/catalog/catalog%2FMCUG?search=Search&term=term%2F201920&department=&keywords=biology&time_start=0&time_end=86400&type%5B%5D=genera%3Aoffering%2FLCT&type%5B%5D=genera%3Aoffering%2FSEM

SELECT
SSBSECT_TERM_CODE,
SSBSECT_CRN,
SSBSECT_SUBJ_CODE,
SSBSECT_CRSE_NUMB,
SSBSECT_SEQ_NUMB,
SSBSECT_PTRM_CODE,
SSBSECT_CRSE_TITLE,
SSBSECT_MAX_ENRL,
SSBSECT_ENRL,
SSBSECT_SEATS_AVAIL,
SSBSECT_LINK_IDENT,
SSBDESC_TEXT_NARRATIVE,
SCBCRSE_TITLE,
SCBDESC_TEXT_NARRATIVE,
SSBSECT_CAMP_CODE,
term_display_label,
STVTERM_START_DATE,
STVSCHD_CODE,
STVSCHD_DESC,
SSRMEET_BLDG_CODE,
SSRMEET_ROOM_CODE,
SSRMEET_BEGIN_TIME,
SSRMEET_END_TIME,
SSRMEET_SUN_DAY,
SSRMEET_MON_DAY,
SSRMEET_TUE_DAY,
SSRMEET_WED_DAY,
SSRMEET_THU_DAY,
SSRMEET_FRI_DAY,
SSRMEET_SAT_DAY,
SSRMEET_START_DATE,
SSRMEET_END_DATE,
COUNT(SSRMEET_TERM_CODE) as num_meet,
STVBLDG_DESC,
STVCAMP_DESC,
SCBCRSE_EFF_TERM ,
SCBCRSE_DEPT_CODE,
SCBCRSE_DIVS_CODE,
MATCH (SSBSECT_fulltext) AGAINST ('biology' IN BOOLEAN MODE) AS relevence,
SSRXLST_XLST_GROUP
FROM
ssbsect_scbcrse_scbdesc
INNER JOIN catalog_term ON SSBSECT_TERM_CODE = catalog_term.term_code
INNER JOIN course_catalog_college ON course_catalog_college.coll_code = SCBCRSE_COLL_CODE
INNER JOIN course_catalog ON course_catalog_college.catalog_id = course_catalog.catalog_id
LEFT JOIN STVTERM ON SSBSECT_TERM_CODE = STVTERM_CODE
LEFT JOIN SSBDESC ON (SSBSECT_TERM_CODE = SSBDESC_TERM_CODE AND SSBSECT_CRN = SSBDESC_CRN)
LEFT JOIN SSRMEET ON (SSBSECT_TERM_CODE = SSRMEET_TERM_CODE AND SSBSECT_CRN = SSRMEET_CRN)
LEFT JOIN STVBLDG ON SSRMEET_BLDG_CODE = STVBLDG_CODE
LEFT JOIN STVSCHD ON SSBSECT_SCHD_CODE = STVSCHD_CODE
LEFT JOIN STVCAMP ON SSBSECT_CAMP_CODE = STVCAMP_CODE
LEFT JOIN SSRXLST ON (SSBSECT_TERM_CODE = SSRXLST_TERM_CODE AND SSBSECT_CRN = SSRXLST_CRN)

WHERE
(SSBSECT_TERM_CODE = '201920')
AND (SSBSECT_SCHD_CODE = 'LCT'
OR SSBSECT_SCHD_CODE = 'SEM')
AND (MATCH (SSBSECT_fulltext) AGAINST ('biology' IN BOOLEAN MODE))
AND SSBSECT_TERM_CODE IN (
SELECT
term_code
FROM
catalog_term
WHERE
catalog_id = 'MCUG'
)
AND SCBCRSE_COLL_CODE IN (
SELECT
coll_code
FROM
course_catalog_college
WHERE
catalog_id = 'MCUG'
)
AND SSBSECT_SSTS_CODE = 'A'
AND (course_catalog.prnt_ind_to_exclude IS NULL OR SSBSECT_PRNT_IND != course_catalog.prnt_ind_to_exclude)

GROUP BY SSBSECT_TERM_CODE, SSBSECT_CRN
ORDER BY relevence DESC

Hopefully queries relying on keyword searching can be reworked to perform better. A few ideas:

adamfranco commented 5 years ago

I'm unable to replicate this issue in development -- the same query on a fresh dump/import of the database on saw executes in 0.7 seconds instead of 30 seconds. Will need to follow up after the holiday to figure out if this is a database configuration issue.

adamfranco commented 5 years ago

At Mark's suggestion I checked the EXPLAIN plan for both the database on saw (MariaDB 10.1.37) and songbird (MariaDB 10.2.17) and they are the same, so that shouldn't be an indication of the performance differance: screen shot 2019-01-02 at 4 33 36 pm

This query completes in 3.8 seconds on saw versus 22.6 seconds on songbird.