Open tomcardoso opened 3 years ago
That didn't work, so I've tried doing something like this instead:
add_index :cases, "to_tsvector('english', case_number || ' ' || short_title)", using: :gin, name: 'cases_keywords_idx', algorithm: :concurrently
But the search is still very, very slow (taking 10 seconds or longer). I've done some profiling and it's all down to the bottleneck created by the special two-table, four-column query I mentioned in my previous message. Here's an example query that lasts ~10 seconds:
SELECT "cases".* FROM "cases" INNER JOIN "case_events" ON "case_events"."case_id" = "cases"."id" WHERE (to_tsvector('english', "cases"."case_number"::text) @@ websearch_to_tsquery('english', 'police'::text) OR to_tsvector('english', "cases"."short_title"::text) @@ websearch_to_tsquery('english', 'police'::text) OR to_tsvector('english', "case_events"."party_name"::text) @@ websearch_to_tsquery('english', 'police'::text) OR to_tsvector('english', "case_events"."short_title"::text) @@ websearch_to_tsquery('english', 'police'::text)) GROUP BY "cases"."id" ORDER BY max(COALESCE(ts_rank(to_tsvector('english', "cases"."case_number"::text), websearch_to_tsquery('english', 'police'::text)), 0) + COALESCE(ts_rank(to_tsvector('english', "cases"."short_title"::text), websearch_to_tsquery('english', 'police'::text)), 0) + COALESCE(ts_rank(to_tsvector('english', "case_events"."party_name"::text), websearch_to_tsquery('english', 'police'::text)), 0) + COALESCE(ts_rank(to_tsvector('english', "case_events"."short_title"::text), websearch_to_tsquery('english', 'police'::text)), 0)) LIMIT $1 OFFSET $2;
I've tested both with and without that indexing, and it doesn't seem to really make a difference, which makes me wonder whether I need to index things some other way, or if it's not the ts_rank
operations that are really slowing me down. Some testing suggested the search is fairly snappy until I get to .order(Arel.sql("max(#{order.join(' + ')})"))
, at which point the query grinds to a halt, so that must be an expensive operation. Any suggestions here would be much appreciated.
@tomcardoso for some reason if you join and lookup into two tables GIN indexes are not used. I have managed it is possible to check for matching JOINed table in CTE and just pass ids to final query. You need to use ANY(ARRAY(...
method to keep indexes being used. Here is example fast query.
WITH related_labels AS (
SELECT id FROM labels WHERE account_id = 80 AND to_tsvector('english_nostop', "labels"."name"::TEXT) @@ websearch_to_tsquery('english_nostop', 'Hello'::TEXT)
)
SELECT
"products" .*,
COALESCE(ts_rank(to_tsvector('english_nostop', "products"."title"::TEXT), websearch_to_tsquery('english_nostop', 'Hello'::TEXT)), 0) + COALESCE(ts_rank(to_tsvector('english_nostop', "products"."catalog_number"::TEXT), websearch_to_tsquery('english_nostop', 'Hello'::TEXT)), 0) + COALESCE(ts_rank(to_tsvector('english_nostop', "products"."version"::TEXT), websearch_to_tsquery('english_nostop', 'Hello'::TEXT)), 0) + COALESCE(ts_rank(to_tsvector('english_nostop', "products"."artist"::TEXT), websearch_to_tsquery('english_nostop', 'Hello'::TEXT)), 0) + COALESCE(ts_rank(to_tsvector('english_nostop', "products"."barcode"::TEXT), websearch_to_tsquery('english_nostop', 'Hello'::TEXT)), 0) + COALESCE(ts_rank(to_tsvector('english_nostop', "products"."company_id"::TEXT), websearch_to_tsquery('english_nostop', 'Hello'::TEXT)), 0) + COALESCE(ts_rank(to_tsvector('english_nostop', "labels"."name"::TEXT), websearch_to_tsquery('english_nostop', 'Hello'::TEXT)), 0) AS "rank20534463617833501"
FROM
"products"
LEFT OUTER JOIN "labels" ON
"labels"."id" = "products"."label_id"
WHERE
"products"."account_id" = 80
AND (to_tsvector('english_nostop', "products"."title"::TEXT) @@ websearch_to_tsquery('english_nostop', 'Hello'::TEXT)
OR to_tsvector('english_nostop', "products"."version"::TEXT) @@ websearch_to_tsquery('english_nostop', 'Hello'::TEXT)
OR to_tsvector('english_nostop', "products"."artist"::TEXT) @@ websearch_to_tsquery('english_nostop', 'Hello'::TEXT)
OR to_tsvector('english_nostop', "products"."barcode"::TEXT) @@ websearch_to_tsquery('english_nostop', 'Hello'::TEXT)
OR to_tsvector('english_nostop', "products"."company_id"::TEXT) @@ websearch_to_tsquery('english_nostop', 'Hello'::TEXT)
OR label_id = ANY(ARRAY(SELECT * FROM related_labels)))
ORDER BY
"rank20534463617833501" DESC
I'm a bit confused as to how I can set up indexing on web search, especially if I want to use the approach we talked about in https://github.com/textacular/textacular/issues/137. To recap, what I'm doing is:
Which searches a number of fields for a certain keyword and returns all results that match in any of those fields.
But I'm finding the search to be very slow for my database (~1.8m case event entries). I went in and added simple keyword indexes of this sort:
But I have a feeling I may need to do something more. Do you have any advice on what I should do? I know Textacular recommends using the
:sql
schema file format, but I'd prefer to keep it as aschema.rb
for compatibility reasons with my production database.