shish / shimmie2

An easy-to-install community image gallery (aka booru)
http://code.shishnet.org/shimmie2/
GNU General Public License v2.0
395 stars 110 forks source link

Autocomplete with categories #672

Open shish opened 5 years ago

shish commented 5 years ago

Auto-complete with categories would be nice, but I had to disable it in 2935db9d because it was noticeably slow (overall p95 went from 80ms to 150ms) even though I don't have the Categories extension enabled.

Could we only do this extra search if Categories is enabled?

Could we add some kind of index that makes "%:{term}%" efficient?

sanmadjack commented 5 years ago

The performance difference is much less drastic on my system, only about a 10ms difference. Do you have the tags_lower_tag_idx index that was committed awhile back?

sanmadjack commented 5 years ago

I tried a couple of methods, like

   SELECT tag, count
    FROM tags
    WHERE lower(tag) ~ '^(.+:)?da.*'
      AND count > 0
    ORDER BY count DESC

and

SELECT tag, count
FROM tags
WHERE lower(tag) LIKE lower('da%')
   OR substring(lower(tag),position(':' IN tag)+1) LIKE lower('da%')
    AND count > 0
ORDER BY count DESC

with some indexes but wasn't able to improve on the performance of:

SELECT tag, count
FROM tags
WHERE lower(tag) LIKE lower('da%')
   OR lower(tag) LIKE lower('%:da%')
  AND count > 0
ORDER BY count DESC

The first two consistently added 20+ms vs the third query in my data.

Frankly, doing two LIKES is always going to be costlier than one, especially if one uses a wildcard at the beginning. An option within the category extension to turn it on and off would probably be ideal.

But this is if we have to stick to this "starts with" search pattern. This query finished in less than 5ms on my data:

SELECT tag, count
FROM tags
WHERE to_tsvector('english', tag)  @@ to_tsquery('english', 'da:*')
  AND count > 0
ORDER BY count DESC

It's using postgres full-text search, with this index:

create index tags_test ON tags using gin(to_tsvector('english', tag))

This returns these tags:

It's not strictly "starts with" like current behavior, but it does limit it to just words that start with the search pattern.

sanmadjack commented 5 years ago

Tried this:

SELECT tag, count
FROM tags
WHERE to_tsvector('english', tag)  @@ to_tsquery('english', 'da:*')
  AND (lower(tag) LIKE lower('da%')
    OR lower(tag) LIKE lower('%:da%'))
  AND count > 0
ORDER BY count DESC

Produces the starts-with results that is current behavior, but still finishes in less than 5 ms.

I think that it's likely this will be data-dependant, but still. That's fast.

sanmadjack commented 5 years ago

Found a downside to the full text search, it doesn't return results for certain inputs, presumably because the search term isn't specific enough for some internal criteria it has. For instance, I can't search for s: or a:, but d:* works fine. Another letter seems to work fine. Might be a way around somehow, but it might also be a good idea to let it limit like that.

I did some tests and it does marginally improve post searching with wildcard tags, but by less than 100 ms. Not much in the grand scheme of a wildcard query. Let me know what you think about this.