postgrespro / rum

RUM access method - inverted index with additional information in posting lists

Support indexing tsquery phrases #43

Open felixbuenemann opened 6 years ago

felixbuenemann commented 6 years ago

Currently, indexing with RUM fails if the tsquery column or expression contains phrases:

ERROR: Indexing of phrase tsqueries isn't supported yet

It would be very useful if this feature were supported, since it would make it possible to quickly check whether a phrase is contained in some text (for example, to filter text containing blacklisted phrases).
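For context, here is a minimal sketch of the lookup such an index would speed up, assuming the `simple` text search configuration. The `blacklist` table and its rows are hypothetical. It runs today without any index, but only as a sequential scan over every stored phrase.

```sql
-- Hypothetical table of blacklisted phrases, stored as tsquery values.
CREATE TEMP TABLE blacklist (phrase tsquery);
INSERT INTO blacklist VALUES
  (to_tsquery('simple', 'quick <-> red <-> fox')),
  (to_tsquery('simple', 'lazy <-> cat'));

-- Which blacklisted phrases occur in this text?
-- Without a usable index, every row of blacklist is checked.
SELECT phrase
FROM blacklist
WHERE to_tsvector('simple', 'The quick red fox jumps over the lazy dog') @@ phrase;
-- Only the 'quick' <-> 'red' <-> 'fox' row matches ("cat" never occurs).
```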

obartunov commented 6 years ago

In reply to Felix Bünemann's comment of Thu, Jul 19, 2018 at 5:43 PM:

Please provide more information. The exact CREATE INDEX statement would be enough.

-- Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

felixbuenemann commented 6 years ago

CREATE EXTENSION IF NOT EXISTS rum;
CREATE TABLE phrases (phrase tsquery);
INSERT INTO phrases VALUES (to_tsquery('simple', 'quick <-> red <-> fox'));
CREATE INDEX phrases_idx ON phrases USING rum (phrase);
-- ERROR:  Indexing of phrase tsqueries isn't supported yet

akorotkov commented 6 years ago

The simplest approach would be to support the phrase operator the same way as the AND operator. That would require checking the actual phrase match against the heap tuple during the recheck stage. However, it appears that our format of additional information has no reserved space for storing whether a recheck is needed. I'll investigate what can be done here.
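Why a recheck is unavoidable with this approach can be seen in plain PostgreSQL, no RUM required: a text can contain every word of a phrase without containing the phrase itself. This is a sketch assuming the `simple` configuration; it only illustrates the idea of relaxing `<->` to `&` and is not how RUM implements it.

```sql
-- All three words occur, but not in the order "quick red fox".
SELECT
  -- Phrase relaxed to AND, as a lossy index scan would treat it: matches.
  to_tsvector('simple', 'the red quick fox') @@ to_tsquery('simple', 'quick & red & fox')     AS and_match,
  -- Real phrase operator, as evaluated on the heap tuple during recheck: no match.
  to_tsvector('simple', 'the red quick fox') @@ to_tsquery('simple', 'quick <-> red <-> fox') AS phrase_match;
-- and_match = t, phrase_match = f
```

The AND form can only produce false positives, never false negatives, which is why checking the real phrase afterwards is sufficient for correctness.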

felixbuenemann commented 6 years ago

I think a recheck would be perfectly fine. Thanks for looking into it!