postgrespro / rum

RUM access method - inverted index with additional information in posting lists
Other
721 stars 54 forks source link

Ad hoc search using RUM indexes #74

Open rudibroekhuizen opened 4 years ago

rudibroekhuizen commented 4 years ago

I was wondering how to set up ad hoc search using RUM indexes. For example, I have a table with three columns. I would like to search combinations of the fields. When creating a tsvector column, using all columns the content of the fields are combined, so I can not force to search only one field.

timestamp species location
2019-11-04 Buteo buteo Texel
2019-10-03 Haliaeetus albicilla Werkendam
2019-10-01 Athene noctua Zwolle

Should I create tsvector columns for column 'species', and a seperate tsvector column for 'location' and use a query like this:

SELECT *
FROM mytable
WHERE timestamp BETWEEN '2019-11-01T13:00:00.000Z' AND '2019-11-15T13:00:00.000'
AND tsv_species @@ websearch_to_tsquery('simple','Buteo')
AND tsv_location @@ websearch_to_tsquery('simple','Texel or Zwolle')
ORDER BY by observation_timestamp <=| '2019-11-15T13:00:00.000';

Or are there better options? For example, can I use multiple rum_tsvector_addon_ops arguments when creating a RUM index?

obartunov commented 4 years ago

On Fri, Nov 15, 2019 at 4:16 PM Rudi Broekhuizen notifications@github.com wrote:

I was wondering how to set up ad hoc search using RUM indexes. For example, I have a table with three columns. I would like to search combinations of the fields. When creating a tsvector column, using all columns the content of the fields are combined, so I can not force to search only one field. timestamp species location 2019-11-04 Buteo buteo Texel 2019-10-03 Haliaeetus albicilla Werkendam 2019-10-01 Athene noctua Zwolle

Should I create tsvector columns for column 'species', and a seperate tsvector column for 'location' and use a query like this:

SELECT * FROM mytable WHERE timestamp BETWEEN '2019-11-01T13:00:00.000Z' AND '2019-11-15T13:00:00.000' AND tsv_species @@ websearch_to_tsquery('simple','Buteo') AND tsv_location @@ websearch_to_tsquery('simple','Leiden') ORDER BY by observation_timestamp <=| '2019-11-15T13:00:00.000';

Or are there better options? For example, can I use multiple rum_tsvector_addon_ops arguments when creating a RUM index?

I would create tsvector as combination of species and location with separate 'labels', so you can specify 'label' in a query. Check setweight function.

some examples

select '1:1a 2:2 3:3 4:4 5:4d'::tsvector @@ websearch_to_tsquery('1:b 2'); ?column?

f (1 row)

select '1:1a 2:2 3:3 4:4 5:4d'::tsvector @@ websearch_to_tsquery('1:a 2'); ?column?

t (1 row)

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/postgrespro/rum/issues/74?email_source=notifications&email_token=ABQURYUOZNSYBWWTTU3YZVTQT2OL3A5CNFSM4JN2VH6KYY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4HZTN3HA, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABQURYTHS5MVX43ZLXVB3ALQT2OL3ANCNFSM4JN2VH6A .

-- Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

rudibroekhuizen commented 4 years ago

I thought weight labels could only be used using a GIN index, can they also be used with a RUM index?

obartunov commented 4 years ago

On Fri, Nov 15, 2019 at 6:28 PM Rudi Broekhuizen notifications@github.com wrote:

I thought weight labels could only be used using a GIN index, can they also be used with a RUM index?

weights are not used in index, they used to filter results after search

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

-- Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

akorotkov commented 4 years ago

BTW, in spite of GIN weights are present in RUM index posting lists and could be used for filtering within the index.

obartunov commented 4 years ago

On Sat, Nov 16, 2019 at 9:36 AM Alexander Korotkov notifications@github.com wrote:

BTW, in spite of GIN weights are present in RUM index posting lists and could be used for filtering within the index.

That would be very great !

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/postgrespro/rum/issues/74?email_source=notifications&email_token=ABQURYS4QSUSJ4H6ARI74RLQT6IF3A5CNFSM4JN2VH6KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEEHK4BQ#issuecomment-554610182, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABQURYVZUDGNHFQESWDHUEDQT6IF3ANCNFSM4JN2VH6A .

-- Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

rudibroekhuizen commented 4 years ago

Thanks. I did some testing with weights, but it doesn't seem to work with websearch_to_tsquery, only in combination with to_tsquery. Another thing is that the set_weight setting only works when using at most 4 columns (A,B,C,D), so if I want to search more than four, it is not possible. Maybe create tsv columns per column? Concat them all to search in any field?