postgrespro / rum

RUM access method - inverted index with additional information in posting lists
Other
721 stars 54 forks source link

RUM Index WITH rum_tsvector_addon_ops does not work for jsonb columns #58

Open ngigiwaithaka opened 5 years ago

ngigiwaithaka commented 5 years ago

Suppose you have this table

CREATE TABLE articles (id int, object_data JSONB);

If object_data has two fields (article & timeCreatedAsFromServer)

The following returns the error _ERROR: attribute "((objectdata -> 'timeCreatedAsFromServer')::bigint)" is not found in table

CREATE INDEX article_rum_idx ON articles
  USING rum (to_tsvector('English', object_data->>'article') rum_tsvector_addon_ops,     ((object_data -> 'timeCreatedAsFromServer')::bigint))
  WITH (attach = '((object_data -> ''timeCreatedAsFromServer'')::bigint)', to = 'to_tsvector(''English'', object_data->>''article'')')

If you omit the WITH clause, it works as below!

CREATE INDEX article_rum_idx ON articles
  USING rum (to_tsvector('English', object_data->>'article') rum_tsvector_addon_ops,     ((object_data -> 'timeCreatedAsFromServer')::bigint))
ngigiwaithaka commented 5 years ago

Sorry, on running this

CREATE INDEX article_rum_idx ON articles
  USING rum (to_tsvector('English', object_data->>'article') rum_tsvector_addon_ops, ((object_data -> 'timeCreatedAsFromServer')::bigint))

You get...

ERROR: additional information attribute "to_tsvector" is not found in index

za-arthur commented 5 years ago

Hello @ngigiwaithaka ,

Thank you for the issue! Unfortunately it isn't possible to attach exression's result as a additional information nowadays.

attach and to options in WITH expression requires column names of a tables to be passed. That is why you get ERROR: attribute "((object_data -> 'timeCreatedAsFromServer')::bigint)" is not found in table.

Also it is necessary to pass attach and to options if you use rum_tsvector_addon_ops opclass. If you don't pass it you get additional information attribute "to_tsvector" is not found in index (your second case).

ngigiwaithaka commented 5 years ago

@za-arthur Thanks for prompt reply.

Any plans to have this feature included anytime soon?

za-arthur commented 5 years ago

There was no such plan until today :) It depends not only from me. We will discuss implementation of this feature.

ngigiwaithaka commented 5 years ago

:-) my bad....

I have also posting something else I have come across, hope they wouldn't be too many for you today... :-)

za-arthur commented 5 years ago

Of course not! You are welcome to create issues.

zilzila commented 11 months ago

This problem occurs not only with jsonb, but also in a simpler case - if you try to index not by the [additional] tsvector field, but by the text field, which is sad, since you actually have to duplicate a huge field, which doubles the size of an already rather big table. And the index for this redundant tsvector field is almost a threefold increase in the table as a result


Эта проблема встречается не только с jsonb, но и в более простом случае - если пытаться индексировать не по [дополнительному] полю tsvector, а по текстовому полю, что печально, так как приходится фактически дублировать огромное поле, что увеличивает размер и так немаленькой таблицы вдвое. А ещё индекс по этому избыточному полю tsvector - почти трёхкратное увеличение таблицы в итоге

create index on ... using rum (to_tsvector('lang', text_column) rum_tsvector_addon_ops, ts)
  with (attach = 'ts', to = 'to_tsvector(''lang'', text_column)');