jaiminpan / pg_jieba

Postgresql full-text search extension for chinese
BSD 3-Clause "New" or "Revised" License
338 stars 65 forks source link

Any tutorial or documentation regarding how to use it properly (create index)? #38

Closed sgon00 closed 5 years ago

sgon00 commented 5 years ago

Hi, I don't find any tutorial or documentation regarding how to use pg_jieba properly in production. For example, how to create index? Should I use gin or rum? I heard rum is much faster, but when I search online, I don't find anyone using rum index with pg_jieba at all. Thanks a lot for your help.

jaiminpan commented 5 years ago

pg_jieba is a extension of tsvector type for chinese. I believe reference the doc for tsvector is fine. here is the official doc: https://www.postgresql.org/docs/11/textsearch-indexes.html

sgon00 commented 5 years ago

@jaiminpan thank you very much for the reference and this project.

By reading the official doc, the following is how I use pg_jieba so far:

CREATE TABLE "novelcn" (
"id" bigserial, 
"title" text, 
"body" text, 
fts_index_col tsvector, 
PRIMARY KEY ("id")
);

CREATE INDEX idx_novelcn_fts ON novelcn USING GIN (fts_index_col);

CREATE TRIGGER novelcn_tsvectorupdate
BEFORE UPDATE OR INSERT ON novelcn
FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger(fts_index_col, 'public.jiebaqry', title, body);

do the search:

SELECT id, title
FROM novelcn
WHERE fts_index_col @@ to_tsquery('jiebaqry', '青云门 & 传说')
ORDER BY id DESC
LIMIT 10;

It works fine, even if I am not sure if this is the best way to do or not. I hope the above sql codes can help other new comers to use pg_jieba when they find this issue.

Btw, many online English and Chinese articles say RUM is much faster than GIN, BUT in my testing, RUM is a little bit slower than GIN. I don't know why. I will simply use GIN for now.

Thanks and closing this question.