magda-io / magda

A federated, open-source data catalog for all your big data and small data
https://magda.io
Apache License 2.0
496 stars 93 forks source link

Registry records manager: Able to search record by keywords & records list #3480

Closed t83714 closed 10 months ago

t83714 commented 11 months ago

Registry Records Manager: Able to search record by keywords & records list

To make the registry record manager more useful for people, we will need offer the basic keyword based searching functionality.

It should come with a record list and quick preview feature as well.

Technical notes:

t83714 commented 10 months ago

Based on tests, full text search feature seems provide reasonable result (at the same level as GIN with pg_trgm extension) but offer:

Index creation:

CREATE INDEX idx_data_full_text ON recordaspects 
USING GIN ((
 to_tsvector('english', recordid) ||
 to_tsvector('english', aspectid) ||
 jsonb_to_tsvector('english', data, '["string"]')
))

Query would be similar to:

select * from recordaspects where (
 to_tsvector('english', recordid) ||
 to_tsvector('english', aspectid) ||
 jsonb_to_tsvector('english', data, '["string"]')
) @@ websearch_to_tsquery('english', 'my search keywords')

by default, we will use 'english' search config for indexing & searching but allow users to change via helm chart config. helm chart config changes won't impact existing index created. Users are required to re-created the index with SQL above (with config name changed)

t83714 commented 10 months ago

To make sure recordId & aspectId can be "exactly" matched via the full text search query API interface, we need to alter the index creation as:

CREATE INDEX idx_data_full_text ON recordaspects 
USING GIN (jsonb_to_tsvector('english'::regconfig, data, '["string"]'))

And Search would be similar to :

select * from recordaspects where (
 jsonb_to_tsvector('english'::regconfig, data, '["string"]') @@ websearch_to_tsquery('english'::regconfig, 'my search keywords')
OR recordid = 'my search keywords'
OR aspectid = 'my search keywords'
)
t83714 commented 10 months ago

closed via https://github.com/magda-io/magda/pull/3485