micado-eu / MICADO

Main repository for the EU funded MICADO project
https://www.micadoproject.eu/
European Union Public License 1.2

Postgresql multi language search support #99

Open webdevotion opened 3 years ago

webdevotion commented 3 years ago

Textsearch in Postgresql

Remarks

Adding languages may affect the resource requirements of the database. Please benchmark performance before and after adding languages.
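One way to run such a benchmark is sketched below. The table `documents`, its columns, and the search phrase are hypothetical names chosen for illustration, not part of the MICADO schema; the idea is simply to record timings and sizes once before and once after adding a language.

```sql
-- Hypothetical table: each row stores the text search configuration
-- that should be used for its own content.
CREATE TABLE IF NOT EXISTS documents (
    id     bigserial PRIMARY KEY,
    config regconfig NOT NULL DEFAULT 'simple',
    body   text NOT NULL
);

-- Expression index that uses the per-row configuration.
CREATE INDEX IF NOT EXISTS documents_fts_idx
    ON documents USING gin (to_tsvector(config, body));

-- Run this before and after adding languages and compare the
-- reported planning time, execution time, and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM documents
WHERE to_tsvector(config, body) @@ plainto_tsquery('english', 'residence permit');

-- Track the on-disk size of the table together with its indexes.
SELECT pg_size_pretty(pg_total_relation_size('documents'));
```

The numbers are only comparable if both runs use the same data volume and a warmed-up cache, so it is worth repeating each measurement a few times.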

Introduction to configurations

https://www.postgresql.org/docs/current/textsearch-intro.html#TEXTSEARCH-INTRO-CONFIGURATIONS
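As a small illustration of what a configuration changes, the queries below parse the same sentence with two built-in configurations. The sample sentence is made up for this sketch.

```sql
-- 'simple' lowercases tokens but does no stemming and removes no stop words.
SELECT to_tsvector('simple', 'The houses were painted');

-- 'english' drops stop words such as "the" and reduces words to their stems.
SELECT to_tsvector('english', 'The houses were painted');

-- Because of stemming, a query for "house" can match a text containing "houses".
SELECT to_tsvector('english', 'The houses were painted')
       @@ to_tsquery('english', 'house') AS matches;

-- The configuration used when none is passed explicitly.
SHOW default_text_search_config;
```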

Default language configs

You can run \dF at the psql prompt to list the text search configurations available in your installation:

postgres=# \dF

For a standard installation the available configurations are listed below (the exact list depends on the PostgreSQL version):

simple
arabic
danish
dutch
english
finnish
french
german
hungarian
indonesian
irish
italian
lithuanian
nepali
norwegian
portuguese
romanian
russian
spanish
swedish
tamil
turkish
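If psql is not at hand, for example when connecting through an application or a GUI client, the same information can be read from the system catalog. A minimal sketch:

```sql
-- List all text search configurations with the schema they live in.
SELECT n.nspname AS schema, c.cfgname AS name
FROM pg_catalog.pg_ts_config AS c
JOIN pg_catalog.pg_namespace AS n ON n.oid = c.cfgnamespace
ORDER BY n.nspname, c.cfgname;
```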

Create new, custom configs

> Notice that while many Indo-European languages are available, such as English, German, Spanish, and Russian, there are some remarkable misses out of this family group, such as Chinese and Japanese.

Chinese

A possible solution is zhparser, an extension that adds full-text search support for Chinese: https://github.com/amutu/zhparser/

This gist shows how to use it in a Dockerfile: https://gist.github.com/ciiiii/0e9f3ffcd1b33b087fc5d5b02bf72bce
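Once the extension is built and installed on the server, enabling it could look roughly like the sketch below. It follows the pattern described in the zhparser README as far as I recall it, so please verify the token types against the version you install; the configuration name `chinese_zh` is a hypothetical name picked for this example.

```sql
-- Requires the zhparser shared library to be installed on the server.
CREATE EXTENSION IF NOT EXISTS zhparser;

-- Create a configuration that uses the zhparser parser
-- ("chinese_zh" is an illustrative name).
CREATE TEXT SEARCH CONFIGURATION chinese_zh (PARSER = zhparser);

-- Map the token types to index (noun, verb, adjective, idiom,
-- exclamation, temporary idiom) to the built-in "simple" dictionary.
ALTER TEXT SEARCH CONFIGURATION chinese_zh
    ADD MAPPING FOR n, v, a, i, e, l WITH simple;

-- Quick check that Chinese text is segmented into tokens.
SELECT to_tsvector('chinese_zh', '南京市长江大桥');
```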

Other languages

Other custom configurations can be created via CREATE TEXT SEARCH CONFIGURATION: https://www.postgresql.org/docs/current/sql-createtsconfig.html
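A minimal sketch of a custom configuration, assuming the `unaccent` contrib extension is available on the server: it copies the built-in `french` configuration and strips accents before stemming. The name `fr_unaccent` is hypothetical.

```sql
-- unaccent ships with the PostgreSQL contrib modules.
CREATE EXTENSION IF NOT EXISTS unaccent;

-- Start from the built-in French configuration.
CREATE TEXT SEARCH CONFIGURATION fr_unaccent (COPY = french);

-- Run word tokens through unaccent first, then the French stemmer.
ALTER TEXT SEARCH CONFIGURATION fr_unaccent
    ALTER MAPPING FOR hword, hword_part, word
    WITH unaccent, french_stem;

-- Compare with the stock configuration to see the effect on accents.
SELECT to_tsvector('french', 'Hôtels de la Mer')      AS stock,
       to_tsvector('fr_unaccent', 'Hôtels de la Mer') AS custom;
```

The same COPY-then-ALTER pattern should work for any built-in language; only the stemmer dictionary name changes.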

A possible quick start can be found here: https://stackoverflow.com/a/56889254