slusarz / dovecot-fts-flatcurve

Dovecot FTS Flatcurve plugin (Xapian)
https://slusarz.github.io/dovecot-fts-flatcurve/
GNU Lesser General Public License v2.1

fts_filter_normalizer_icu: libicu support not built in #60

Closed by allexmail 4 months ago

allexmail commented 4 months ago

Hi, I have compiled fts-flatcurve v1.0.1 for my Dovecot 2.3.21. libicu-dev and icu-devtools were installed before I ran ./configure. My default language is Russian. For search to be case-insensitive, I have to put normalizer-icu in fts_filters (otherwise the search is case-sensitive). The option does take effect, because the search really becomes case-insensitive. However, an error is written to the Dovecot logs:

Mar 01 11:57:45 imap(mail@example.com)<242162><AlnFkZUSlLJ/AAAB>: Error: fts-flatcurve: fts_filter_normalizer_icu: libicu support not built in
Mar 01 11:57:45 imap(mail@example.com)<242162><AlnFkZUSlLJ/AAAB>: Error: fts: Failed to initialize backend 'flatcurve': fts-flatcurve: Invalid settings

I don't understand why fts-flatcurve asked for the libicu library at build time but reports at runtime that support is not built in. Note that the search does not work correctly if I remove the normalizer-icu option (it becomes case-sensitive).

Here is the relevant part of my Dovecot configuration file:

fts_autoindex=yes
fts_autoindex_max_recent_msgs=80
fts_index_timeout=90s
fts = flatcurve
fts_enforced = yes
#fts_decoder = decode2text
fts_autoindex_exclude = \Trash
fts_autoindex_exclude2 = \Junk
fts_languages = ru en
fts_filters_en = lowercase english-possessive stopwords
fts_filters = normalizer-icu snowball stopwords
fts_tokenizer_generic = algorithm=simple
fts_tokenizers = generic email-address

I'm still learning Dovecot, so please bear with me.

allexmail commented 4 months ago

I read the documentation again. The requirements say:

REQUIREMENTS
stemmer support (--with-stemmer)
Optional
icu support (--with-icu)
libtextcat support (--with-textcat)

I tried adding the --with-icu option to configure, but a warning is displayed: configure: WARNING: unrecognized options: --with-icu

Maybe I am missing some package. I am using Ubuntu 20.04. I am attaching the output of ./configure --with-dovecot=/usr/lib/dovecot --with-icu (configure.txt).

allexmail commented 4 months ago

I do not know what else to try. For now I have configured FTS Xapian and started indexing with it. Xapian is working properly, including with the normalizer-icu option. I can help you test and fix this error if needed.

slusarz commented 4 months ago

You need to compile Dovecot with those configure flags. They are not flatcurve requirements, they are Dovecot requirements.

Dovecot is responsible for normalization in FTS core, not in flatcurve. See https://doc.dovecot.org/settings/plugin/fts-plugin/#plugin_setting-fts-fts_filters
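
One way to check whether an installed Dovecot was linked against ICU is to look at the shared-library dependencies of the core FTS plugin. This is only a sketch: it assumes the plugin is installed at /usr/lib/dovecot/modules/lib20_fts_plugin.so (the path depends on the prefix and distribution) and that ICU is linked dynamically. The helper name links_icu and the FTS_PLUGIN variable are made up for illustration.

```shell
#!/bin/sh
# links_icu: read `ldd` output on stdin and print "yes" if any ICU
# library (libicuuc, libicui18n, ...) is listed, otherwise "no".
links_icu() {
    if grep -q 'libicu'; then
        echo yes
    else
        echo no
    fi
}

# Assumed plugin location; override with FTS_PLUGIN=/path/to/plugin.so
plugin=${FTS_PLUGIN:-/usr/lib/dovecot/modules/lib20_fts_plugin.so}

if [ -f "$plugin" ]; then
    echo "ICU linked into $plugin: $(ldd "$plugin" | links_icu)"
else
    echo "plugin not found at $plugin; set FTS_PLUGIN to the right path"
fi
```

A "no" here would be consistent with the "libicu support not built in" error above, and would point at the Dovecot build rather than at flatcurve.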

allexmail commented 4 months ago

Installing everything from source was painful, so here are quick installation instructions for Dovecot + Sieve + Flatcurve FTS (Ubuntu 20.04, 22.04):

Download the sources:

  1. https://www.dovecot.org/releases/2.3/
  2. https://pigeonhole.dovecot.org/download.html
  3. https://github.com/slusarz/dovecot-fts-flatcurve

Remove the existing packages, after backing up the directory with the configuration files (usually /etc/dovecot): apt purge 'dovecot-*'

Enable the deb-src entries in /etc/apt/sources.list, then run: apt update

Installing dependencies (on Ubuntu 20.04, use libicu66 or a similar version instead of libicu70):

apt install build-essential autoconf automake libtool checkinstall
apt install libicu70 libicu-dev icu-devtools libghc-text-icu-dev libldap-dev libmysqlclient-dev libghc-bzlib-dev liblzma-dev liblz4-dev libexpat1-dev libstemmer-dev libexttextcat-dev pkg-config libclucene-dev gettext pandoc flex bison

Configuring Dovecot:

./configure --prefix=/usr --exec-prefix=/usr --sysconfdir=/etc --localstatedir=/var --runstatedir=/run --datadir=/usr/share --with-ldap=yes --with-sql=yes --with-mysql --with-lucene --with-stemmer --with-textcat --with-icu --with-solr --with-zlib --with-bzlib --with-lzma --with-lz4 --enable-maintainer-mode
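
Before running make, it may be worth confirming that configure really detected ICU, because a missing development package would otherwise only show up at runtime as the "libicu support not built in" error. The sketch below assumes that configure records the result as a HAVE_LIBICU define in config.h in the Dovecot source directory. That macro name is an assumption about the Dovecot 2.3 sources and should be checked against your own tree. The function name icu_enabled is illustrative.

```shell
#!/bin/sh
# icu_enabled FILE: succeed if FILE contains "#define HAVE_LIBICU".
# HAVE_LIBICU is an assumed macro name; check your own config.h.
icu_enabled() {
    grep -Eq '^#define[[:space:]]+HAVE_LIBICU([[:space:]]|$)' "$1"
}

# Run from the Dovecot source directory after ./configure.
if [ -f config.h ]; then
    if icu_enabled config.h; then
        echo "ICU support enabled"
    else
        echo "ICU support NOT enabled; check that libicu-dev is installed"
    fi
fi
```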

Compiling Dovecot: make

If the build succeeds, create the deb package (when prompted, do not forget to provide a description, for example "Dovecot with ICU"): checkinstall --install=no

If everything is OK, install the created package with dpkg (dpkg -i packagename.deb).

Adding dovecot system users:

adduser dovenull --system --group --no-create-home --home /nonexistent --disabled-login --gecos 'Dovecot system user'
adduser dovecot --system --group --no-create-home --home /usr/lib/dovecot --disabled-login --gecos 'Dovecot system user'

Install sieve (Pigeonhole) and flatcurve one after the other using the following commands. In the --with-dovecot parameter, specify the source directory Dovecot was built from, which contains the 'dovecot-config' file. When building the Pigeonhole package you will most likely have to specify a version in addition to the description, otherwise the deb package will not be built.

./autogen.sh #For flatcurve only
./configure --with-dovecot=../dovecot-2.3.21
make
checkinstall --install=no
dpkg -i *.deb

Adding dovecot as a service (for autostart): systemctl enable dovecot

Launch dovecot and check the log files for errors: systemctl start dovecot

To work with UTF-8 text I use the 'normalizer-icu' filter, which is why everything had to be compiled from source. The relevant part of the Dovecot configuration file for fts-flatcurve:

plugin {
 ....
 fts_autoindex=yes
 fts_autoindex_max_recent_msgs=80
 fts_index_timeout=60s
 fts = flatcurve
 fts_enforced = yes
 #fts_decoder = decode2text
 fts_autoindex_exclude = \Trash
 fts_autoindex_exclude2 = \Junk
 fts_languages = ru en
 fts_filters_en = lowercase english-possessive stopwords
 fts_header_excludes = *
 fts_header_includes = From To Cc Bcc Subject
 fts_filters = normalizer-icu snowball stopwords
 fts_tokenizer_generic = algorithm=simple
 fts_tokenizers = generic email-address
....
}
mail_plugins = ... fts fts_flatcurve ...

To index all mailboxes (or rebuild their indexes), run this once as root: doveadm -v index -u '*' '*'
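
Once indexing has finished, a quick sanity check is to search for the same word in lower and upper case and compare the number of hits. This is a rough sketch: the account mail@example.com, the INBOX mailbox and the search word are placeholders, and the helper names are made up. Equal counts only suggest that normalization is working; they do not prove it.

```shell
#!/bin/sh
# count_hits USER WORD: number of messages in INBOX whose body matches WORD.
# `doveadm search` prints one line (mailbox GUID and UID) per match.
count_hits() {
    doveadm search -u "$1" mailbox INBOX body "$2" | wc -l | tr -d ' '
}

# same_hits USER WORD1 WORD2: succeed if both spellings match equally often.
same_hits() {
    [ "$(count_hits "$1" "$2")" = "$(count_hits "$1" "$3")" ]
}

# Placeholder account and search words; replace with real ones.
if command -v doveadm >/dev/null 2>&1; then
    if same_hits mail@example.com 'привет' 'ПРИВЕТ'; then
        echo "search looks case-insensitive"
    else
        echo "hit counts differ; check fts_filters"
    fi
fi
```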

Please note that if your user accounts (login/password) are stored in a MySQL database, you need to add an iterate_query to the Dovecot SQL configuration file (dovecot.mysql.conf):

iterate_query = SELECT username AS user FROM mailbox

Here 'username' is the column with your users' logins and 'mailbox' is the table that contains the logins of all users (usually mailboxes).
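
Commands that operate on all users, such as the '*' user mask or the -A flag, depend on user iteration, so it can help to confirm that iteration works before indexing. A minimal sketch, assuming doveadm is on the PATH and is run as root; the helper names list_users and user_count are illustrative.

```shell
#!/bin/sh
# list_users: print every user Dovecot can iterate over, one per line.
# With an SQL userdb this relies on iterate_query being configured.
list_users() {
    doveadm user '*'
}

# user_count: number of non-empty lines printed by list_users.
user_count() {
    list_users | grep -c .
}

if command -v doveadm >/dev/null 2>&1; then
    echo "Dovecot can iterate over $(user_count) users"
fi
```

If the count is 0 or doveadm reports an error, the iterate_query (or its table and column names) is the first thing to check.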

Add the following task to root's crontab (crontab -e) to optimize the indexes regularly, for example every Friday at 23:00: 0 23 * * 5 doveadm fts optimize -A

Enjoy FTS. I am attaching prebuilt deb packages (v. 1.0.1 fts-f-t) for Ubuntu 20.04 and for Ubuntu 22.04 in case someone does not want to install from source. Please note that you will most likely still need to install some of the dependencies listed above for them to work correctly.

Update 20.03.2024:

Update 14.06.2024:

Of all the FTS backends I've tried, flatcurve is the best option. It works very well, and its index files are not very large (compared to fts xapian, where the index files are larger than all the users' emails combined). I really like flatcurve, thank you!

slusarz commented 4 months ago

Thanks! I added a link to your compilation example to the documentation. https://github.com/slusarz/dovecot-fts-flatcurve/commit/c6490563aee2d2fdb88f872a229c12198481bb35

allexmail commented 3 months ago

Thanks! I added a link to your compilation example to the documentation. c649056

Let me know if there is anything else I can do to help the project. I really like how flatcurve works.