slusarz / dovecot-fts-flatcurve

Dovecot FTS Flatcurve plugin (Xapian)
https://slusarz.github.io/dovecot-fts-flatcurve/
GNU Lesser General Public License v2.1
38 stars 8 forks source link

FTS flatcurve does not index messages when search on Virtual Mailboxes #19

Closed matteomattei closed 2 years ago

matteomattei commented 2 years ago

Hello,

I'm testing FTS flatcurve plugin in order to understand if I can switch from FTS Solr to flatcurve.

In my configuration I have enabled Virtual mailboxes and to search in all folders I just SEARCH on Virtual/All folder. With Solr, if this (virtual) folder is not indexed, Dovecot starts to index it (on all real folders).

However with FTS flatcurve, when I SEARCH on Virtual/All for the first time, the indexer process does not start and the search returns no data.

Only if I manually run doveadm index -q -u user@domain.tld '*' flatcurve find messages.

Can flatcurve have the same feature as Solr for Virtual mailboxes?

Here a sample of my configuration:

namespace Virtual {
   hidden = yes
   list = no
   location = virtual:/etc/dovecot/virtual:INDEX=~/Maildir/virtual
   prefix = Virtual/
   separator = /
   subscriptions = no
}

namespace inbox {
[...]
   mailbox virtual/All {
     comment = All my messages
     special_use = \All
   }
}

# cat /etc/dovecot/virtual/All/dovecot-virtual
*
   all

# in 90-plugin.conf I have:
    fts = flatcurve
        fts_flatcurve = commit_limit=500 max_term_size=30 min_term_size=2 \
                  optimize_limit=10 rotate_size=5000 rotate_time=5000 \
                  substring_search=no
        fts_autoindex = yes
        fts_enforced = yes
    fts_index_timeout = 5s
        fts_filters = normalizer-icu snowball stopwords
        fts_filters_en = lowercase snowball english-possessive stopwords
        fts_languages = en it es de
        fts_tokenizer_generic = algorithm=simple
        fts_tokenizers = generic email-address

Thanks

slusarz commented 2 years ago

I added a test to the unit/CI tests to search virtual mailboxes, and everything is working in the test. See 09e7a8b7cb9c88333a36bceb26dce47d40d03c1c.

The test adds one message to 3 different mailboxes, and then tests searching on each mailbox (including Virtual/All).

Tracing with debug information we can confirm that the indexer correctly indexes the message when it is stored. For INBOX:

Jan 14 00:33:53 indexer-worker(user)<52621><j0DwF/HE4GGXzQAAqjQh1w:sHYMGPHE4GGNzQAAqjQh1w>: Debug: Effective uid=1000, gid=1000, home=/dovecot/sdbox/user
Jan 14 00:33:53 indexer-worker(user)<52621><j0DwF/HE4GGXzQAAqjQh1w:sHYMGPHE4GGNzQAAqjQh1w>: Debug: fs: root=/dovecot/sdbox/user/sdbox, index=, indexpvt=, control=, inbox=, alt=
Jan 14 00:33:53 indexer-worker(user)<52621><j0DwF/HE4GGXzQAAqjQh1w:sHYMGPHE4GGNzQAAqjQh1w>: Debug: Mailbox INBOX: Mailbox opened
Jan 14 00:33:53 indexer-worker(user)<52621><j0DwF/HE4GGXzQAAqjQh1w:sHYMGPHE4GGNzQAAqjQh1w>: Debug: fts-flatcurve: Xapian library version: 1.4.11
Jan 14 00:33:53 indexer-worker(user)<52621><j0DwF/HE4GGXzQAAqjQh1w:sHYMGPHE4GGNzQAAqjQh1w>: Debug: fts-flatcurve(INBOX): Last UID uid=0
Jan 14 00:33:53 indexer-worker(user)<52621><j0DwF/HE4GGXzQAAqjQh1w:sHYMGPHE4GGNzQAAqjQh1w>: Debug: Mailbox INBOX: UID 5: Opened mail because: prefetch
Jan 14 00:33:53 indexer-worker(user)<52621><j0DwF/HE4GGXzQAAqjQh1w:sHYMGPHE4GGNzQAAqjQh1w>: Debug: Mailbox INBOX: Indexing UID=5
Jan 14 00:33:53 indexer-worker(user)<52621><j0DwF/HE4GGXzQAAqjQh1w:sHYMGPHE4GGNzQAAqjQh1w>: Debug: fts-flatcurve(INBOX): Last UID uid=0
Jan 14 00:33:53 indexer-worker(user)<52621><j0DwF/HE4GGXzQAAqjQh1w:sHYMGPHE4GGNzQAAqjQh1w>: Debug: Namespace : Using permissions from /dovecot/sdbox/user/sdbox: mode=0700 gid=default
Jan 14 00:33:53 indexer-worker(user)<52621><j0DwF/HE4GGXzQAAqjQh1w:sHYMGPHE4GGNzQAAqjQh1w>: Debug: fts-flatcurve(INBOX): Opened DB (RW; current.1642120433404068) messages=0 version=1
Jan 14 00:33:53 indexer-worker(user)<52621><j0DwF/HE4GGXzQAAqjQh1w:sHYMGPHE4GGNzQAAqjQh1w>: Debug: fts-flatcurve(INBOX): Indexing uid=5
Jan 14 00:33:53 indexer-worker(user)<52621><j0DwF/HE4GGXzQAAqjQh1w:sHYMGPHE4GGNzQAAqjQh1w>: Debug: fts-flatcurve(INBOX): Committed 1 changes to DB (RW; current.1642120433404068) in 0.000 secs
Jan 14 00:33:53 indexer-worker(user)<52621><j0DwF/HE4GGXzQAAqjQh1w:sHYMGPHE4GGNzQAAqjQh1w>: Debug: fts-flatcurve: Update transaction completed in 0.001 secs
Jan 14 00:33:53 indexer-worker(user)<52621><j0DwF/HE4GGXzQAAqjQh1w:sHYMGPHE4GGNzQAAqjQh1w>: Debug: Mailbox INBOX: Indexed 1 messages (UIDs 5..5)

We can verify the message is indexed properly for the other two mailboxes also:

Jan 14 00:33:53 indexer-worker(user)<52621><odyaGfHE4GGbzQAAqjQh1w:iKS0GfHE4GGNzQAAqjQh1w>: Debug: fts-flatcurve(rotatetest): Opened DB (RW; current.1642120433416781) messages=0 version=1
Jan 14 00:33:53 indexer-worker(user)<52621><odyaGfHE4GGbzQAAqjQh1w:iKS0GfHE4GGNzQAAqjQh1w>: Debug: fts-flatcurve(rotatetest): Committed 1 changes to DB (RW; current.1642120433416781) in 0.000 secs
Jan 14 00:33:53 indexer-worker(user)<52621><odyaGfHE4GGbzQAAqjQh1w:iKS0GfHE4GGNzQAAqjQh1w>: Debug: Mailbox rotatetest: Indexed 1 messages (UIDs 5..5)

Jan 14 00:33:53 indexer-worker(user)<52621><wqIXG/HE4GGfzQAAqjQh1w:KIQzG/HE4GGNzQAAqjQh1w>: Debug: fts-flatcurve(imaptest): Opened DB (RW; current.1642120433443950) messages=0 version=1
Jan 14 00:33:53 indexer-worker(user)<52621><wqIXG/HE4GGfzQAAqjQh1w:KIQzG/HE4GGNzQAAqjQh1w>: Debug: fts-flatcurve(imaptest): Committed 1 changes to DB (RW; current.1642120433443950) in 0.000 secs
Jan 14 00:33:53 indexer-worker(user)<52621><wqIXG/HE4GGfzQAAqjQh1w:KIQzG/HE4GGNzQAAqjQh1w>: Debug: Mailbox imaptest: Indexed 1 messages (UIDs 2..2)

When we open the virtual mailbox, we can see it correctly opens all 3 mailboxes:

Jan 14 00:33:54 imap(user)<52650><yD8w9X/V4Np/AAAB>: Debug: Effective uid=1000, gid=1000, home=/dovecot/sdbox/user
Jan 14 00:33:54 imap(user)<52650><yD8w9X/V4Np/AAAB>: Debug: Namespace Virtual: type=private, prefix=Virtual/, sep=/, inbox=no, hidden=yes, list=no, subscriptions=no location=virtual:/dovecot/configs/virtual
Jan 14 00:33:54 imap(user)<52650><yD8w9X/V4Np/AAAB>: Debug: fs: root=/dovecot/configs/virtual, index=, indexpvt=, control=, inbox=, alt=
Jan 14 00:33:54 imap(user)<52650><yD8w9X/V4Np/AAAB>: Debug: Namespace inbox: type=private, prefix=, sep=/, inbox=yes, hidden=no, list=yes, subscriptions=yes location=sdbox:~/sdbox
Jan 14 00:33:54 imap(user)<52650><yD8w9X/V4Np/AAAB>: Debug: fs: root=/dovecot/sdbox/user/sdbox, index=, indexpvt=, control=, inbox=, alt=
Jan 14 00:33:54 imap(user)<52650><yD8w9X/V4Np/AAAB>: Debug: Mailbox Virtual/All: Mailbox opened
Jan 14 00:33:54 imap(user)<52650><yD8w9X/V4Np/AAAB>: Debug: Mailbox Virtual/All: Backend mailbox imaptest changed: UIDVALIDITY changed: 0 -> 1642120411
Jan 14 00:33:54 imap(user)<52650><yD8w9X/V4Np/AAAB>: Debug: Mailbox imaptest: Mailbox opened
Jan 14 00:33:54 imap(user)<52650><yD8w9X/V4Np/AAAB>: Debug: Mailbox Virtual/All: Backend mailbox rotatetest changed: UIDVALIDITY changed: 0 -> 1642120407
Jan 14 00:33:54 imap(user)<52650><yD8w9X/V4Np/AAAB>: Debug: Mailbox rotatetest: Mailbox opened
Jan 14 00:33:54 imap(user)<52650><yD8w9X/V4Np/AAAB>: Debug: Mailbox INBOX: Mailbox opened
Jan 14 00:33:54 imap(user)<52650><yD8w9X/V4Np/AAAB>: Debug: Mailbox Virtual/All: Backend mailbox INBOX changed: UIDVALIDITY changed: 0 -> 1642120403
Jan 14 00:33:54 imap(user)<52650><yD8w9X/V4Np/AAAB>: Debug: Mailbox imaptest: Adding field date.received to cache for the first time (uid=2)
Jan 14 00:33:54 imap(user)<52650><yD8w9X/V4Np/AAAB>: Debug: Mailbox rotatetest: Adding field date.received to cache for the first time (uid=5)
Jan 14 00:33:54 imap(user)<52650><yD8w9X/V4Np/AAAB>: Debug: Mailbox INBOX: Adding field date.received to cache for the first time (uid=5)
Jan 14 00:33:54 imap(user)<52650><yD8w9X/V4Np/AAAB>: Debug: Mailbox Virtual/All: Couldn't open mailbox in list index: Mailbox not found
Jan 14 00:33:54 imap(user)<52650><yD8w9X/V4Np/AAAB>: Debug: Mailbox Virtual/All: Mailbox opened
Jan 14 00:33:54 imap(user)<52650><yD8w9X/V4Np/AAAB>: Debug: Mailbox Virtual/All: conn unix:/usr/local/var/run/dovecot/indexer: Connecting
Jan 14 00:33:54 imap(user)<52650><yD8w9X/V4Np/AAAB>: Debug: Mailbox Virtual/All: conn unix:/usr/local/var/run/dovecot/indexer (pid=52641,uid=0): Client connected (fd=21)
Jan 14 00:33:54 imap(user)<52650><yD8w9X/V4Np/AAAB>: Debug: Mailbox Virtual/All: conn unix:/usr/local/var/run/dovecot/indexer (pid=52641,uid=0): Sending version handshake

Since we haven't opened the virtual mailbox before, it needs to create an index of all folder messages. For each containing mailbox, we can verify that indexing opens the mailbox, but flatcurve reports that it has already indexed all messages in the mailbox (last UID checking), so it doesn't do any new work:

Jan 14 00:33:54 indexer-worker(user)<52652><yD8w9X/V4Np/AAAB:qgidKvLE4GGszQAAqjQh1w>: Debug: Mailbox Virtual/All: Mailbox opened
Jan 14 00:33:54 indexer-worker(user)<52652><yD8w9X/V4Np/AAAB:qgidKvLE4GGszQAAqjQh1w>: Debug: Mailbox imaptest: Couldn't open mailbox in list index: Storage size changed 644 != 776
Jan 14 00:33:54 indexer-worker(user)<52652><yD8w9X/V4Np/AAAB:qgidKvLE4GGszQAAqjQh1w>: Debug: Mailbox imaptest: Mailbox opened
Jan 14 00:33:54 indexer-worker(user)<52652><yD8w9X/V4Np/AAAB:qgidKvLE4GGszQAAqjQh1w>: Debug: Mailbox rotatetest: Couldn't open mailbox in list index: Storage size changed 1012 != 1144
Jan 14 00:33:54 indexer-worker(user)<52652><yD8w9X/V4Np/AAAB:qgidKvLE4GGszQAAqjQh1w>: Debug: Mailbox rotatetest: Mailbox opened
Jan 14 00:33:54 indexer-worker(user)<52652><yD8w9X/V4Np/AAAB:qgidKvLE4GGszQAAqjQh1w>: Debug: Mailbox INBOX: Mailbox opened
Jan 14 00:33:54 indexer-worker(user)<52652><yD8w9X/V4Np/AAAB:qgidKvLE4GGszQAAqjQh1w>: Debug: Mailbox Virtual/All: Indexing UID=1
Jan 14 00:33:54 indexer-worker(user)<52652><yD8w9X/V4Np/AAAB:qgidKvLE4GGszQAAqjQh1w>: Debug: Mailbox imaptest: UID 2: Opened mail because: access
Jan 14 00:33:54 indexer-worker(user)<52652><yD8w9X/V4Np/AAAB:qgidKvLE4GGszQAAqjQh1w>: Debug: fts-flatcurve: Xapian library version: 1.4.11
Jan 14 00:33:54 indexer-worker(user)<52652><yD8w9X/V4Np/AAAB:qgidKvLE4GGszQAAqjQh1w>: Debug: fts-flatcurve(imaptest): Opened DB (RO) messages=1 version=1 shards=2
Jan 14 00:33:54 indexer-worker(user)<52652><yD8w9X/V4Np/AAAB:qgidKvLE4GGszQAAqjQh1w>: Debug: fts-flatcurve(imaptest): Last UID uid=2
Jan 14 00:33:54 indexer-worker(user)<52652><yD8w9X/V4Np/AAAB:qgidKvLE4GGszQAAqjQh1w>: Debug: Mailbox Virtual/All: Indexing UID=2
Jan 14 00:33:54 indexer-worker(user)<52652><yD8w9X/V4Np/AAAB:qgidKvLE4GGszQAAqjQh1w>: Debug: Mailbox rotatetest: UID 5: Opened mail because: access
Jan 14 00:33:54 indexer-worker(user)<52652><yD8w9X/V4Np/AAAB:qgidKvLE4GGszQAAqjQh1w>: Debug: fts-flatcurve(rotatetest): Opened DB (RW; index.5833) messages=0 version=1
Jan 14 00:33:54 indexer-worker(user)<52652><yD8w9X/V4Np/AAAB:qgidKvLE4GGszQAAqjQh1w>: Debug: fts-flatcurve(rotatetest): Opened DB (RO) messages=1 version=1 shards=2
Jan 14 00:33:54 indexer-worker(user)<52652><yD8w9X/V4Np/AAAB:qgidKvLE4GGszQAAqjQh1w>: Debug: fts-flatcurve(rotatetest): Last UID uid=5
Jan 14 00:33:54 indexer-worker(user)<52652><yD8w9X/V4Np/AAAB:qgidKvLE4GGszQAAqjQh1w>: Debug: Mailbox Virtual/All: Indexing UID=3
Jan 14 00:33:54 indexer-worker(user)<52652><yD8w9X/V4Np/AAAB:qgidKvLE4GGszQAAqjQh1w>: Debug: Mailbox INBOX: UID 5: Opened mail because: access
Jan 14 00:33:54 indexer-worker(user)<52652><yD8w9X/V4Np/AAAB:qgidKvLE4GGszQAAqjQh1w>: Debug: fts-flatcurve(INBOX): Opened DB (RO) messages=1 version=1 shards=1
Jan 14 00:33:54 indexer-worker(user)<52652><yD8w9X/V4Np/AAAB:qgidKvLE4GGszQAAqjQh1w>: Debug: fts-flatcurve(INBOX): Last UID uid=5
Jan 14 00:33:54 indexer-worker(user)<52652><yD8w9X/V4Np/AAAB:qgidKvLE4GGszQAAqjQh1w>: Debug: fts-flatcurve(INBOX): Update transaction completed in 0.001 secs
Jan 14 00:33:54 indexer-worker(user)<52652><yD8w9X/V4Np/AAAB:qgidKvLE4GGszQAAqjQh1w>: Debug: Mailbox Virtual/All: Indexed 3 messages (UIDs 1..3)

Now, when the search is done on the search mailbox, it performs the search by iterating through all of the containing mailboxes:

Jan 14 00:33:55 imap(user)<52650><yD8w9X/V4Np/AAAB>: Debug: fts-flatcurve: Xapian library version: 1.4.11
Jan 14 00:33:55 imap(user)<52650><yD8w9X/V4Np/AAAB>: Debug: fts-flatcurve(imaptest): Opened DB (RO) messages=1 version=1 shards=2
Jan 14 00:33:55 imap(user)<52650><yD8w9X/V4Np/AAAB>: Debug: fts-flatcurve(imaptest): Query (body1*) matches=1 uids=2
Jan 14 00:33:55 imap(user)<52650><yD8w9X/V4Np/AAAB>: Debug: fts-flatcurve(rotatetest): Opened DB (RW; index.5833) messages=0 version=1
Jan 14 00:33:55 imap(user)<52650><yD8w9X/V4Np/AAAB>: Debug: fts-flatcurve(rotatetest): Opened DB (RO) messages=1 version=1 shards=2
Jan 14 00:33:55 imap(user)<52650><yD8w9X/V4Np/AAAB>: Debug: fts-flatcurve(rotatetest): Query (body1*) matches=1 uids=5
Jan 14 00:33:55 imap(user)<52650><yD8w9X/V4Np/AAAB>: Debug: fts-flatcurve(INBOX): Opened DB (RO) messages=1 version=1 shards=1
Jan 14 00:33:55 imap(user)<52650><yD8w9X/V4Np/AAAB>: Debug: fts-flatcurve(INBOX): Query (body1*) matches=1 uids=5

It correctly finds 1 message in each mailbox.

(Searching in the containing mailboxes also works, e.g. imaptest:)

Jan 14 00:33:55 imap(user)<52650><yD8w9X/V4Np/AAAB>: Debug: Mailbox imaptest: Mailbox opened
Jan 14 00:33:55 imap(user)<52650><yD8w9X/V4Np/AAAB>: Debug: fts-flatcurve(imaptest): Opened DB (RO) messages=1 version=1 shards=2
Jan 14 00:33:55 imap(user)<52650><yD8w9X/V4Np/AAAB>: Debug: fts-flatcurve(imaptest): Last UID uid=2
Jan 14 00:33:55 imap(user)<52650><yD8w9X/V4Np/AAAB>: Debug: fts-flatcurve(imaptest): Last UID uid=2
Jan 14 00:33:55 imap(user)<52650><yD8w9X/V4Np/AAAB>: Debug: fts-flatcurve(imaptest): Query (body1*) matches=1 uids=2

Thus, fts-flatcurve is working as expected and is correctly searching virtual mailboxes based on the core Dovecot FTS API.

I suggest enabling debug on your server and compare the output to the output above. E.g.,

mail_debug = yes
log_path = /var/log/dovecot.log
slusarz commented 2 years ago

ps, you should replace your fts_flatcurve setting with "fts_flatcurve_substring_search = no". You shouldn't need to set any of the other values (since they are all defaults).

slusarz commented 2 years ago

Based on comments on dovecot mailing list, this appears to be a misunderstanding on how doveadm rescan works in practice. There is no indication that flatcurve is broken. Closing ticket.