grosjo / fts-xapian

Dovecot FTS plugin based on Xapian
GNU Lesser General Public License v2.1
91 stars 19 forks source link

QueryError when using 'doesn't contain' filter in Thunderbird #144

Closed reox closed 6 months ago

reox commented 7 months ago

I wanted to look for some mail which do not contain a certain address and used the "doesn't contain" filter in Thunderbird's search dialog. However, if I search with this rule I get no results back - even with random strings that should match all messages. Using the "contain" filter works fine. I checked the logs and enabled verbose logging to see this error:

imap(username)<2129634><sdflkj>: FTS Xapian: fts_backend_xapian_get_last_uid
imap(username)<2129634><sdflkj>: FTS Xapian: Set box 'INBOX' (9ed64a195d6c914ff01f000016e9c317)
imap(username)<2129634><sdflkj>: FTS Xapian: Get last UID of INBOX (9ed64a195d6c914ff01f000016e9c317) = 235898
imap(username)<2129634><sdflkj>: FTS Xapian: fts_backend_xapian_refresh
imap(username)<2129634><sdflkj>: FTS Xapian: fts_backend_xapian_release (refresh)
imap(username)<2129634><sdflkj>: FTS Xapian: Committed 'refresh' in 0 ms
imap(username)<2129634><sdflkj>: FTS Xapian: fts_backend_xapian_get_last_uid
imap(username)<2129634><sdflkj>: FTS Xapian: Set box 'INBOX' (9ed64a195d6c914ff01f000016e9c317)
imap(username)<2129634><sdflkj>: FTS Xapian: Get last UID of INBOX (9ed64a195d6c914ff01f000016e9c317) = 235898
imap(username)<2129634><sdflkj>: FTS Xapian: fts_backend_xapian_lookup
imap(username)<2129634><sdflkj>: FTS Xapian: Set box 'INBOX' (9ed64a195d6c914ff01f000016e9c317)
imap(username)<2129634><sdflkj>: Error: FTS Xapian: xapian_query QueryParserError - (null)
imap(username)<2129634><sdflkj>: FTS Xapian: QUery 'NOT( to:"foo" )' -> 0 results
imap(username)<2129634><sdflkj>: FTS Xapian: 0 results in 1 ms

Apparently the query produces this QueryParserError?

My configuration is the following:

plugin {
    fts = xapian
    fts_xapian = partial=3 full=20 verbose=0
    fts_autoindex = yes
    fts_enforced = yes
    fts_autoindex_exclude = \Trash
    fts_autoindex_exclude2 = \Spam
    fts_autoindex_exclude3 = \Junk
}
service indexer-worker {
    vsz_limit = 2G
}

Do I miss something in the configuration? Or is this that maybe already fixed, but my version is too old? I use dovecot-fts-xapian 1.4.9a-1+deb11u1 on debian bullseye, but also checked now with 1.5.5-1~bpo11+1 from backports and have the same behaviour.

grosjo commented 7 months ago

Yes, I understand the issue, will make a new release

grosjo commented 7 months ago

Please try latest git

reox commented 7 months ago

Thanks for the fix! I can try to build it and find a test-machine to try the code... But that could unfortunately take a while.

grosjo commented 7 months ago

@reox is latest Git OK for you ?

reox commented 6 months ago

I built now the latest master and copied the .so file to my server. However, it does still not really work... I used two testcases:

The current master also does not find full addresses anymore. For example, If I search for from: somemail@somedomain.com, then I get no results, even if there is a matching mail. It seems to be an issue if I include the at sign. If I search for from: @somedomain.com it does also not work, while searching for from: somedomain.com gives me the mail. I checked that with 1.5.5-1~bpo11+1 from debian and there I can find mails if I search for the full address, however, the log looks different:

master:

QUery 'from:"somemail_somedomain_com"' -> 0 results

1.5.5:

QUery '(from:"somemail" AND (from:"somedomain_com" OR from:"somedomain"))' -> 2 results
grosjo commented 6 months ago

Kindly try again with latest git.

To fix your empty search comparing with 1.5.5, please do a "doveadm index -A -q" on the considered mail account to reset the index, as the logic has change in between on special character like @ et $#

reox commented 6 months ago

That one crashes if I use the negated query:

Feb  1 12:51:16 mailsrv dovecot: imap(me@example.com)<116375><WJztnFAQZiQgAQhxAl04MnkEmEfzWNRW>: FTS Xapian: fts_backend_xapian_get_last_uid
Feb  1 12:51:16 mailsrv dovecot: imap(me@example.com)<116375><WJztnFAQZiQgAQhxAl04MnkEmEfzWNRW>: FTS Xapian: Set box 'INBOX' (9ed64a195d6c914ff01f000016e9c317)
Feb  1 12:51:16 mailsrv dovecot: imap(me@example.com)<116375><WJztnFAQZiQgAQhxAl04MnkEmEfzWNRW>: FTS Xapian: Index path = /srv/vmail/example.com/me/xapian-indexes
Feb  1 12:51:16 mailsrv dovecot: imap(me@example.com)<116375><WJztnFAQZiQgAQhxAl04MnkEmEfzWNRW>: FTS Xapian: Get last UID of INBOX (9ed64a195d6c914ff01f000016e9c317) = 236098
Feb  1 12:51:16 mailsrv dovecot: imap(me@example.com)<116375><WJztnFAQZiQgAQhxAl04MnkEmEfzWNRW>: FTS Xapian: fts_backend_xapian_refresh
Feb  1 12:51:16 mailsrv dovecot: imap(me@example.com)<116375><WJztnFAQZiQgAQhxAl04MnkEmEfzWNRW>: FTS Xapian: fts_backend_xapian_release (refresh)
Feb  1 12:51:16 mailsrv dovecot: imap(me@example.com)<116375><WJztnFAQZiQgAQhxAl04MnkEmEfzWNRW>: FTS Xapian: fts_backend_xapian_release (refresh) - done
Feb  1 12:51:16 mailsrv dovecot: imap(me@example.com)<116375><WJztnFAQZiQgAQhxAl04MnkEmEfzWNRW>: FTS Xapian: fts_backend_xapian_get_last_uid
Feb  1 12:51:16 mailsrv dovecot: imap(me@example.com)<116375><WJztnFAQZiQgAQhxAl04MnkEmEfzWNRW>: FTS Xapian: Set box 'INBOX' (9ed64a195d6c914ff01f000016e9c317)
Feb  1 12:51:16 mailsrv dovecot: imap(me@example.com)<116375><WJztnFAQZiQgAQhxAl04MnkEmEfzWNRW>: FTS Xapian: Get last UID of INBOX (9ed64a195d6c914ff01f000016e9c317) = 236098
Feb  1 12:51:16 mailsrv dovecot: imap(me@example.com)<116375><WJztnFAQZiQgAQhxAl04MnkEmEfzWNRW>: FTS Xapian: fts_backend_xapian_lookup
Feb  1 12:51:16 mailsrv dovecot: imap(me@example.com)<116375><WJztnFAQZiQgAQhxAl04MnkEmEfzWNRW>: FTS Xapian: Set box 'INBOX' (9ed64a195d6c914ff01f000016e9c317)
Feb  1 12:51:16 mailsrv dovecot: imap(me@example.com)<116375><WJztnFAQZiQgAQhxAl04MnkEmEfzWNRW>: FTS Xapian: fts_backend_xapian_query (NOT ( from:"gmail_com"))
Feb  1 12:51:16 mailsrv dovecot: imap: Error: terminate called after throwing an instance of 'Xapian::QueryParserError'
Feb  1 12:51:16 mailsrv dovecot: imap(me@example.com)<116375><WJztnFAQZiQgAQhxAl04MnkEmEfzWNRW>: Fatal: master: service(imap): child 116375 killed with signal 6 (core dumps disabled - https://dovecot.org/bugreport.html#coredumps)

I did a re-index before searching now!

grosjo commented 6 months ago

Oups sorry, forgot to push the latest bit

can yo try again ?

reox commented 6 months ago

It works much better now, the single NOT works very well now! I tried a more comprehensive search over my full mailbox, where it crashed again, with a different error:

Feb  2 07:55:41 mailsrv dovecot: indexer-worker(me@example.com)<239369><...>: FTS Xapian: Start indexing 'Spam' (46eaa40c79552d54b01700008781c94f)
Feb  2 07:55:41 mailsrv dovecot: indexer-worker(me@example.com)<239369><...>: FTS Xapian: Opening DB (RW) /srv/vmail/example.com/me/xapian-indexes/db_46eaa40c79552d54b01700008781c94f
Feb  2 07:55:41 mailsrv dovecot: indexer-worker(me@example.com)<239369><...>: FTS Xapian: Opening DB (RW) /srv/vmail/example.com/me/xapian-indexes/db_46eaa40c79552d54b01700008781c94f (1682 docs stored): Done
Feb  2 07:55:41 mailsrv dovecot: indexer-worker(me@example.com)<239369><...>: FTS Xapian: fts_backend_xapian_index_hdr
Feb  2 07:55:41 mailsrv dovecot: indexer-worker(me@example.com)<239369><...>: FTS Xapian: fts_backend_xapian_query (uid:"1890")
Feb  2 07:55:41 mailsrv dovecot: indexer-worker(me@example.com)<239369><...>: FTS Xapian: Ngram(XMID) -> 1 items (total 0 KB)
Feb  2 07:55:41 mailsrv dovecot: indexer-worker(me@example.com)<239369><...>: FTS Xapian: fts_backend_xapian_update_unset_build_key with 1683 docs in the index
Feb  2 07:55:41 mailsrv dovecot: indexer-worker(me@example.com)<239369><...>: FTS Xapian: fts_backend_xapian_index_hdr
Feb  2 07:55:41 mailsrv dovecot: indexer-worker(me@example.com)<239369><...>: FTS Xapian: fts_backend_xapian_query (uid:"1890")
Feb  2 07:55:41 mailsrv dovecot: indexer-worker(me@example.com)<239369><...>: FTS Xapian: Ngram(A) -> 115 items (total 0 KB)
Feb  2 07:55:41 mailsrv dovecot: indexer-worker(me@example.com)<239369><...>: FTS Xapian: fts_backend_xapian_update_unset_build_key with 1683 docs in the index
Feb  2 07:55:41 mailsrv dovecot: indexer-worker(me@example.com)<239369><...>: FTS Xapian: fts_backend_xapian_index_hdr
Feb  2 07:55:41 mailsrv dovecot: indexer-worker(me@example.com)<239369><...>: FTS Xapian: fts_backend_xapian_query (uid:"1890")
Feb  2 07:55:41 mailsrv dovecot: indexer-worker(me@example.com)<239369><...>: FTS Xapian: Ngram(XTO) -> 26 items (total 0 KB)
Feb  2 07:55:41 mailsrv dovecot: indexer-worker(me@example.com)<239369><...>: FTS Xapian: fts_backend_xapian_update_unset_build_key with 1683 docs in the index
Feb  2 07:55:41 mailsrv dovecot: indexer-worker(me@example.com)<239369><...>: FTS Xapian: fts_backend_xapian_index_hdr
Feb  2 07:55:41 mailsrv dovecot: indexer-worker(me@example.com)<239369><...>: FTS Xapian: fts_backend_xapian_query (uid:"1890")
Feb  2 07:55:41 mailsrv dovecot: indexer-worker(me@example.com)<239369><...>: FTS Xapian: Ngram(S) -> 206 items (total 1 KB)
Feb  2 07:55:41 mailsrv dovecot: indexer-worker(me@example.com)<239369><...>: FTS Xapian: fts_backend_xapian_update_unset_build_key with 1683 docs in the index
Feb  2 07:55:41 mailsrv dovecot: indexer-worker(me@example.com)<239369><...>: FTS Xapian: fts_backend_xapian_query (uid:"1890")
Feb  2 07:55:41 mailsrv dovecot: indexer-worker(me@example.com)<239369><...>: FTS Xapian: fts_backend_xapian_update_unset_build_key with 1683 docs in the index
Feb  2 07:55:41 mailsrv dovecot: indexer-worker(me@example.com)<239369><...>: FTS Xapian: fts_backend_xapian_query (uid:"1890")
Feb  2 07:55:41 mailsrv dovecot: indexer-worker(me@example.com)<239369><...>: FTS Xapian: fts_backend_xapian_update_unset_build_key with 1683 docs in the index
Feb  2 07:55:41 mailsrv dovecot: indexer-worker(me@example.com)<239369><...>: FTS Xapian: fts_backend_xapian_update_set_mailbox
Feb  2 07:55:41 mailsrv dovecot: indexer-worker(me@example.com)<239369><...>: FTS Xapian: Unset box 'Spam' (46eaa40c79552d54b01700008781c94f)
Feb  2 07:55:41 mailsrv dovecot: indexer-worker(me@example.com)<239369><...>: FTS Xapian: fts_backend_xapian_oldbox
Feb  2 07:55:41 mailsrv dovecot: indexer-worker(me@example.com)<239369><...>: FTS Xapian: Done indexing 'Spam' (46eaa40c79552d54b01700008781c94f) (1 msgs in 557 ms, rate: 1.8)
Feb  2 07:55:41 mailsrv dovecot: indexer-worker(me@example.com)<239369><...>: FTS Xapian: fts_backend_xapian_oldbox - done
Feb  2 07:55:41 mailsrv dovecot: indexer-worker(me@example.com)<239369><...>: FTS Xapian: fts_backend_xapian_release (unset_box)
Feb  2 07:55:41 mailsrv dovecot: indexer-worker(me@example.com)<239369><...>: FTS Xapian - Thread Closing
Feb  2 07:55:41 mailsrv dovecot: indexer-worker(me@example.com)<239369><...>: FTS Xapian: fts_backend_xapian_release (unset_box) - done
Feb  2 07:55:41 mailsrv dovecot: indexer-worker(me@example.com)<239369><...>: FTS Xapian: fts_backend_xapian_update_deinit (/srv/vmail/example.com/me/xapian-indexes)
Feb  2 07:55:41 mailsrv dovecot: indexer-worker(me@example.com)<239369><...>: FTS Xapian: fts_backend_xapian_release (update_deinit)
Feb  2 07:55:41 mailsrv dovecot: indexer-worker(me@example.com)<239369><...>: FTS Xapian: fts_backend_xapian_release (update_deinit) - done
Feb  2 07:55:41 mailsrv dovecot: indexer-worker(me@example.com)<239369><...>: Panic: Leaked t_pop() call
Feb  2 07:55:41 mailsrv dovecot: indexer: Error: Indexer worker disconnected, discarding 1 requests for me@example.com
Feb  2 07:55:41 mailsrv dovecot: indexer-worker(me@example.com)<239369><...>: Fatal: master: service(indexer-worker): child 239369 killed with signal 6 (core dumps disabled - https://dovecot.org/bugreport.html#coredumps)
Feb  2 07:55:41 mailsrv dovecot: imap(me@example.com)<239356><D8FkmGAQgIAgAQhxAl04MnkEmEfzWNRW>: Error: indexer failed to index mailbox Spam

I tested with some longer searches like to xyz and not from abc and not from sdf but here it looks like, the search only works for the first three items? With every item I add, I get then the same search result.

grosjo commented 6 months ago

Ok, the crash is some other issue, and I beleive to have fixed it with the latest git

When you do a search like you said, can you show me the log ?

basil-conto commented 6 months ago

Thanks, but commit 3096d1fb6d4e8a51749eddc9919cd63d82b14a47 has reintroduced a segfault during doveadm index -A \*:

...
doveadm(uni): Info: FTS Xapian: Unset box 'INBOX' (6c34e70b1a9a85649f890000d99394be)
doveadm(uni): Info: FTS Xapian: fts_backend_xapian_oldbox
doveadm(uni): Info: FTS Xapian: Done indexing 'INBOX' (6c34e70b1a9a85649f890000d99394be) (12 msgs in 770 ms, rate: 15.6)
doveadm(uni): Info: FTS Xapian: fts_backend_xapian_oldbox - done
doveadm(uni): Info: FTS Xapian: fts_backend_xapian_release (unset_box)
doveadm(uni): Info: FTS Xapian (Not threaded): Commit & Closing (/var/mailindex/uni/xapian-indexes/db_6c34e70b1a9a85649f890000d99394be) starting (INBOX) : /usr/bin/doveadm
doveadm(uni): Info: FTS Xapian (Not threaded): Commit & Closing (/var/mailindex/uni/xapian-indexes/db_6c34e70b1a9a85649f890000d99394be) finishing (INBOX) : 95 (old) vs 107 (new)
doveadm(uni): Info: FTS Xapian (Not threaded) : Commit & Close (/var/mailindex/uni/xapian-indexes/db_6c34e70b1a9a85649f890000d99394be) (INBOX) - Done in 60 ms
Segmentation fault
grosjo commented 6 months ago

Yes, try again with latest git

basil-conto commented 6 months ago

Seems to be working again, thanks!

reox commented 6 months ago

I tested again with the latest master. Here is an example where it works with 3 items connected by ANDs but then the 4th gives the same result again, although there are mails that should have been filtered out. Do you need more verbose log output?

Feb  3 15:45:43  FTS Xapian: fts_backend_xapian_get_last_uid
Feb  3 15:45:43  FTS Xapian: Set box 'Archives.2015' (6e87722ab61ad654893500008781c94f)
Feb  3 15:45:43  FTS Xapian: Get last UID of Archives.2015 (6e87722ab61ad654893500008781c94f) = 10338
Feb  3 15:45:43  FTS Xapian: fts_backend_xapian_refresh
Feb  3 15:45:43  FTS Xapian: fts_backend_xapian_release (refresh)
Feb  3 15:45:43  FTS Xapian: fts_backend_xapian_release (refresh) - done
Feb  3 15:45:43  FTS Xapian: fts_backend_xapian_get_last_uid
Feb  3 15:45:43  FTS Xapian: Set box 'Archives.2015' (6e87722ab61ad654893500008781c94f)
Feb  3 15:45:43  FTS Xapian: Get last UID of Archives.2015 (6e87722ab61ad654893500008781c94f) = 10338
Feb  3 15:45:43  FTS Xapian: fts_backend_xapian_lookup
Feb  3 15:45:43  FTS Xapian: Set box 'Archives.2015' (6e87722ab61ad654893500008781c94f)
Feb  3 15:45:43  FTS Xapian: fts_backend_xapian_query (to:"example_com" AND NOT ( from:"willhaben_at") AND NOT ( from:"hetzner_de"))
Feb  3 15:45:43  FTS Xapian: QUery 'to:"example_com" AND NOT ( from:"willhaben_at") AND NOT ( from:"hetzner_de")' -> 40 results
Feb  3 15:45:43  FTS Xapian: 40 results in 16 ms
Feb  3 15:45:52  FTS Xapian: fts_backend_xapian_get_last_uid
Feb  3 15:45:52  FTS Xapian: Set box 'Archives.2015' (6e87722ab61ad654893500008781c94f)
Feb  3 15:45:52  FTS Xapian: Get last UID of Archives.2015 (6e87722ab61ad654893500008781c94f) = 10338
Feb  3 15:45:52  FTS Xapian: fts_backend_xapian_refresh
Feb  3 15:45:52  FTS Xapian: fts_backend_xapian_release (refresh)
Feb  3 15:45:52  FTS Xapian: fts_backend_xapian_release (refresh) - done
Feb  3 15:45:52  FTS Xapian: fts_backend_xapian_get_last_uid
Feb  3 15:45:52  FTS Xapian: Set box 'Archives.2015' (6e87722ab61ad654893500008781c94f)
Feb  3 15:45:52  FTS Xapian: Get last UID of Archives.2015 (6e87722ab61ad654893500008781c94f) = 10338
Feb  3 15:45:52  FTS Xapian: fts_backend_xapian_lookup
Feb  3 15:45:52  FTS Xapian: Set box 'Archives.2015' (6e87722ab61ad654893500008781c94f)
Feb  3 15:45:52  FTS Xapian: fts_backend_xapian_query (to:"example_com" AND NOT ( from:"willhaben_at") AND NOT ( from:"hetzner_de") AND NOT ( from:"wienerlinien_at"))
Feb  3 15:45:52  FTS Xapian: QUery 'to:"example_com" AND NOT ( from:"willhaben_at") AND NOT ( from:"hetzner_de") AND NOT ( from:"wienerlinien_at")' -> 40 results
Feb  3 15:45:52  FTS Xapian: 40 results in 2 ms
grosjo commented 6 months ago

I htink I found the bug. can you test again ?

reox commented 6 months ago

I htink I found the bug. can you test again ?

yes! That looks very good now! Thank you!

I just noticed one thing that is odd: If I search for domains it works. But if I search for a full mail adress, these are not excluded. For example: NOT ( from: gmail_com) -> works. NOT (from: someone_gmail_com) -> shows all results including someone@gmail.com. Do I need to re-index again maybe?

grosjo commented 6 months ago

Yeah :)

grosjo commented 6 months ago

I just noticed one thing that is odd: If I search for domains it works. But if I search for a full mail adress, these are not excluded. For example: NOT ( from: gmail_com) -> works. NOT (from: someone_gmail_com) -> shows all results including someone@gmail.com. Do I need to re-index again maybe?

Aie, need to dig into that

grosjo commented 6 months ago

@reox can you re-index the mailbox you are testing with, and do your search again ?

reox commented 6 months ago

Okay, so I did

doveadm fts rescan -u me@example.com
doveadm index -u me@example.com '*'

But now it looks like the search is broken? If I search for domains or full addresses, I simply find nothing or all messages in that particular mailbox. I just checked, that is only for domains and addresses - regular text search works fine.

grosjo commented 6 months ago

Thank you for all this

Please re-index with latest git

reox commented 6 months ago

Just compiled and tested and it looks good now! Thank you very much!

grosjo commented 6 months ago

Great !

reox commented 6 months ago

I'm so sorry, but it looks like the lookup for domains is still broken. If I search for someone@gmail.com I get correct results, but gmail.com yields an empty search result :/

grosjo commented 6 months ago

Can you trry with latest git ?

reox commented 6 months ago

Do I need to re-index after installation?

At the moment, I get this error:

imap(me@example.com)<1715630><abc>: Error: FTS Xapian: Can not open RO index (Archives.2024) /srv/vmail/example.com/me/xapian-indexes/db_28a0162648809265baff1700e0122c82 : DatabaseOpeningError - Permission denied
indexer-worker(me@example.com)<1715636><...>: Error: FTS Xapian: Can not open RO index (Archives.2024) /srv/vmail/example.com/me/xapian-indexes/db_28a0162648809265baff1700e0122c82 : DatabaseOpeningError - Permission denied
indexer-worker(me@example.com)<1715636><...>: Error: FTS Xapian: Can not open RO index (Archives.2024) /srv/vmail/example.com/me/xapian-indexes/db_28a0162648809265baff1700e0122c82 : DatabaseOpeningError - Permission denied
indexer-worker(me@example.com)<1715636><...>: Error: FTS Xapian: Can't open Xapian DB (RW) (Archives.2024) /srv/vmail/example.com/me/xapian-indexes/db_28a0162648809265baff1700e0122c82 : DatabaseOpeningError - Permission denied
indexer-worker(me@example.com)<1715636><...>: Error: FTS Xapian: Buildmore: Can not open db
indexer-worker(me@example.com)<1715636><...>: Error: Mailbox Archives.2024: Mail search failed: Internal error occurred. Refer to server log for more information. [2024-02-11 13:08:56]
indexer-worker(me@example.com)<1715636><...>: FTS Xapian: Done indexing 'Archives.2024' (28a0162648809265baff1700e0122c82) (1 msgs in 3 ms, rate: 333.3)
indexer-worker(me@example.com)<1715636><...>: Error: Mailbox Archives.2024: Transaction commit failed: FTS transaction commit failed: transaction context (attempted to index 1 messages (UIDs 1..1))
imap(me@example.com)<1715630><abc>: Error: indexer failed to index mailbox Archives.2024

I'll start a complete re-index now... could take some time.

grosjo commented 6 months ago

You need to set properly the ownership (as configured in your dovecot.conf) of the files in /srv/vmail/example.com/, install the upgrade and indeed re-index

reox commented 6 months ago

I never saw this error before, thus I suspect it has something to do with the version change? The ownership is also correct.

reox commented 6 months ago

reindexing has finished and I can search for domains again and also no access denied errors!