slusarz / dovecot-fts-flatcurve

Dovecot FTS Flatcurve plugin (Xapian)
https://slusarz.github.io/dovecot-fts-flatcurve/
GNU Lesser General Public License v2.1

Inaccurate results while searching for a phrase in subject #43

Closed ss-17 closed 1 year ago

ss-17 commented 1 year ago

Hi,

I have been using the Lucene FTS plugin for a decade now, and it has served me well. I thought I'd upgrade to something newer and came across your flatcurve plugin, which seems very promising (the xapian plugin, on the other hand, was creating indexes larger than my mailboxes themselves). I am using the following configuration in dovecot.conf:

fts = flatcurve
fts_filters_en = lowercase english-possessive stopwords
fts_languages = en
fts_tokenizers = generic email-address
fts_autoindex = no
fts_enforced = yes

A search command like this:

doveadm -D search -u john@doe.com mailbox INBOX SUBJECT "/home/johndoe/render.php"

should return only the messages with the subject "CRON: /home/johndoe/render.php OK", but it also returns many unwanted results. I think the second line of this debug output shows why:

May 23 07:44:13 doveadm(john@doe.com): Debug: fts-flatcurve(INBOX): Query (hdr_subject:/home/johndoe/render.php*) matches=0 uids=
May 23 07:44:13 doveadm(john@doe.com): Debug: fts-flatcurve(INBOX): Query (hdr_subject:php* AND hdr_subject:render* AND hdr_subject:johndoe* AND hdr_subject:home*) matches=272 uids=67041,67085,67188,67223,67257,67290,67323,67355,67395,67564,67770,67817,67863,67985,68819,69512,69572,69635,69737,70017,70058,70086,70125,70147,70191,70296,70304,70331,70340,70350,70354,70375,70407,70417,70427,70449,70499,70521:70522,70535:70550,70555,70561:70563,70591,70597:70599,70662,70685,70702,70708,70718:70719,70724,70727:70728,70730:70733,70735,70746:70747,70754,70775,70777,70794,70811:70812,70822,70866,70942,70948,70971,71017,71021,71040,71042,71075,71079,71084,71113,71128:71129,71131,71152,71160,71184,71188,71208,71214,71225,71255,71269,71297,71300,71331,71375,71422,71449,71457,71467,71469,71495,71515,71605,71626,71632,71649,71672,71681:71682,71689,71692,71699,71716,71757,71770,71777,71782:71785,71790,71795,71797,71814,71818:71819,71828,71838:71842,71845,71859:71860,71937,71947,71954,71960,71963:71964,71977,71990,72014,72021:72022,72030,72034:72042,72045:72046,72049,72056,72061,72063,72073:72074,72083,72088,72090,72092,72101,72108,72129,72131:72132,72134,72136:72140,72159,72163,72172:72173,72186,72212,72218:72223,72237,72239,72246,72267,72288,72387,72410,72446,72469,72476:72477,72514,72541,72543,72568:72569,72572:72574,72598,72604,72606,72609,72644,72674,72687,72691,72694,72734,72772,72791,72797,72799,72803,72832:72833,72835:72841,72856:72857,72866:72867,72873:72874,72901,72930,72938,72948,72960,72965,72976,73018,73037,73071,73081,73116,73158,73249,73307,73352,73392,73466,73533,73601,73670,73733,73775,73784:73786,73804,73807,73811,73815,73819,73823,73825,73831,73842,73846,74005,74199,74390,74540,74684,74854,75017,75192,75354,75525,75710,75839:75843,75845,75903,75984:75985,76091,76263,76447,76624,76816,76989,77091:77092,77097,77119,77155,77293,77460,77608,77761,77908,78066,78218,78393,78400:78401,78522:78523,78560,78728,78921,79104,79298,79504,79555,79898,80027,80031:80032,80034:80035,80037,80056,80071,80073,80077:80079,80082:80084,80086,80089
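
The first query treats the whole string as a single term and finds nothing; the second splits it into words. To illustrate what that second query implies, here is a simplified Python model. It is an assumption for illustration, not Dovecot's or flatcurve's actual code: the phrase is split on non-alphanumeric characters, each word becomes a prefix term on the subject, and the terms are ANDed together. Under this model, word order, adjacency, and punctuation are lost, and `render*` also matches `rendering`, so any subject that merely contains all four words passes. The function names and sample subjects are hypothetical.

```python
import re


def tokenize(phrase: str) -> list[str]:
    # Hypothetical simplification of a generic word tokenizer plus the
    # lowercase filter: split on runs of non-alphanumeric ASCII characters.
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", phrase) if t]


def build_query(field: str, phrase: str) -> str:
    # Mirrors the shape of the second debug line: one prefix term per token,
    # all on the same field, joined with AND (term order may differ).
    return " AND ".join(f"{field}:{tok}*" for tok in tokenize(phrase))


def matches(subject: str, phrase: str) -> bool:
    # A subject matches if every query token is a prefix of some subject
    # token. Order and adjacency are ignored, so unrelated subjects can match.
    words = tokenize(subject)
    return all(any(w.startswith(tok) for w in words) for tok in tokenize(phrase))


if __name__ == "__main__":
    phrase = "/home/johndoe/render.php"
    print(build_query("hdr_subject", phrase))
    # Hypothetical subjects: an exact hit, a prefix hit, a reordered hit, a miss.
    for subject in [
        "CRON: /home/johndoe/render.php OK",
        "CRON: /home/johndoe/rendering/cleanup.php OK",
        "CRON: /home/johndoe/php/render_queue.sh OK",
        "CRON: /home/johndoe/backup.sh OK",
    ]:
        print(matches(subject, phrase), subject)
```

Only the last sample subject is rejected, because it lacks a word starting with `render`.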

I also tried rebuilding the indexes with fts_flatcurve_substring_search = yes, but that didn't change anything. The search works as expected with the Lucene plugin, because in that case the header search is performed via the Dovecot indexes instead of FTS. Maybe I am not configuring this new FTS plugin correctly? I would really appreciate some pointers here.
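
For comparison, RFC 3501 defines IMAP SEARCH on string fields such as SUBJECT as a case-insensitive substring match, which is the behavior this report expects. Below is a minimal Python sketch of that check, applied as a post-filter over candidate (UID, subject) pairs. The function name and sample data are hypothetical, and fetching the subjects from the mailbox is outside the sketch.

```python
from typing import Iterable


def substring_filter(candidates: Iterable[tuple[int, str]], needle: str) -> list[int]:
    # Keep only UIDs whose subject contains the needle as one contiguous,
    # case-insensitive substring (RFC 3501 SEARCH string semantics).
    n = needle.casefold()
    return [uid for uid, subject in candidates if n in subject.casefold()]


if __name__ == "__main__":
    # Hypothetical candidates, e.g. what a token-based FTS query might return.
    candidates = [
        (1, "CRON: /home/johndoe/render.php OK"),
        (2, "CRON: /home/johndoe/rendering/cleanup.php OK"),
        (3, "CRON: /home/johndoe/php/render_queue.sh OK"),
    ]
    print(substring_filter(candidates, "/home/johndoe/render.php"))  # [1]
```

Unlike the word-level AND query, the substring check requires the characters to appear together and in order, so only the first subject survives.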

Thanks, Sam

ss-17 commented 1 year ago
slusarz commented 1 year ago

See https://dovecot.org/mailman3/archives/list/dovecot@dovecot.org/thread/RWXRCAJ4WDUJVJCB6YFRCTW5AX5VBDRY/

This is working by design. The behavior is controlled in core, so there's nothing flatcurve can do about this. Further discussions on this behavior would need to take place in the context of Dovecot core. Closing ticket.