slusarz / dovecot-fts-flatcurve

Dovecot FTS Flatcurve plugin (Xapian)
https://slusarz.github.io/dovecot-fts-flatcurve/
GNU Lesser General Public License v2.1

Fix token maxlen handling and substring indexing crash #65

Closed edieterich closed 1 month ago

edieterich commented 3 months ago

This is my attempt to fix the problems that I described in https://github.com/slusarz/dovecot-fts-flatcurve/issues/62#issuecomment-2178894698.

  1. Don't split multi-byte characters when enforcing FTS_FLATCURVE_MAX_TERM_SIZE. A token can exceed that limit either because of the apparently buggy generic tokenizer (see https://github.com/slusarz/dovecot-fts-flatcurve/issues/62#issuecomment-2178894698) or because the tokenizer's maxlen is set larger than FTS_FLATCURVE_MAX_TERM_SIZE, e.g. fts_tokenizer_generic = maxlen=250.
  2. Fix a crash when doing substring indexing (fts_flatcurve_substring_search = yes): the do/while loop could be executed with size == 0, leading to an unsigned integer underflow (the size wraps around to a very large value).
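
To illustrate the two points above, here is a minimal sketch in C. It is not the plugin's actual code: the helper names `utf8_truncate_len` and `count_suffix_terms` are hypothetical, and the sketch assumes the input is valid UTF-8. The first helper picks a cut point at or below a byte limit that never lands inside a multi-byte sequence; the second walks the suffixes of a term with a pre-tested loop, so a zero-length term never enters the body.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of flatcurve): return the largest length
 * <= max_bytes that does not end in the middle of a UTF-8 sequence.
 * Assumes s[0..len) is valid UTF-8. */
static size_t utf8_truncate_len(const char *s, size_t len, size_t max_bytes)
{
    assert(s != NULL);
    if (len <= max_bytes)
        return len;

    size_t cut = max_bytes;
    /* If the first dropped byte is a continuation byte (10xxxxxx), the cut
     * would split a character, so step back to that character's lead byte. */
    while (cut > 0 && ((unsigned char)s[cut] & 0xC0) == 0x80)
        cut--;
    return cut;
}

/* Hypothetical helper: count the suffixes a substring indexer would emit,
 * advancing one UTF-8 character at a time. The loop condition is tested
 * before the body, so size == 0 yields zero iterations instead of
 * decrementing an unsigned value below zero. */
static size_t count_suffix_terms(const char *s, size_t size)
{
    assert(s != NULL);
    size_t count = 0;
    size_t pos = 0;

    while (pos < size) {
        count++;  /* a real indexer would add s[pos..size) as a term here */
        pos++;
        /* Skip continuation bytes so the next suffix starts on a boundary. */
        while (pos < size && ((unsigned char)s[pos] & 0xC0) == 0x80)
            pos++;
    }
    return count;
}
```

For example, with the 6-byte string "héllo" (where "é" occupies two bytes) and a limit of 2 bytes, `utf8_truncate_len` returns 1 rather than cutting "é" in half, and `count_suffix_terms` returns 0 for an empty term.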

slusarz commented 1 month ago

Partially merged as part of #68