slusarz / dovecot-fts-flatcurve

Dovecot FTS Flatcurve plugin (Xapian)
https://slusarz.github.io/dovecot-fts-flatcurve/
GNU Lesser General Public License v2.1

attachment search support config? #30

Closed pgnd closed 2 years ago

pgnd commented 2 years ago

xapian's capable of configurable attachment indexing/search, e.g.

https://xapian.org/docs/omega/overview.html
https://github.com/xelkano/redmine_xapian
https://wiki.bcs.rochester.edu/StatsWiki/HelpOnXapian

is dovecot-fts-flatcurve attachment capable, and configurable?

one alternative that (still?) seems to work is to enable fts_decoder,

https://doc.dovecot.org/settings/plugin/fts-plugin/#plugin_setting-fts-fts_decoder

as,

plugin {
    ...
    fts_decoder = decode2text
}

service decode2text {
    executable = script /usr/libexec/dovecot/decode2text.sh
    user = vmail
    unix_listener decode2text {
        mode = 0666
    }
}

iiuc, the attachment, once converted to text, is scanned/indexed by fts, but without using any native Xapian/flatcurve capabilities.

slusarz commented 2 years ago

This is not (and will not be) built into any individual Dovecot FTS library (including flatcurve). The correct place to do this is in the Dovecot core libfts, which, as you point out, is done via fts_decoder configuration. decode2text is not really intended for production use. If you want to do attachment scanning, you should use fts_tika instead.

Edit: sorry, I meant using "fts_tika" configuration instead of "fts_decoder". Tika is a general purpose text extractor, so it can be used by every FTS driver for Dovecot. It doesn't make sense to hardcode this functionality into a specific FTS driver because of this.
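
For reference, a minimal sketch of what the fts_tika approach might look like alongside flatcurve. It assumes Dovecot 2.3.x with the fts_flatcurve plugin installed and an Apache Tika server already running. The host and port in the URL are placeholders (9998 is Tika server's usual default), not something stated in this thread, so adjust them to your setup.

```
# Sketch only: assumes Dovecot 2.3.x and a separately running Tika server.
mail_plugins = $mail_plugins fts fts_flatcurve

plugin {
    fts = flatcurve

    # Hypothetical endpoint: replace host/port with your Tika server's.
    # Attachments are sent here for text extraction before indexing.
    fts_tika = http://localhost:9998/tika/
}
```

Because the extraction happens in the Dovecot core FTS layer, the same fts_tika line should apply regardless of which FTS driver is named in the fts setting. It would presumably take the place of the fts_decoder line shown earlier rather than sit next to it.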

pgnd commented 2 years ago

then, i'll revisit tika again. i'm trying to avoid a 'fat', resource-intensive java solution for a local box -- a primary motivator to find/move to fts-flatcurve from solr. solr+tika on one local box is too heavy. admittedly, i've not yet tried a fts-flatcurve + tika solution to compare.