jeboehm / docker-mailserver

Docker Mailserver based on the famous ISPMail guide
MIT License

How best to get Full-text search in Dovecot #99

Closed: agittins closed this 2 years ago

agittins commented 4 years ago

I find that, as-is, email searches are slow and often time out before returning any results. This is obviously the case for body searches, but I regularly experience it on header searches as well. Part of my issue is that my mailstore is on networked storage (Linode block storage), but ultimately the brute-force sequential scan for body searches would be problematic for me even on SSD.

I think the solution to this is setting up full-text search support in Dovecot. To be honest, just trying to work out how best to do it in standalone Dovecot has me well confused, let alone integrating it cleanly into a containerised setup. The built-in option, fts_squat, is deprecated, so we shouldn't be using that, but the officially preferred options seem to be either buying the commercial version or setting up Solr or Lucene, none of which particularly excite me (cost, memory, management).
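As a rough illustration of why an index helps, here is a toy Python sketch contrasting a sequential body scan with an inverted-index lookup. It is not how Dovecot or any of its FTS plugins is implemented; the `MAILS` data and all function names are made up for the example.

```python
import re
from collections import defaultdict

# Hypothetical in-memory "mailstore": message id -> body text.
MAILS = {
    1: "Invoice for the block storage upgrade",
    2: "Lunch on Friday?",
    3: "Storage quota warning for your mailbox",
}


def tokenize(text):
    """Lowercase and split on non-alphanumerics (deliberately naive)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def scan_search(mails, term):
    """Brute force: every query reads and tokenizes every message body."""
    term = term.lower()
    return sorted(mid for mid, body in mails.items() if term in tokenize(body))


def build_index(mails):
    """Build an inverted index once: token -> set of message ids."""
    index = defaultdict(set)
    for mid, body in mails.items():
        for token in tokenize(body):
            index[token].add(mid)
    return index


def index_search(index, term):
    """Look the term up directly; no message bodies are read at query time."""
    return sorted(index.get(term.lower(), set()))


INDEX = build_index(MAILS)
print(scan_search(MAILS, "storage"))   # [1, 3]
print(index_search(INDEX, "storage"))  # [1, 3]
```

Both calls return the same messages, but the scan's cost grows with the total size of the mailstore on every query, while the index pays that cost once up front (and then needs to be kept up to date as mail arrives).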

Two of the other options on that page, fts-xapian and fts-elastic, look promising though, with fts-xapian perhaps being the simplest, as it looks self-contained and shouldn't need a separate search server container (I might be wrong on that). Both currently appear to be actively maintained.

So I guess my request is two-fold.

  1. What's the "best" way to get performant searching for a small/medium deployment without a lot of resources (I am guessing fts-xapian, but frankly I have no idea what I might be missing), and
  2. How does one implement that?

It would be great if it could be built into the core project, assuming it is low-overhead and/or can be enabled/disabled by config. Surely others have come up against this and have some solution they use... maybe it's simply a matter of documenting it!

kklepper commented 4 years ago

Well, it has been quite some time since I had a look at this, but as far as I remember, emails are stored as plain text files.

We have MySQL for user management, so why not store emails in MySQL and then use Sphinx search (http://sphinxsearch.com/)?

The mail solution as it stands is enough for me, so I would not look into it myself, but I use Sphinx in other settings and it is stunning, as is well known.

agittins commented 4 years ago

Thanks for your thoughts.

Rather than reinvent the wheel, I was simply looking for how best to tie one of the existing Dovecot full-text search solutions into the docker-mailserver setup.

I wasn't aware of sphinxsearch, so thanks for bringing it to my attention! :-) But for this particular job, I'm really just interested in integrating one of the existing solutions rather than implementing something new altogether (and I wouldn't be keen on pumping tens of GB of emails into MySQL just for searching!). Sphinx does support flat text files too, but again, it's not one of the well-tested existing FTS solutions for Dovecot, so it would be a project over at Dovecot rather than here.

Does everyone who cares about fast searching just give up on self-hosting and use Gmail? I'm surprised there doesn't seem to be a well-trodden path here :-/

agittins commented 2 years ago

So just one month later someone posted a handy blog post showing how they set up the xapian fts plugin (which is "community developed" rather than part of dovecot core).

I see Alpine has packages for it, too: https://pkgs.alpinelinux.org/packages?name=dovecot-fts-xapian*&branch=edge
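For anyone wanting a starting point, a minimal Dovecot configuration for the plugin might look roughly like the sketch below. It assumes Dovecot 2.3 with the dovecot-fts-xapian package installed; the setting names are as I recall them from the plugin's README, the values are only examples, and none of this is necessarily what the blog post or the PR uses, so check against the plugin's own documentation.

```
# Sketch only: assumes Dovecot 2.3 with the fts-xapian plugin installed.
# Load the generic FTS framework plus the Xapian backend.
mail_plugins = $mail_plugins fts fts_xapian

plugin {
  # Select Xapian as the FTS backend.
  fts = xapian
  # Example values: index partial matches from 3 characters, full words up to 20.
  fts_xapian = partial=3 full=20
  # Index new mail as it arrives rather than on first search.
  fts_autoindex = yes
  # Never fall back to the slow sequential scan if the index lookup fails.
  fts_enforced = yes
}
```

Existing mail would then need a one-off indexing run, for example with `doveadm index -A -q '*'`.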

If I can get some time I'll take a look at

So there's a PR on its way! :-)

jeboehm commented 2 years ago

Thanks for your great work. This is now released in v3.4.0.

agittins commented 2 years ago

Fantastic! Glad to see this merged :-D