grosjo / fts-xapian

Dovecot FTS plugin based on Xapian
GNU Lesser General Public License v2.1
91 stars 19 forks source link

DatabaseCorruptError - Ran off end of freelist #163

Closed Codelica closed 1 month ago

Codelica commented 1 month ago

Hello,

I'm using this plugin inside docker-mailserver (which appears to be using version 1.7.12 currently) on several servers with the following configuration (I assume $mail_plugins gets expanded by docker-mailserver):

mail_plugins = $mail_plugins fts fts_xapian

plugin {
    fts = xapian
    fts_xapian = partial=3 full=20 verbose=1

    fts_autoindex = yes
    fts_enforced = yes
}

service indexer-worker {
    # limit size of indexer-worker RAM usage, ex: 512MB, 1GB, 2GB
    vsz_limit = 4GB
}

On two of the servers this has been running great, however on a 3rd server I run into large number of DatabaseCorruptError log entries when it's processing a rather large (50k+ message) user inbox. At first I thought it may be due to the storage backend on this 3rd server (which is ZFS over remote block storage vs ZFS with local block storage) so I move the Xapian index storage to a local EXT4 volume -- however I get the same results.

I feel like (pure guess) this is memory related trying to process this large inbox (I don't believe the other servers any folders as large) but can't really justify going over the current 4GB setting.

So, I'm just looking for any thoughts or suggestions to help me debug or tune?

I see a lowmemory option I'm not using currently, so I guess it's using the 250MB default. Perhaps lowering this? I'm just not sure how that interacts I guess.

Anyway, attached is a log section (verbose=1) during the fun...

Thanks!

xapian-log.txt

grosjo commented 1 month ago

it seems your xpian db is out of order. Just delete the file and it will be rebuilt. Maybe use 1.7.13 as well

Codelica commented 1 month ago

That's basically what I've been trying. I delete all the Xapian DB's for the user and then run kick off an indexing run via doveadm index -A \* but each time corruption appears during the processing of that user's INBOX.

I've experimented with different vsz_limit up to 4GB and recently tried adding lowmemory=100 to see if it had any effect. It seems to make it a little further before the errors popped up (subjective feeling based on total DB size for them), but still ended up with DatabaseCorruptError.

If there is a chance 1.7.13 may fix this (?) I can try to manually rebuild the docker-mailserver image to check. I just wasn't sure if there is anything else I could do to try and pinpoint why/when the corruption occurs. As it is repeatable for me.

Thanks!

Codelica commented 1 month ago

FWIW I tried imposing new soft/hard file limits per another suggestion, but it didn't seem to help. However, as part of the test I did retain full logs and was able to see that I'm getting core dumps from the plugin, which I can only assume is the cause of the corruption. Posting here in case it spurs any ideas. I'm looking at what is involved to move to 1.7.13 next, but that's a little more involving.

I basically cleared out all Xapian index data for the user, and ran with the following plugin configuration:

mail_plugins = $mail_plugins fts fts_xapian

plugin {
    fts = xapian
    fts_xapian = partial=3 full=20 verbose=1

    fts_autoindex = yes
    fts_enforced = yes
}

service indexer-worker {
    # limit size of indexer-worker RAM usage, ex: 512MB, 1GB, 2GB
    vsz_limit = 2GB
}

Then kicked off indexing via docker exec mailserver doveadm index -A \* and eventually a couple core dumps popped up before I stopped the test. Log section around the first dump:

...
2024-07-01T16:15:15.959350-06:00 mail dovecot: indexer-worker(theuser@thedomain.com)<4202><M7k/ENwog2ZiEAAAdz0kug:CqkCK9wog2ZqEAAAdz0kug>: FTS Xapian: Free memory = 2218 MB
2024-07-01T16:15:15.959359-06:00 mail dovecot: indexer-worker(theuser@thedomain.com)<4202><M7k/ENwog2ZiEAAAdz0kug:CqkCK9wog2ZqEAAAdz0kug>: FTS Xapian: New doc ready to index 10275
2024-07-01T16:15:15.959364-06:00 mail dovecot: indexer-worker(theuser@thedomain.com)<4202><M7k/ENwog2ZiEAAAdz0kug:CqkCK9wog2ZqEAAAdz0kug>: FTS Xapian: Cleanup 0 : Active : DW #3325 (INBOX,/var/mail/thedomain.com/theuser/xapian-indexes/db_4bdd452fa0f87a6689d80e00773d24ba) -  pos=12 remaining docs=0 terminated=0
2024-07-01T16:15:15.959368-06:00 mail dovecot: indexer-worker(theuser@thedomain.com)<4202><M7k/ENwog2ZiEAAAdz0kug:CqkCK9wog2ZqEAAAdz0kug>: FTS Xapian: Cleanup 1 : Active : DW #3330 (INBOX,/var/mail/thedomain.com/theuser/xapian-indexes/db_4bdd452fa0f87a6689d80e00773d24ba) -  pos=12 remaining docs=2 terminated=0
2024-07-01T16:15:15.959374-06:00 mail dovecot: indexer-worker(theuser@thedomain.com)<4202><M7k/ENwog2ZiEAAAdz0kug:CqkCK9wog2ZqEAAAdz0kug>: FTS Xapian: Cleanup 2 : Active : DW #3329 (INBOX,/var/mail/thedomain.com/theuser/xapian-indexes/db_4bdd452fa0f87a6689d80e00773d24ba) -  pos=12 remaining docs=0 terminated=0
2024-07-01T16:15:15.959378-06:00 mail dovecot: indexer-worker(theuser@thedomain.com)<4202><M7k/ENwog2ZiEAAAdz0kug:CqkCK9wog2ZqEAAAdz0kug>: FTS Xapian: Cleanup 3 : Active : DW #3328 (INBOX,/var/mail/thedomain.com/theuser/xapian-indexes/db_4bdd452fa0f87a6689d80e00773d24ba) -  pos=13 remaining docs=1 terminated=0
2024-07-01T16:15:16.159609-06:00 mail dovecot: indexer-worker(theuser@thedomain.com)<4202><M7k/ENwog2ZiEAAAdz0kug:CqkCK9wog2ZqEAAAdz0kug>: FTS Xapian: Cleanup 0 : Active : DW #3325 (INBOX,/var/mail/thedomain.com/theuser/xapian-indexes/db_4bdd452fa0f87a6689d80e00773d24ba) -  pos=12 remaining docs=0 terminated=0
2024-07-01T16:15:16.159624-06:00 mail dovecot: indexer-worker(theuser@thedomain.com)<4202><M7k/ENwog2ZiEAAAdz0kug:CqkCK9wog2ZqEAAAdz0kug>: FTS Xapian: Cleanup 1 : Active : DW #3330 (INBOX,/var/mail/thedomain.com/theuser/xapian-indexes/db_4bdd452fa0f87a6689d80e00773d24ba) -  pos=12 remaining docs=2 terminated=0
2024-07-01T16:15:16.159628-06:00 mail dovecot: indexer-worker(theuser@thedomain.com)<4202><M7k/ENwog2ZiEAAAdz0kug:CqkCK9wog2ZqEAAAdz0kug>: FTS Xapian: Cleanup 2 : Active : DW #3329 (INBOX,/var/mail/thedomain.com/theuser/xapian-indexes/db_4bdd452fa0f87a6689d80e00773d24ba) -  pos=12 remaining docs=0 terminated=0
2024-07-01T16:15:16.159633-06:00 mail dovecot: indexer-worker(theuser@thedomain.com)<4202><M7k/ENwog2ZiEAAAdz0kug:CqkCK9wog2ZqEAAAdz0kug>: FTS Xapian: Cleanup 3 : Active : DW #3328 (INBOX,/var/mail/thedomain.com/theuser/xapian-indexes/db_4bdd452fa0f87a6689d80e00773d24ba) -  pos=13 remaining docs=1 terminated=0
2024-07-01T16:15:16.166027-06:00 mail xapian-docswriter: DW #828 (Family,/var/mail/thedomain.com/theuser/xapian-indexes/db_4093ea370ef97a6689d80e00773d24ba) -  Doc 2481 Size(text)=2816 Stems=640517 : Populate 2368 / 2816 Header=XBDY TextLength=97
2024-07-01T16:15:16.168016-06:00 mail xapian-docswriter: DW #828 (Family,/var/mail/thedomain.com/theuser/xapian-indexes/db_4093ea370ef97a6689d80e00773d24ba) -  Doc 2481 Size(text)=2816 Stems=640519 : Populate 2367 / 2816 Header=XBDY TextLength=7758
2024-07-01T16:15:16.247869-06:00 mail xapian-docswriter: DW #3328 (INBOX,/var/mail/thedomain.com/theuser/xapian-indexes/db_4bdd452fa0f87a6689d80e00773d24ba) - Can't add document2 : std::exception
2024-07-01T16:15:16.247885-06:00 mail xapian-docswriter: DW #3328 (INBOX,/var/mail/thedomain.com/theuser/xapian-indexes/db_4bdd452fa0f87a6689d80e00773d24ba) -  Retrying (/var/mail/thedomain.com/theuser/xapian-indexes/db_4bdd452fa0f87a6689d80e00773d24ba) from std::exception
2024-07-01T16:15:16.355411-06:00 mail xapian-docswriter: DW #828 (Family,/var/mail/thedomain.com/theuser/xapian-indexes/db_4093ea370ef97a6689d80e00773d24ba) -  Doc 2481 Size(text)=2816 Stems=642168 : Populate 2366 / 2816 Header=XBDY TextLength=19
2024-07-01T16:15:16.355940-06:00 mail xapian-docswriter: DW #828 (Family,/var/mail/thedomain.com/theuser/xapian-indexes/db_4093ea370ef97a6689d80e00773d24ba) -  Doc 2481 Size(text)=2816 Stems=642168 : Populate 2365 / 2816 Header=XBDY TextLength=7742
2024-07-01T16:15:16.483721-06:00 mail xapian-docswriter: DW #828 (Family,/var/mail/thedomain.com/theuser/xapian-indexes/db_4093ea370ef97a6689d80e00773d24ba) -  Doc 2481 Size(text)=2816 Stems=643001 : Populate 2364 / 2816 Header=XBDY TextLength=84
2024-07-01T16:15:16.486211-06:00 mail xapian-docswriter: DW #828 (Family,/var/mail/thedomain.com/theuser/xapian-indexes/db_4093ea370ef97a6689d80e00773d24ba) -  Doc 2481 Size(text)=2816 Stems=643001 : Populate 2363 / 2816 Header=XBDY TextLength=7695
2024-07-01T16:15:16.600761-06:00 mail xapian-docswriter: DW #828 (Family,/var/mail/thedomain.com/theuser/xapian-indexes/db_4093ea370ef97a6689d80e00773d24ba) -  Doc 2481 Size(text)=2816 Stems=643148 : Populate 2362 / 2816 Header=XBDY TextLength=76
2024-07-01T16:15:16.601712-06:00 mail xapian-docswriter: DW #828 (Family,/var/mail/thedomain.com/theuser/xapian-indexes/db_4093ea370ef97a6689d80e00773d24ba) -  Doc 2481 Size(text)=2816 Stems=643148 : Populate 2361 / 2816 Header=XBDY TextLength=7733
2024-07-01T16:15:16.630756-06:00 mail dovecot: indexer-worker(theuser@thedomain.com)<4202><M7k/ENwog2ZiEAAAdz0kug:CqkCK9wog2ZqEAAAdz0kug>: Fatal: master: service(indexer-worker): child 4202 killed with signal 11 (core dumped)
2024-07-01T16:15:16.718714-06:00 mail dovecot: indexer-worker(theuser@thedomain.com)<11621><8HjNCv4og2ZiEAAAdz0kug:3ZjMKnQqg2ZlLQAAdz0kug>: FTS Xapian: Starting with partial=3 full=20 verbose=1 lowmemory=250 MB vs freemem=3858 MB
2024-07-01T16:15:16.720705-06:00 mail dovecot: indexer-worker(theuser@thedomain.com)<11621><8HjNCv4og2ZiEAAAdz0kug:3ZjMKnQqg2ZlLQAAdz0kug>: FTS Xapian: Set box 'Junk' (c0c90b10a4f87a6689d80e00773d24ba)
2024-07-01T16:15:16.720723-06:00 mail dovecot: indexer-worker(theuser@thedomain.com)<11621><8HjNCv4og2ZiEAAAdz0kug:3ZjMKnQqg2ZlLQAAdz0kug>: FTS Xapian: 'Junk' (/var/mail/thedomain.com/theuser/xapian-indexes/db_c0c90b10a4f87a6689d80e00773d24ba) indexes do not exist. Initializing DB
2024-07-01T16:15:16.722278-06:00 mail dovecot: indexer-worker(theuser@thedomain.com)<11621><8HjNCv4og2ZiEAAAdz0kug:3ZjMKnQqg2ZlLQAAdz0kug>: FTS Xapian: Opening DB (RO) /var/mail/thedomain.com/theuser/xapian-indexes/db_c0c90b10a4f87a6689d80e00773d24ba
2024-07-01T16:15:16.725666-06:00 mail dovecot: indexer-worker(theuser@thedomain.com)<11621><8HjNCv4og2ZiEAAAdz0kug:3ZjMKnQqg2ZlLQAAdz0kug>: FTS Xapian: Set box 'Junk' (c0c90b10a4f87a6689d80e00773d24ba)
2024-07-01T16:15:16.725677-06:00 mail dovecot: indexer-worker(theuser@thedomain.com)<11621><8HjNCv4og2ZiEAAAdz0kug:3ZjMKnQqg2ZlLQAAdz0kug>: FTS Xapian: Opening DB (RO) /var/mail/thedomain.com/theuser/xapian-indexes/db_c0c90b10a4f87a6689d80e00773d24ba
2024-07-01T16:15:16.726927-06:00 mail dovecot: indexer-worker(theuser@thedomain.com)<11621><8HjNCv4og2ZiEAAAdz0kug:3ZjMKnQqg2ZlLQAAdz0kug>: FTS Xapian: Set box 'Junk' (c0c90b10a4f87a6689d80e00773d24ba)
2024-07-01T16:15:16.727019-06:00 mail dovecot: indexer-worker(theuser@thedomain.com)<11621><8HjNCv4og2ZiEAAAdz0kug:3ZjMKnQqg2ZlLQAAdz0kug>: FTS Xapian: Free memory = 3858 MB
... continues
grosjo commented 1 month ago

Hi. yes, you shall switch to 1.7.13 (regression fix)

Codelica commented 1 month ago

Thanks! Moving to 1.7.13 fixed the issue and runs much better. I'll let the docker-mailserver project know.