grosjo / fts-xapian

Dovecot FTS plugin based on Xapian
GNU Lesser General Public License v2.1

Error: terminate called after throwing an instance of 'std::bad_alloc' #36

Closed swdee closed 4 years ago

swdee commented 4 years ago

When building an index on an existing mailbox we get the following error on one particular mailbox.

Feb  7 07:40:16 imap dovecot[15273]: indexer-worker: Error: terminate called after throwing an instance of 'std::bad_alloc'
Feb  7 07:40:16 imap dovecot[15273]: indexer-worker: Error:   what():  std::bad_alloc
Feb  7 07:40:18 imap dovecot[15273]: indexer: Error: Indexer worker disconnected, discarding 6 requests for shanon_globe
Feb  7 07:40:18 imap dovecot[15273]: indexer-worker(shanon_globe)<15281><mET3HIJ1PF6xOwAAfpyZ+Q>: Fatal: master: service(indexer-worker): child 15281 killed with signal 6 (core dumped)

This mailbox is only one third the size of some other mailboxes that complete an index build successfully.

We have vsz_limit set as:

service indexer-worker {
  vsz_limit = 1280 M
}

We also compiled the fts-xapian plugin with XAPIAN_COMMIT_LIMIT = 250.

Attached are the backtraces from the coredump.

bt.log bt-full.log

swdee commented 4 years ago

In further testing I increased vsz_limit to 4G and got the same result: the worker is killed with signal 6.

I then manually indexed each folder in the mailbox instead of the user account as a whole. The apparent memory leak occurs on the sent-mail folder, which holds fewer mails than other folders that build an index successfully.

| Mailbox Folder | Mail Count | Index Build Result |
|---|---|---|
| msg_hold2 | 11074 | ok |
| msg_hold | 12726 | ok |
| b_globe | 804 | ok |
| sent-mail | 7194 | coredumps |
| Trash | 617 | ok |
| Trash/junkles | 13 | ok |
| Sent | 0 | ok |
| msg_hold3 | 5045 | ok |
| INBOX | 4734 | ok |
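
The per-folder run described above could be scripted along these lines. This is only a sketch: the function name `index_each_folder` is made up for illustration, and it assumes that `doveadm index` without `-q` runs in the foreground so that its exit status reflects the index build for that folder (my reading of the doveadm-index man page, not something confirmed in this thread).

```shell
#!/bin/sh
# Index every folder of one user separately and print one result line per folder.
# index_each_folder is a hypothetical helper name, not part of Dovecot.
# Assumption: without -q, "doveadm index" runs in the foreground and returns
# a non-zero exit status when the index build for that folder fails.
index_each_folder() {
    user=$1
    doveadm mailbox list -u "$user" | while IFS= read -r folder; do
        # </dev/null keeps doveadm from consuming the folder list on stdin
        if doveadm index -u "$user" "$folder" </dev/null; then
            printf '%s\tok\n' "$folder"
        else
            printf '%s\tFAILED\n' "$folder"
        fi
    done
}

# Usage: index_each_folder shanon_globe
```

This narrows a crash down to a folder, not to a single message.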

Is there some way we can manually build the index but get a progress report for each individual email being indexed so we can see which one is causing the memory leak?

swdee commented 4 years ago

I have found the cause: two emails, each with a file attachment containing Perl/CGI code, trigger the memory leak when indexing the sent-mail folder.

As a workaround I have removed those two emails.

grosjo commented 4 years ago

Which version of Xapian are you using? Can you forward me the content of the email causing the memory leak?

swdee commented 4 years ago

We are using the packages available on Fedora 31, so xapian-core-1.4.13-2.

I can't post the email here as it is commercially sensitive. If you post your public SSH key I will set up a node with the environment and the problem email so you can investigate.

grosjo commented 4 years ago

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC7fM/a/DONp/IduuotIm/bsie1V7Mn6J23Ecr2zYID+JJJdm4AjwI9IXfFOMD4kD55g/bBCF4x1FdMcouBlR/11PSWQx+4r/eBQZg7i8R2hB2rGG9M44zNJLyZEz5fDtaBfz3gSiKBHD0dtGqAps/nuILknxgvDbCvWu43Me8VuR0tXLidG9EQcI/fOvGoykTxS9JEBYxMyIN7kslBClZnyCgZhJI24UR4EuphR9zxRXuPKbAu0Fxh/+q8tqkBqHXz5I97OZ8Bpdsl2HOFNVM/VHU8gX1I/5iVXqFaicEPSgJCugbzDWS+HS5dGFhFpmMtkNIfU06wiVB/chykclAz joan@gjlaptop

swdee commented 4 years ago

I have set up a temporary node demonstrating the problem so you can take a look. Make any changes you want on this node and don't worry about breaking anything, as we will delete it once done.

ssh using your key to root@45.79.78.191

I have built the fts-xapian plugin using the following commands:

cd ~/build
git clone https://github.com/grosjo/fts-xapian
cd ~/build/fts-xapian/
autoreconf -vi
./configure --with-dovecot=/usr/lib64/dovecot
make
make install

# then restart dovecot for changes to take effect
systemctl restart dovecot

The mailbox containing the two problem emails that cause the memory leak is at /home/vmail/shanon_test/mdbox/

To reproduce the coredump, take the following steps:

  1. First delete the existing xapian indexes.

    systemctl stop dovecot
    rm -rf /home/vmail/shanon_test/mdbox/xapian-indexes
    systemctl start dovecot

  2. Manually index the "corrupt" folder containing the two emails.

    doveadm index -u shanon_test -q corrupt

  3. Wait around 3 minutes; the indexer-worker eventually exhausts the default vsz_limit of 256 MB. I have not increased this limit because it then takes too long to exhaust, and more memory is not necessary for indexing only 2 emails. I use top to monitor when the indexer-worker has stopped/crashed.

  4. To get the crash log:

    $ tail -c 5000 /var/log/maillog
    Feb 8 23:47:59 localhost dovecot[852]: indexer-worker: Error: terminate called after throwing an instance of 'std::bad_alloc'
    Feb 8 23:47:59 localhost dovecot[852]: indexer-worker: Error: what(): std::bad_alloc
    Feb 8 23:48:01 localhost dovecot[852]: indexer: Error: Indexer worker disconnected, discarding 1 requests for shanon_test
    Feb 8 23:48:01 localhost dovecot[852]: indexer-worker(shanon_test)<861>: Fatal: master: service(indexer-worker): child 861 killed with signal 6 (core dumped)

  5. The coredump files are written to `/var/lib/systemd/coredump/`
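
To avoid scanning the tail of the log by eye, the crash lines can be filtered out with grep. A minimal sketch, assuming the log format shown in this thread; `indexer_crashes` is a hypothetical helper name and the log path is passed in rather than hard-coded.

```shell
#!/bin/sh
# Print indexer-worker crash lines from a Dovecot log file.
# indexer_crashes is a hypothetical helper; the patterns are taken from the
# log excerpts in this thread and may need adjusting for other setups.
indexer_crashes() {
    grep -E 'indexer-worker.*(std::bad_alloc|killed with signal)' "$1"
}

# Usage: indexer_crashes /var/log/maillog
```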

Furthermore, I have found another mailbox that causes a crash. I haven't investigated it properly yet, but it seems to be crashing on an email with a PDF attachment. Is there a way to configure indexing so that it only indexes the email body and excludes the attachments?

grosjo commented 4 years ago

I added an option to turn attachment indexing on or off.

When attachments are indexed, only text ones are indexed.

grosjo commented 4 years ago

Feb 9 16:19:03 localhost dovecot[17497]: indexer-worker(shanon_test)<17508>: Skipping part of type 'text/plain' and disposition 'inline; filename="webupgrade-medtara-1.2.29"'
Feb 9 16:19:03 localhost dovecot[17497]: indexer-worker(shanon_test)<17508>: FTS Xapian: Done indexing 'corrupt' (2 msgs in 25 ms, rate: 80.0)
Feb 9 16:19:03 localhost dovecot[17497]: indexer-worker(shanon_test)<17508>: Indexed 2 messages in corrupt (UIDs 1..2)

swdee commented 4 years ago

Nice fix! I will pull your latest code and run it on the full mailboxes on the production server we are setting up and let you know how it goes.

swdee commented 4 years ago

We have done multiple runs on the full mailboxes from a production server and everything indexes nicely now with attachments=0. Thanks for the fix and additional options for disabling attachment indexing.
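
One way to confirm that the running configuration really has attachments=0 is to read it back with `doveconf -n`, which prints only non-default settings. The sketch below assumes the option lives on a single `fts_xapian = ...` line in the `plugin` section, as I recall from the plugin's README; both helper names are hypothetical.

```shell
#!/bin/sh
# Print the effective fts_xapian plugin setting line, if one is configured.
# Assumption: the option is written as "fts_xapian = ... attachments=0"
# on one line inside the plugin { } section.
show_fts_xapian_setting() {
    doveconf -n | grep -E '^[[:space:]]*fts_xapian[[:space:]]*='
}

# Succeeds (exit status 0) when attachment indexing is switched off.
attachments_disabled() {
    show_fts_xapian_setting | grep -q 'attachments=0'
}

# Usage: attachments_disabled && echo "attachments are not indexed"
```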

Search speed is good with mailboxes in the 20k-30k message range returning in 0.01 seconds.