grosjo / fts-xapian

Dovecot FTS plugin based on Xapian
GNU Lesser General Public License v2.1
97 stars 21 forks source link

database is locked #55

Closed aletheia7 closed 4 years ago

aletheia7 commented 4 years ago

I am getting occasional errors in dovecot:

...
Sep 23 12:31:52 orange dovecot[17049]: indexer-worker(ray)<156088><ybAEGqiia1+3YQIAfYjuOg:iI6CG6iia1+4YQIAfYjuOg>: Error: FTS Xapian: Error in expunge fct (SELECT DOCID,ID from IDS ORDER BY DOCID LIMIT 5) (database is locked) : unset_box
Sep 23 12:31:52 orange dovecot[17049]: lmtp(156087): Error: FTS Xapian: adding expunge IDs (REPLACE INTO IDS (DOCID,ID) VALUES (3550,144670);) : err(5,database is locked)
...

fts-xapian: 68ea96 (master) OS: Debian Linux testing/bullseye Dovecot: v2.3.11.3 (latest)

grosjo commented 4 years ago

Can you set verbose=1 with the latest git , and share messages before and after the error ?

aletheia7 commented 4 years ago

Received One

Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630, me): FTS Xapian: Starting with partial=4 full=20 attachments=0 verbose=1
Sep 29 04:20:37 orange dovecot[1429]: lmtp(me)<47630><KMRRMIUYc18OugAAfYjuOg>: sieve: msgid=<0.1.44.FA.1D6965289EA7E42.0@omp.e.lenscrafters.com>: fileinto action: stored mail into mailbox 'Junk/Other'
Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630): FTX Xapian: Set box 'Junk/Other' (ececcd190285a45da97100007d88ee3a)
Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630): FTS Xapian: Unset box '(null)' ((null))
Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630): FTS Xapian: Committed 'unset_box' in 0 ms
Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630): FTS Xapian: Expunge UID=8971
Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630): FTS Xapian: adding expunge IDs : REPLACE INTO IDS (DOCID,ID) VALUES (6507,8971);
Sep 29 04:20:37 orange dovecot[1429]: indexer-worker(me)<47637><KMRRMIUYc18OugAAfYjuOg:WT7GMYUYc18VugAAfYjuOg>: FTS Xapian: Starting with partial=4 full=20 attachments=0 verbose=1
Sep 29 04:20:37 orange dovecot[1429]: indexer-worker(me)<47637><KMRRMIUYc18OugAAfYjuOg:WT7GMYUYc18VugAAfYjuOg>: FTX Xapian: Set box 'Junk/Other' (ececcd190285a45da97100007d88ee3a)
Sep 29 04:20:37 orange dovecot[1429]: indexer-worker(me)<47637><KMRRMIUYc18OugAAfYjuOg:WT7GMYUYc18VugAAfYjuOg>: FTS Xapian: Unset box '(null)' ((null))
Sep 29 04:20:37 orange dovecot[1429]: indexer-worker(me)<47637><KMRRMIUYc18OugAAfYjuOg:WT7GMYUYc18VugAAfYjuOg>: FTS Xapian: Committed 'unset_box' in 0 ms
Sep 29 04:20:37 orange dovecot[1429]: indexer-worker(me)<47637><KMRRMIUYc18OugAAfYjuOg:WT7GMYUYc18VugAAfYjuOg>: FTS Xapian: Get last UID of Junk/Other (ececcd190285a45da97100007d88ee3a) = 9164
Sep 29 04:20:37 orange dovecot[1429]: indexer-worker(me)<47637><KMRRMIUYc18OugAAfYjuOg:WT7GMYUYc18VugAAfYjuOg>: FTX Xapian: Set box 'Junk/Other' (ececcd190285a45da97100007d88ee3a)
Sep 29 04:20:37 orange dovecot[1429]: indexer-worker(me)<47637><KMRRMIUYc18OugAAfYjuOg:WT7GMYUYc18VugAAfYjuOg>: FTS Xapian: Get last UID of Junk/Other (ececcd190285a45da97100007d88ee3a) = 9164
Sep 29 04:20:37 orange dovecot[1429]: indexer-worker(me)<47637><KMRRMIUYc18OugAAfYjuOg:WT7GMYUYc18VugAAfYjuOg>: FTX Xapian: Set box 'Junk/Other' (ececcd190285a45da97100007d88ee3a)
Sep 29 04:20:37 orange dovecot[1429]: indexer-worker(me)<47637><KMRRMIUYc18OugAAfYjuOg:WT7GMYUYc18VugAAfYjuOg>: FTS Xapian: Start indexing 'Junk/Other' (ececcd190285a45da97100007d88ee3a)
Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630): FTS Xapian: Expunge UID=8972
Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630): FTS Xapian: adding expunge IDs : REPLACE INTO IDS (DOCID,ID) VALUES (6508,8972);
Sep 29 04:20:37 orange dovecot[1429]: indexer-worker(me)<47637><KMRRMIUYc18OugAAfYjuOg:WT7GMYUYc18VugAAfYjuOg>: FTS Xapian: Unset box 'Junk/Other' (ececcd190285a45da97100007d88ee3a)
Sep 29 04:20:37 orange dovecot[1429]: indexer-worker(me)<47637><KMRRMIUYc18OugAAfYjuOg:WT7GMYUYc18VugAAfYjuOg>: FTS Xapian: Done indexing 'Junk/Other' (ececcd190285a45da97100007d88ee3a) (1 msgs in 15 ms, rate: 66.7)
Sep 29 04:20:37 orange dovecot[1429]: indexer-worker(me)<47637><KMRRMIUYc18OugAAfYjuOg:WT7GMYUYc18VugAAfYjuOg>: Error: FTS Xapian: Error in expunge fct (SELECT DOCID,ID from IDS ORDER BY DOCID LIMIT 5) (database is locked) : unset_box
Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630): FTS Xapian: Unset box 'Junk/Other' (ececcd190285a45da97100007d88ee3a)
Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630): FTS Xapian: Committed 'unset_box' in 0 ms
Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630): FTS Xapian: Box is empty
Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630): FTS Xapian: Unset box '(null)' ((null))
Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630): FTS Xapian: Committed 'unset_box' in 0 ms
Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630): Disconnect from local: Client has quit the connection (state=READY)
aletheia7 commented 4 years ago

The indexer had successfully completed an index operation before and after this problem at exactly the same minute/second.

grosjo commented 4 years ago

Hello, can you try with additional updates ?

aletheia7 commented 4 years ago

Upgraded and watching.

aletheia7 commented 4 years ago

Received One Today

https://gist.github.com/aletheia7/f7c9e38592fa34172313594908423f02#file-journal-txt-L54

grosjo commented 4 years ago

Do you have a process of expunging the mailbox in parallel to main core binary ?

aletheia7 commented 4 years ago

Yes. The following job runs daily at 1:00 am as root via a systemd.timer:

    /usr/bin/doveadm expunge -A mailbox '*' from 'no_reply @ooma.com' since 2017-06-28 before 30d header Subject 'New voicemail

The following expunge rules are defined in the dovecot local.conf file:

namespace inbox {
  ...
  mailbox Junk {
    special_use = \Junk
    auto = subscribe
    autoexpunge = 7d
  }
  mailbox Junk/Other {
    special_use = \Junk
    auto = no
    autoexpunge = 7d
  }
  mailbox "Junk E-mail" {
    special_use = \Junk   
    autoexpunge = 7d
  }
  mailbox Backups/baobab {
    auto = no
    autoexpunge = 180days
  }
  mailbox Backups/peach {
    auto = no
    autoexpunge = 180days
  }
  ...
}

The Junk and Backups/baobab mailbox expunge rules get executed when a new email arrives. They get executed often because backups occur in the evening at ~ 22:00 and, well, we get a lot of Junk/Spam.

aletheia7 commented 4 years ago

It's always one email account.

Received 5 Today

aletheia7 commented 4 years ago

Maybe the sqlite timeout should be increased. What is the timeout?

https://sqlite.org/c3ref/busy_timeout.html

Is the code using sqlite wal mode? https://www.sqlite.org/wal.html

Is the sqlite code in dovecot or fts-xapian?

grosjo commented 4 years ago

All sqlite calls are in fts-xapian. I added a "timeout" function. Please knidly check again

aletheia7 commented 4 years ago

I upgraded now. Also, it's happening to another email account too.

aletheia7 commented 4 years ago

One yesterday. Two today.

grosjo commented 4 years ago

I increased the timeout to see

aletheia7 commented 4 years ago

I used the 100 timeout on 2020-10-16. I received one "database is locked" on 2020-10-17.

I changed 100 -> 500.

slavkoja commented 4 years ago

Please, @grosjo, for what is SQLite used? Cannot this be stored directly into xapian DB?

AFAIK, the timeout is in msec. Default timeout in Python & Exim is 5000 msec, Dovecot uses 1000 msec. IMO 100 msec is too low.

grosjo commented 4 years ago

I used SQlite to handle expunges outside xapuan, as xapian does not explicitely handle concurrency.

https://getting-started-with-xapian.readthedocs.io/en/latest/concepts/concurrency.html

Does 5000 timeout solve the issue ? Do you have a better idea ?

grosjo commented 4 years ago

Actually tried with https://tartarus.org/james/xapian-apidoc/html/namespaceXapian.html#a296fe000d6d9525bb8e80f72838026a6 from recent version of Xapian

Make sense ?

slavkoja commented 4 years ago

I cannot comment xapian locking, but if i properly understand provided link, there will be no timeout, but infinite waiting for lock, i am not sure if it is good idea.

I can comment SQLite locking, but i have only basic C/C++ experiences, thus i didn't inspect your code (i will lost in int), thus some of my notes can be applied already (you simple ignore them):

Using WAL mode can improve performance, especially it will allow concurrent write/read access. Its setting is persisent and can be enabled/disabled anytime, even by CLI tool. It can little slowdown reads, but concurrent access can improve it at all. Because it is persistent, it can be enabled on DB file creation (only once) and because it is not error to enable it more time, it can be enabled ag. in time plugin initialization, to ensure enabling it for existing DB file .

grosjo commented 4 years ago

Please kindly test again, and let me know.

aletheia7 commented 4 years ago

Please kindly test again, and let me know.

I will. I am currently testing 3f7fe699dddd04aad40e26f9512eb9094341a550 with sqlite3_busy_timeout(500).

After 24 h I will test b8cad27 (latest).

aletheia7 commented 4 years ago

An sqlite3_busy_timeout of 500 works for me.

Now upgrade with the latest.

aletheia7 commented 4 years ago

Problem fixed in https://github.com/grosjo/fts-xapian/commit/fee8502ba38333bc280bd16bfd82cf37d2ba96f5.