Closed aletheia7 closed 4 years ago
Can you set verbose=1 with the latest git , and share messages before and after the error ?
Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630, me): FTS Xapian: Starting with partial=4 full=20 attachments=0 verbose=1
Sep 29 04:20:37 orange dovecot[1429]: lmtp(me)<47630><KMRRMIUYc18OugAAfYjuOg>: sieve: msgid=<0.1.44.FA.1D6965289EA7E42.0@omp.e.lenscrafters.com>: fileinto action: stored mail into mailbox 'Junk/Other'
Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630): FTX Xapian: Set box 'Junk/Other' (ececcd190285a45da97100007d88ee3a)
Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630): FTS Xapian: Unset box '(null)' ((null))
Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630): FTS Xapian: Committed 'unset_box' in 0 ms
Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630): FTS Xapian: Expunge UID=8971
Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630): FTS Xapian: adding expunge IDs : REPLACE INTO IDS (DOCID,ID) VALUES (6507,8971);
Sep 29 04:20:37 orange dovecot[1429]: indexer-worker(me)<47637><KMRRMIUYc18OugAAfYjuOg:WT7GMYUYc18VugAAfYjuOg>: FTS Xapian: Starting with partial=4 full=20 attachments=0 verbose=1
Sep 29 04:20:37 orange dovecot[1429]: indexer-worker(me)<47637><KMRRMIUYc18OugAAfYjuOg:WT7GMYUYc18VugAAfYjuOg>: FTX Xapian: Set box 'Junk/Other' (ececcd190285a45da97100007d88ee3a)
Sep 29 04:20:37 orange dovecot[1429]: indexer-worker(me)<47637><KMRRMIUYc18OugAAfYjuOg:WT7GMYUYc18VugAAfYjuOg>: FTS Xapian: Unset box '(null)' ((null))
Sep 29 04:20:37 orange dovecot[1429]: indexer-worker(me)<47637><KMRRMIUYc18OugAAfYjuOg:WT7GMYUYc18VugAAfYjuOg>: FTS Xapian: Committed 'unset_box' in 0 ms
Sep 29 04:20:37 orange dovecot[1429]: indexer-worker(me)<47637><KMRRMIUYc18OugAAfYjuOg:WT7GMYUYc18VugAAfYjuOg>: FTS Xapian: Get last UID of Junk/Other (ececcd190285a45da97100007d88ee3a) = 9164
Sep 29 04:20:37 orange dovecot[1429]: indexer-worker(me)<47637><KMRRMIUYc18OugAAfYjuOg:WT7GMYUYc18VugAAfYjuOg>: FTX Xapian: Set box 'Junk/Other' (ececcd190285a45da97100007d88ee3a)
Sep 29 04:20:37 orange dovecot[1429]: indexer-worker(me)<47637><KMRRMIUYc18OugAAfYjuOg:WT7GMYUYc18VugAAfYjuOg>: FTS Xapian: Get last UID of Junk/Other (ececcd190285a45da97100007d88ee3a) = 9164
Sep 29 04:20:37 orange dovecot[1429]: indexer-worker(me)<47637><KMRRMIUYc18OugAAfYjuOg:WT7GMYUYc18VugAAfYjuOg>: FTX Xapian: Set box 'Junk/Other' (ececcd190285a45da97100007d88ee3a)
Sep 29 04:20:37 orange dovecot[1429]: indexer-worker(me)<47637><KMRRMIUYc18OugAAfYjuOg:WT7GMYUYc18VugAAfYjuOg>: FTS Xapian: Start indexing 'Junk/Other' (ececcd190285a45da97100007d88ee3a)
Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630): FTS Xapian: Expunge UID=8972
Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630): FTS Xapian: adding expunge IDs : REPLACE INTO IDS (DOCID,ID) VALUES (6508,8972);
Sep 29 04:20:37 orange dovecot[1429]: indexer-worker(me)<47637><KMRRMIUYc18OugAAfYjuOg:WT7GMYUYc18VugAAfYjuOg>: FTS Xapian: Unset box 'Junk/Other' (ececcd190285a45da97100007d88ee3a)
Sep 29 04:20:37 orange dovecot[1429]: indexer-worker(me)<47637><KMRRMIUYc18OugAAfYjuOg:WT7GMYUYc18VugAAfYjuOg>: FTS Xapian: Done indexing 'Junk/Other' (ececcd190285a45da97100007d88ee3a) (1 msgs in 15 ms, rate: 66.7)
Sep 29 04:20:37 orange dovecot[1429]: indexer-worker(me)<47637><KMRRMIUYc18OugAAfYjuOg:WT7GMYUYc18VugAAfYjuOg>: Error: FTS Xapian: Error in expunge fct (SELECT DOCID,ID from IDS ORDER BY DOCID LIMIT 5) (database is locked) : unset_box
Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630): FTS Xapian: Unset box 'Junk/Other' (ececcd190285a45da97100007d88ee3a)
Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630): FTS Xapian: Committed 'unset_box' in 0 ms
Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630): FTS Xapian: Box is empty
Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630): FTS Xapian: Unset box '(null)' ((null))
Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630): FTS Xapian: Committed 'unset_box' in 0 ms
Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630): Disconnect from local: Client has quit the connection (state=READY)
The indexer had successfully completed an index operation before and after this problem at exactly the same minute/second.
Hello, can you try with additional updates ?
Upgraded and watching.
Do you have a process of expunging the mailbox in parallel to main core binary ?
Yes. The following job runs daily at 1:00 am as root via a systemd.timer:
/usr/bin/doveadm expunge -A mailbox '*' from 'no_reply @ooma.com' since 2017-06-28 before 30d header Subject 'New voicemail
The following expunge rules are defined in the dovecot local.conf file:
namespace inbox {
...
mailbox Junk {
special_use = \Junk
auto = subscribe
autoexpunge = 7d
}
mailbox Junk/Other {
special_use = \Junk
auto = no
autoexpunge = 7d
}
mailbox "Junk E-mail" {
special_use = \Junk
autoexpunge = 7d
}
mailbox Backups/baobab {
auto = no
autoexpunge = 180days
}
mailbox Backups/peach {
auto = no
autoexpunge = 180days
}
...
}
The Junk and Backups/baobab mailbox expunge rules get executed when a new email arrives. They get executed often because backups occur in the evening at ~ 22:00 and, well, we get a lot of Junk/Spam.
It's always one email account.
Maybe the sqlite timeout should be increased. What is the timeout?
https://sqlite.org/c3ref/busy_timeout.html
Is the code using sqlite wal mode? https://www.sqlite.org/wal.html
Is the sqlite code in dovecot or fts-xapian?
All sqlite calls are in fts-xapian. I added a "timeout" function. Please knidly check again
I upgraded now. Also, it's happening to another email account too.
One yesterday. Two today.
I increased the timeout to see
I used the 100 timeout on 2020-10-16. I received one "database is locked" on 2020-10-17.
I changed 100 -> 500.
Please, @grosjo, for what is SQLite used? Cannot this be stored directly into xapian DB?
AFAIK, the timeout is in msec. Default timeout in Python & Exim is 5000 msec, Dovecot uses 1000 msec. IMO 100 msec is too low.
I used SQlite to handle expunges outside xapuan, as xapian does not explicitely handle concurrency.
https://getting-started-with-xapian.readthedocs.io/en/latest/concepts/concurrency.html
Does 5000 timeout solve the issue ? Do you have a better idea ?
Actually tried with https://tartarus.org/james/xapian-apidoc/html/namespaceXapian.html#a296fe000d6d9525bb8e80f72838026a6 from recent version of Xapian
Make sense ?
I cannot comment xapian locking, but if i properly understand provided link, there will be no timeout, but infinite waiting for lock, i am not sure if it is good idea.
I can comment SQLite locking, but i have only basic C/C++ experiences, thus i didn't inspect your code (i will lost in int), thus some of my notes can be applied already (you simple ignore them):
Using WAL mode can improve performance, especially it will allow concurrent write/read access. Its setting is persisent and can be enabled/disabled anytime, even by CLI tool. It can little slowdown reads, but concurrent access can improve it at all. Because it is persistent, it can be enabled on DB file creation (only once) and because it is not error to enable it more time, it can be enabled ag. in time plugin initialization, to ensure enabling it for existing DB file .
Please kindly test again, and let me know.
Please kindly test again, and let me know.
I will. I am currently testing 3f7fe699dddd04aad40e26f9512eb9094341a550 with sqlite3_busy_timeout(500).
After 24 h I will test b8cad27 (latest).
An sqlite3_busy_timeout of 500 works for me.
Now upgrade with the latest.
I am getting occasional errors in dovecot:
fts-xapian: 68ea96 (master) OS: Debian Linux testing/bullseye Dovecot: v2.3.11.3 (latest)