grosjo / fts-xapian

Dovecot FTS plugin based on Xapian
GNU Lesser General Public License v2.1
91 stars 19 forks source link

1.5.9: Can't re-create Xapian DB; DatabaseCreateError - Permission denied #147

Closed Philippe23 closed 5 months ago

Philippe23 commented 6 months ago

I upgraded from 1.5.7 to 1.5.9 yesterday.

I'm now seeing errors like the following in my logs:

Feb 5 08:21:16 myhostname dovecot: indexer-worker(30123): Error: FTS Xapian: Can't re-create Xapian DB /home/USER1/.maildir/xapian-indexes/db_559fd834187a795b511b0000d09efc50 : DatabaseCreateError - Permission denied

Permissions on that folder are 0700 and 0600 on it's contents. All are owned by USER1. Many of the files in there have mtime's from today.

I do run /usr/bin/doveadm fts optimize -F ${LIST_OF_USERS_FILE} every night, but the timestamps suggest this isn't happening then (unless it takes 8-hours).

Unsure if this is a new issue or just a new output.

But mostly, unsure of what permissions/account is being used when the failure happens.

grosjo commented 6 months ago

Can you share "doveconf -n" ?

Philippe23 commented 6 months ago
# 2.3.20 (80a5ac675d): /etc/dovecot/dovecot.conf
# Pigeonhole version 0.5.20 (149edcf2)
# OS: Linux 6.6.13-gentoo-x86_64 x86_64 Gentoo Base System release 2.14
# Hostname: localhost
first_valid_uid = 1000
mail_location = maildir:~/.maildir
mail_plugins = fts fts_xapian
managesieve_notify_capability = [...]
managesieve_sieve_capability = [...]
namespace inbox {
  inbox = yes
  location =
  mailbox Drafts {
    special_use = \Drafts
  }
  mailbox Junk {
    special_use = \Junk
  }
  mailbox Sent {
    special_use = \Sent
  }
  mailbox "Sent Messages" {
    special_use = \Sent
  }
  mailbox Trash {
    special_use = \Trash
  }
  prefix = INBOX.
  separator = .
}
passdb [...]
plugin {
  fts = xapian
  fts_autoindex = yes
  fts_decoder = decode2text
  fts_enforced = yes
  fts_xapian = partial=3 full=20 verbose=0
  sieve = [...]
  sieve_default = [...]
  sieve_default_name = [...]
  sieve_extensions = [...]
  sieve_spamtest_max_header = [...]
  sieve_spamtest_status_header = [...]
  sieve_spamtest_status_type = [...]
}
pop3_uidl_format = [...]
protocols = imap pop3 lmtp sieve
service decode2text {
  executable = script /usr/libexec/dovecot/decode2text.sh
  unix_listener decode2text {
    mode = 0666
  }
  user = dovecot
}
service indexer-worker {
  vsz_limit = 1 G
}
service lmtp {
  unix_listener /var/spool/postfix/private/lmtp-dovecot {
    group = postfix
    mode = 0600
    user = postfix
  }
}
service managesieve-login [...]
ssl = required
ssl_cert = [...]
ssl_cipher_list = [...]
ssl_dh = # hidden, use -P to show it
ssl_key = # hidden, use -P to show it
userdb [...]
protocol lmtp {
  auth_username_format = [...]
  mail_plugins = fts fts_xapian sieve
  postmaster_address = [...]
}
protocol lda {
  mail_plugins = sieve
}
protocol imap {
  mail_max_userip_connections = [...]
}

With some stuff that I assume is unrelated replaced by "[...]". If I censored anything that you need, let me know.

grosjo commented 6 months ago

Can you try with latest git ?

Philippe23 commented 6 months ago

Updated. I'll report back what shows up.

Philippe23 commented 6 months ago

Feb 9 20:32:14 myhostname dovecot: indexer-worker(USER1)<11644><eXZcIx7SxmV4LQAA0J78UA:2dI+Jh7SxmV8LQAA0J78UA>: Error: FTS Xapian: Can not open RO index (INBOX) /home/USER1/.maildir/xapian-indexes/db_f4b25a30bb7c795bd9220000d09efc50 : DatabaseOpeningError - Permission denied

Feb 9 20:32:14 myhostname dovecot: indexer-worker(USER1)<11644><eXZcIx7SxmV4LQAA0J78UA:2dI+Jh7SxmV8LQAA0J78UA>: Error: FTS Xapian: Can't open Xapian DB (RW) (INBOX) /home/USER1/.maildir/xapian-indexes/db_f4b25a30bb7c795bd9220000d09efc50 : DatabaseOpeningError - Permission denied

Here's a listing of that folder's contents:

total 87712
drwx------   2 USER1 USER1     4096 Feb  6 18:29 .
drwx------ 610 USER1 USER1    86016 Feb  9 19:55 ..
-rw-------   1 USER1 USER1     8192 Dec  4  2021 docdata.glass
-rw-------   1 USER1 USER1        0 Feb  9 20:32 flintlock
-rw-------   1 root  USER1      149 Feb  6 18:29 iamglass
-rw-------   1 USER1 USER1     8192 Dec  4  2021 position.glass
-rw-------   1 USER1 USER1 78856192 Feb  6 18:29 postlist.glass
-rw-------   1 USER1 USER1     8192 Dec  4  2021 spelling.glass
-rw-------   1 USER1 USER1     8192 Dec  4  2021 synonym.glass
-rw-------   1 USER1 USER1 10821632 Feb  6 18:29 termlist.glass

Not sure if that one root owned file has anything to do with it or offers any insight.

USER1 does have 2 files under ~USER1/.maildir/xapian-indexes/ that are not owned by USER1, both with that iamglass filename. I should also state that USER1 has 608 iamglass files, so presumably that means 606 owned by themselves.

Unsure if this one is related or not:

Feb 9 20:32:14 myhostname dovecot: indexer-worker(USER1)<11644><eXZcIx7SxmV4LQAA0J78UA:2dI+Jh7SxmV8LQAA0J78UA>: Error: Mailbox INBOX: Precache for UID=257363 failed: Internal error occurred. Refer to server log for more information. [2024-02-09 20:32:14] (no mails indexed)

Philippe23 commented 6 months ago

Oh, and I just looked at what you changed. This didn't trigger an error (and it's my user, not USER1), but figured I'd send it in case it's helpful.

Feb 9 21:42:19 myhostname dovecot: indexer-worker(MYUSER)<22385>: FTS Xapian (Threaded MYUSER): Commit & Closing (INBOX) starting (/home/MYUSER/.maildir/xapian-indexes/db_597c200f157a795b4a1b0000d09efc50) : /usr/libexec/dovecot/indexer-worker

grosjo commented 6 months ago

I see. Change manually the ownership of the 2 iamglass files and let's try with latest git

Philippe23 commented 6 months ago

Just did find /home/USER1/.maildir/xapian-indexes/ -not -user USER1 -exec chown USER1 {} + (and again for a few more users that also had root permissions on that same file).

After that I upgrade to latest and restarted.

I'll report back in a few hours.

Philippe23 commented 6 months ago

Feb 10 12:07:21 myhostname dovecot: indexer-worker(31108): FTS Xapian: Commit & Closing (Trash,/home/MYUSER/.maildir/xapian-indexes/db_e0e0872a837b795b6d200000d09efc50) - Threaded from=MYUSER to=MYUSER : Releasing Xapian db Feb 10 12:07:21 myhostname dovecot: indexer-worker(31108): FTS Xapian: Commit & Closing (Trash,/home/MYUSER/.maildir/xapian-indexes/db_e0e0872a837b795b6d200000d09efc50) - Threaded from=MYUSER to=MYUSER : Re-creating index database due to error Feb 10 12:07:21 myhostname dovecot: indexer-worker(31108): Error: FTS Xapian: Can't re-create Xapian DB /home/MYUSER/.maildir/xapian-indexes/db_e0e0872a837b795b6d200000d09efc50 : DatabaseCreateError - Permission denied

Looked back for the previous ref to that db_e0e0872a837b795b6d200000d09efc50, these are from a few seconds earlier.

Feb 10 12:07:12 myhostname dovecot: indexer-worker(MYUSER)<31108>: FTS Xapian: Commit & Closing (Trash,/home/MYUSER/.maildir/xapian-indexes/db_e0e0872a837b795b6d200000d09efc50) - Threaded from=MYUSER to=MYUSER : starting (/usr/libexec/dovecot/indexer-worker) : (null) Feb 10 12:07:12 myhostname dovecot: indexer-worker(MYUSER)<31108>: FTS Xapian: Commit & Closing (Trash,/home/MYUSER/.maildir/xapian-indexes/db_e0e0872a837b795b6d200000d09efc50) - Threaded from=MYUSER to=MYUSER : Writing 7743 (old) vs 7745 (new)

So I looked past the error for the next reference, a few minutes later:

Feb 10 12:09:14 myhostname dovecot: indexer-worker(MYUSER)<31108>: FTS Xapian: Opening DB (RW) /home/MYUSER/.maildir/xapian-indexes/db_e0e0872a837b795b6d200000d09efc50 Feb 10 12:09:14 myhostname dovecot: indexer-worker(MYUSER)<31108>: FTS Xapian: Opening DB (RW) /home/MYUSER/.maildir/xapian-indexes/db_e0e0872a837b795b6d200000d09efc50 (7743 docs stored): Done

... skip over the log entries where it's doing the work ...

Feb 10 12:09:14 myhostname dovecot: indexer-worker(MYUSER)<31108>: FTS Xapian: Commit & Closing (Trash,/home/MYUSER/.maildir/xapian-indexes/db_e0e0872a837b795b6d200000d09efc50) - Threaded from=MYUSER to=MYUSER : starting (/usr/libexec/dovecot/indexer-worker) : (null) Feb 10 12:09:14 myhostname dovecot: indexer-worker(MYUSER)<31108>: FTS Xapian: Commit & Closing (Trash,/home/MYUSER/.maildir/xapian-indexes/db_e0e0872a837b795b6d200000d09efc50) - Threaded from=MYUSER to=MYUSER : Writing 7743 (old) vs 7746 (new) Feb 10 12:09:14 myhostname dovecot: indexer-worker(MYUSER)<31108>: FTS Xapian: Deinit /home/MYUSER/.maildir/xapian-indexes) Feb 10 12:09:19 myhostname dovecot: indexer-worker(31108): FTS Xapian: Commit & Closing (Trash,/home/MYUSER/.maildir/xapian-indexes/db_e0e0872a837b795b6d200000d09efc50) - Threaded from=MYUSER to=MYUSER : Releasing Xapian db Feb 10 12:09:19 myhostname dovecot: indexer-worker(31108): FTS Xapian: Commit & Closing (Trash,/home/MYUSER/.maildir/xapian-indexes/db_e0e0872a837b795b6d200000d09efc50) - Threaded from=MYUSER to=MYUSER : Done in 4184 ms

Not sure why it's permission denied, but then a minute later it can do it just fine.

Here's the files in there right now -- not sure it's helpful.

total 1620224
drwx------  2 MYUSER MYUSER       4096 Feb 10 12:09 .
drwx------ 35 MYUSER MYUSER       4096 Feb 10 12:09 ..
-rw-------  1 MYUSER MYUSER       8192 Dec  4  2021 docdata.glass
-rw-------  1 MYUSER MYUSER          0 Feb 10 12:09 flintlock
-rw-------  1 MYUSER MYUSER        159 Feb 10 12:09 iamglass
-rw-------  1 MYUSER MYUSER       8192 Dec  4  2021 position.glass
-rw-------  1 MYUSER MYUSER 1498611712 Feb 10 12:09 postlist.glass
-rw-------  1 MYUSER MYUSER       8192 Dec  4  2021 spelling.glass
-rw-------  1 MYUSER MYUSER       8192 Dec  4  2021 synonym.glass
-rw-------  1 MYUSER MYUSER  160391168 Feb 10 12:09 termlist.glass

Let me know if there's something specific I should search for that'd be helpful.

grosjo commented 6 months ago

Please try with latest git

grosjo commented 6 months ago

Before "Feb 10 12:07:21 myhostname dovecot: indexer-worker(31108): Error: FTS Xapian: Can't re-create Xapian DB /home/MYUSER/.maildir/xapian-indexes/db_e0e0872a837b795b6d200000d09efc50 : DatabaseCreateError - Permission denied" , there shall be an error message Can you share it with me ?

Philippe23 commented 6 months ago

Feb 10 12:07:21 myhostname dovecot: indexer-worker(31108): Error: FTS Xapian: DatabaseError - (null)

Philippe23 commented 6 months ago

Any chance this is being triggered because some state is being left over from a previous user being processed?

The two I've looked at, the DatabaseError - (null) happens at the threshold when (what I assume is the PID) finishes work for one user, and starts the work of another.

Feb 11 07:07:09 myhostname dovecot: lmtp(USER1)<20624>: sieve: msgid=2770f7ee-5760-4b30-af81-315a28ea3dce@atl1s07mta3184.xt.local: fileinto action: stored mail into mailbox 'INBOX.Low-Priority.Future-Spam' Feb 11 07:07:09 myhostname dovecot: indexer-worker(USER1)<20628>: FTS Xapian: Done indexing 'Low-Priority.Future-Spam' (d6b8fc1b0b7d795b07230000d09efc50) (1 msgs in 130 ms, rate: 7.7) Feb 11 07:07:29 myhostname dovecot: indexer-worker(20628): Error: FTS Xapian: DatabaseError - (null) Feb 11 07:07:29 myhostname dovecot: indexer-worker(20628): Error: FTS Xapian: Can't re-create Xapian DB /home/USER2/.maildir/xapian-indexes/db_559fd834187a795b511b0000d09efc50 : DatabaseCreateError - Permission denied

grosjo commented 6 months ago

Hi , if you put verbose=1, there shall be other messages around the error

Can you try again with latest git, and verbose=1 in dovecot.conf ?

Philippe23 commented 6 months ago

Side note: warning in latest:

fts-backend-xapian-functions.cpp:727:36: warning: format '%s' expects a matching 'char*' argument [-Wformat=]
  727 |                 if(err) { i_info("%s : Could not commit this time, but will do a bit later"); }
      |                                   ~^
      |                                    |
      |                                    char*
Philippe23 commented 6 months ago

Made a PR in case it makes your life easier. (I started seeing an index-worker crash once I saw that warning.)

Also feel free to reject it since it's such a simple fix.

PR: https://github.com/grosjo/fts-xapian/pull/148

grosjo commented 6 months ago

Thank you for PR, but this shall not be he source of crash (as the error is triggered before)

Can you share the logs around the crash with verbose=1 ?

Philippe23 commented 6 months ago

crash_logs.txt

grosjo commented 6 months ago

Is that latest git running on 2.3.21 ?

Philippe23 commented 6 months ago

It was latest as of 1:14pm EST today + the change from that PR, which I think equates to latest as of now.

I'm running Dovecot 2.3.20.

grosjo commented 6 months ago

Kindly use latest git (adjusted the debug messages to understand the root cause) and add the following to your dovecot .conf

service indexer-worker { process_limit = 0
}

Philippe23 commented 6 months ago

I see an instance where iamglass went back to owned by root -- this is one I know I fixed over the weekend.

2024-02Feb-12-13.41_error.log

Haven't seen the crash yet.

grosjo commented 6 months ago

I think I understood the root cause of the crash (i_info calling discarded data from stack)

For the permission change, I am still searching for the root cause

Philippe23 commented 6 months ago

I haven't seen the Permission error yet, but did get the DatabaseError - (null) yesterday.

2024-02Feb-12-14.26_DatabaseError.log

Any chance the process_limit = 0 is keeping the Permission error at bay?

grosjo commented 6 months ago

Are you using hte latest git ?

Philippe23 commented 6 months ago

It's a moving target. I'm on 45488c3856b33189b4545ca7006421f2ea33d703. That was the latest last time I rebuilt.

Philippe23 commented 6 months ago

Just updated to 7f65cf2eb735181bcabfb79c95d14967d3a59cd7.

Warning in that version:

In file included from fts-backend-xapian.cpp:73:
fts-backend-xapian-functions.cpp: In function 'void fts_backend_xapian_ownershi (std::string*)':
fts-backend-xapian-functions.cpp:675:82: warning: format '%ld' expects argument of type 'long int', but argument 3 has type 'const char*' [-Wformat=]
  675 | _settings.verbose>0) syslog(LOG_INFO,"Fixing ownership to %ld:%ld",dbpath->c_str(),(long)(info.st_uid),(long)(info.st_gid));
      |                                                           ~~^      ~~~~~~~~~~~~~~~
      |                                                             |                   |
      |                                                             long int            const char*
      |                                                           %s

I think you're missing a %s you meant to put in the string before the UID's %ld.

grosjo commented 6 months ago

Yep. Can you install 1.6.2 and let me know of any further crash or permission issue ?

Philippe23 commented 6 months ago

Will do! Thanks!

grosjo commented 6 months ago

@Philippe23 are we good ?

Philippe23 commented 6 months ago

Feb 16 17:28:21 myhostname dovecot: indexer-worker(USER1)<1645><0WkLYXkRt8ZKioJO:sIosIIXhz2VtBgAA0J78UA>: Error: FTS Xapian: Can not open RO index (ABC.DEF.GHI) /home/USER1/.maildir/xapian-indexes/db_c06be42691587964f7260000d09efc50 : DatabaseOpeningError - Permission denied

On v1.6.2.

From my nightly cron /usr/bin/doveadm fts optimize -F ${USERLISTFILE} a few hours later:

doveadm(USER1): Error: FTS Xapian: Optimize (6) /home/USER1/.maildir/xapian-indexes/db_c06be42691587964f7260000d09efc50/iamglass: Failed to open glass revision file for reading

Perms are back to root:

-rw-------   1 root  USER1  141 Feb 16 17:28 iamglass

But, I had dropped back down to verbose=0, so I don't have a lot more useful info.

Here's a few nearby log entries:

Feb 16 17:27:36 myhostname dovecot: indexer-worker(USER1)<1645><0WkLYXkRt8ZKioJO:SOWZBlbhz2VtBgAA0J78UA>: FTS Xapian: Done indexing 'ABC.DEF.GHI' (c06be42691587964f7260000d09efc50) (3 msgs in 1940 ms, rate: 1.5) Feb 16 17:28:06 myhostname dovecot: indexer-worker(USER1)<1645><0WkLYXkRt8ZKioJO:YNq0Gnbhz2VtBgAA0J78UA>: FTS Xapian: Done indexing 'ABC.DEF.GHI' (c06be42691587964f7260000d09efc50) (1 msgs in 157 ms, rate: 6.4) Feb 16 17:28:06 myhostname FTS Xapian: Commit & Closing (ABC.DEF.GHI,/home/USER1/.maildir/xapian-indexes/db_c06be42691587964f7260000d09efc50) - Threaded from=USER1 : [1645]: Can not chown /home/USER1/.maildir/xapian-indexes/db_c06be42691587964f7260000d09efc50/iamglass Feb 16 17:28:21 myhostname dovecot: indexer-worker(USER1)<1645><0WkLYXkRt8ZKioJO:sIosIIXhz2VtBgAA0J78UA>: Error: FTS Xapian: Can not open RO index (ABC.DEF.GHI) /home/USER1/.maildir/xapian-indexes/db_c06be42691587964f7260000d09efc50 : DatabaseOpeningError - Permission denied

grosjo commented 6 months ago

Please adjust using chown directly and try latest git

grosjo commented 6 months ago

I think I understand betetr the change of ownership. This shall come from coredump or at least crash of the code without notice and the last file is held by root. So I tried to counter all possible crashes in the code. Kindly try latest git

Philippe23 commented 6 months ago

2024-02-22.log

$ ls -la ~USER1/.maildir/xapian-indexes/db_8db1fb2bb06cd765f2110000d09efc50/
drwx------   2 USER1 USER1   4096 Feb 22 18:34 .
drwx------ 611 USER1 USER1  86016 Feb 23 04:11 ..
-rw-------   1 USER1 USER1   8192 Feb 22 10:48 docdata.glass
-rw-------   1 USER1 USER1      0 Feb 23 04:11 flintlock
-rw-------   1 root  USER1    118 Feb 22 18:34 iamglass
-rw-------   1 USER1 USER1   8192 Feb 22 10:48 position.glass
-rw-------   1 USER1 USER1 794624 Feb 22 18:34 postlist.glass
-rw-------   1 USER1 USER1   8192 Feb 22 10:48 spelling.glass
-rw-------   1 USER1 USER1   8192 Feb 22 10:48 synonym.glass
-rw-------   1 USER1 USER1 163840 Feb 22 18:34 termlist.glass

I may be reading it wrong, but I think this is a new folder, so the log file is the whole lifetime of that 8db1fb2bb06cd765f2110000d09efc50 DB.

grosjo commented 6 months ago

Hi, This "root" owernship seems to be created when there is a crash (memory low or something similar) I tried to further ctach those in versin 1.7.0

Can you fix manually the eprsmissions and try 1.7.0 ?

grosjo commented 6 months ago

Still some "root" permission errors ?

Philippe23 commented 5 months ago

I haven't seen the permission error again, yet, but I do see this one, which perhaps is the crash you suspected:

Feb 26 21:29:22 myhostname dovecot: indexer-worker(MYUSER)<26433>: FTS Xapian: Sleep1 Feb 26 21:29:23 myhostname dovecot: indexer-worker(MYUSER)<26433>: FTS Xapian: Thread #55 (0) Active Feb 26 21:29:23 myhostname dovecot: indexer-worker(MYUSER)<26433>: FTS Xapian: Thread #123 (1) Active Feb 26 21:29:23 myhostname dovecot: indexer-worker(MYUSER)<26433>: FTS Xapian: Sleep1 Feb 26 21:29:23 myhostname dovecot: indexer-worker: Error: terminate called after throwing an instance of 'std::bad_alloc' Feb 26 21:29:23 myhostname dovecot: indexer-worker: Error: what(): std::bad_alloc Feb 26 21:29:24 myhostname dovecot: indexer-worker(MYUSER)<26433>: Fatal: master: service(indexer-worker): child 26433 killed with signal 6 (core dumps disabled - https://dovecot.org/bugreport.html#coredumps)

This is on 1.7.2.

grosjo commented 5 months ago

Hmmm, another crash. Can you use latest git, enable your coredump (based on https://www.dovecot.org/bugreport-mail/ instructions (paragraph coredumps) ) and let me know the output of gdb for the next crash ?

grosjo commented 5 months ago

Can you try 1.7.6, that should resolve all issues

grosjo commented 5 months ago

Are we good ?

Philippe23 commented 5 months ago

I have not had to fix folder permissions since Feb. 25th (which is before my previous comment).

I haven't seen that std::bad_alloc exception again either.

I upgraded to 1.7.6 on Mar. 5th.

I'll close this, and reopen it in the unlikely case it shows back up.

Thanks a bunch for all this work!