Open rottegift opened 8 years ago
I think I have this issue too. FreeBSD 11, ZFS, db5-5.3.28_4, p5-BerkeleyDB-0.55_1.
# service amavisd reload
Oct 19 11:31:44 amavis[58225]: Creating db in /var/amavis/db/; BerkeleyDB 0.55, libdb 5.3
Oct 19 11:32:39 amavis[58784]: (58784-06) _WARN: [58784] BDB S+N DESTROY INFO (db_nanny): db_close: BDB0061 PANIC: Invalid argument, No such process. at (eval 95) line 49.
Oct 19 11:32:40 amavis[58225]: (!!)TROUBLE in pre_loop_hook: BDB can't create db env. at /var/amavis/db: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery, File exists. at (eval 95) line 314.
Oct 19 11:32:40 amavis[58225]: sd_notify (no socket): STOPPING=1\nSTATUS=TROUBLE in pre_loop_hook: BDB can't create db env. at /var/amavis/db: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery, File exists. at (eval 95) line 314.
Oct 19 11:32:40 amavis[58225]: (!)_DIE: Suicide () TROUBLE in pre_loop_hook: BDB can't create db env. at /var/amavis/db: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery, File exists. at (eval 95) line 314.
Oct 19 03:15:01 amavis[34738]: (34738-16) _WARN: [34738] BDB S+N DESTROY INFO (db_nanny): db_close: BDB0060 PANIC: fatal region error detected; run recovery, . at (eval 95) line 49.
Oct 19 03:15:01 amavis[35274]: (35274-02) _WARN: [35274] BDB S+N DESTROY INFO (db_snmp): db_close: BDB0061 PANIC: Invalid argument, No such process. at (eval 95) line 49.
Oct 19 03:15:01 amavis[35274]: (35274-02) _WARN: [35274] BDB S+N DESTROY INFO (db_nanny): db_close: BDB0060 PANIC: fatal region error detected; run recovery, . at (eval 95) line 49.
Oct 19 03:15:01 amavis[70417]: (!!)TROUBLE in pre_loop_hook: BDB can't create db env. at /var/amavis/db: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery, File exists. at (eval 95) line 314.
Oct 19 03:15:01 amavis[70417]: sd_notify (no socket): STOPPING=1\nSTATUS=TROUBLE in pre_loop_hook: BDB can't create db env. at /var/amavis/db: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery, File exists. at (eval 95) line 314.
Oct 19 03:15:01 amavis[70417]: (!)_DIE: Suicide () TROUBLE in pre_loop_hook: BDB can't create db env. at /var/amavis/db: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery, File exists. at (eval 95) line 314.
In part a placeholder for me, because this is a weird one. It took a long time to trace it down to zfs (and I was in denial about the choice of underlying filesystem being the difference for a couple of hours).
It's reproducible but takes some work.
(Alternatively, fastest way so far, reproducible with HEAD in a pretty clean VM: install macports, port install amavisd-new. Haven't checked with pkgsrc, but it has amavisd-new and it is probably even easier to install with that than with macports. Homebrew does not have amavisd-new.)
If is HFS+ no problem, you'll see a bunch of debugging information and you can telnet to the configured port (default 10024) on localhost and see the 220 ESMTP header from the amavis server.
If is zfs, however (at least with case-insensitivity and utf-8; hfs mimic does not matter; quota does not matter) then amavisd logs "Creating db in /opt/local/var/amavis/db/; BerkeleyDB 0.55, libdb 5.3" (in this case /opt is zfs) and hangs there, and telnet connects but never gives you the ESMTP header. A whole core of CPU is consumed, and sample shows that it's spending time in the guts of libdb.
The Apple Server.app version does not fail this way on zfs, but that's using ancient bdb 4.2.
It's this block of perl code somewhere after the "Creating db in" do_log, which could use some extra debugging to see if I can nail down where things are going wrong.
I did a quick attack with fs_usage, nothing useful so far. I did a quick attack with dtrace -p and it slowed the system way down (!), dunno why, but gave up. I didn't save the sample/spindump bits but will probably attach them when I come back to this later, since it's trivially easy to reproduce (amavisd stop; amavisd start -D).
Haven't checked non-macports db53 yet. That's on the TODO list.
I do have several db53-using apps running without problem, and this stall does happen on the spamassassin bayes_toks bayes_seen db53 databases even when they're on zfs...