openzfsonosx / zfs

OpenZFS on OS X
https://openzfsonosx.org/
Other
824 stars 72 forks source link

bdb53 vs zfs issue #512

Open rottegift opened 8 years ago

rottegift commented 8 years ago

In part a placeholder for me, because this is a weird one. It took a long time to trace it down to zfs (and I was in denial about the choice of underlying filesystem being the difference for a couple of hours).

It's reproducible but takes some work.

  1. install latest amavisd-new, perl 5.22, and the various cpan dependencies.

(Alternatively, fastest way so far, reproducible with HEAD in a pretty clean VM: install macports, port install amavisd-new. Haven't checked with pkgsrc, but it has amavisd-new and it is probably even easier to install with that than with macports. Homebrew does not have amavisd-new.)

  1. Do the minimal configuration (you can omit spamassassin and clamav configuration, for example).
  2. Amavis creates db.001 db.002 __db.003 nanny.db snmp.db in the configured $db_home or in the directory specified by -D on the command line (overriding the configuration file).
  3. amavisd debug -D

If is HFS+ no problem, you'll see a bunch of debugging information and you can telnet to the configured port (default 10024) on localhost and see the 220 ESMTP header from the amavis server.

If is zfs, however (at least with case-insensitivity and utf-8; hfs mimic does not matter; quota does not matter) then amavisd logs "Creating db in /opt/local/var/amavis/db/; BerkeleyDB 0.55, libdb 5.3" (in this case /opt is zfs) and hangs there, and telnet connects but never gives you the ESMTP header. A whole core of CPU is consumed, and sample shows that it's spending time in the guts of libdb.

The Apple Server.app version does not fail this way on zfs, but that's using ancient bdb 4.2.

It's this block of perl code somewhere after the "Creating db in" do_log, which could use some extra debugging to see if I can nail down where things are going wrong.

  $! = 0; my $env = BerkeleyDB::Env->new(-Home=>$db_home, -Mode=>0640,
    -Flags=> DB_CREATE | DB_INIT_CDB | DB_INIT_MPOOL);
  defined $env
    or die "BDB can't create db env. at $db_home: $BerkeleyDB::Error, $!.";
  do_log(0, 'Creating db in %s/; BerkeleyDB %s, libdb %s',
            $name, BerkeleyDB->VERSION, $BerkeleyDB::db_version);
  $! = 0; my $dbs = BerkeleyDB::Hash->new(
    -Filename=>'snmp.db', -Flags=>DB_CREATE, -Env=>$env );
  defined $dbs or die "db_init: BDB no dbS: $BerkeleyDB::Error, $!.";
  $! = 0; my $dbn = BerkeleyDB::Hash->new(
    -Filename=>'nanny.db', -Flags=>DB_CREATE, -Env=>$env );
  defined $dbn or die "db_init: BDB no dbN: $BerkeleyDB::Error, $!.";

  Amavis::DB::SNMP::put_initial_snmp_data($dbs)  if $predelete_snmp;
  for my $db ($dbs, $dbn) {
    $db->db_close==0 or die "db_init: BDB db_close: $BerkeleyDB::Error, $!.";
  }

I did a quick attack with fs_usage, nothing useful so far. I did a quick attack with dtrace -p and it slowed the system way down (!), dunno why, but gave up. I didn't save the sample/spindump bits but will probably attach them when I come back to this later, since it's trivially easy to reproduce (amavisd stop; amavisd start -D ).

Haven't checked non-macports db53 yet. That's on the TODO list.

I do have several db53-using apps running without problem, and this stall does happen on the spamassassin bayes_toks bayes_seen db53 databases even when they're on zfs...

Vanav commented 8 years ago

I think I have this issue too. FreeBSD 11, ZFS, db5-5.3.28_4, p5-BerkeleyDB-0.55_1.

# service amavisd reload
Oct 19 11:31:44 amavis[58225]: Creating db in /var/amavis/db/; BerkeleyDB 0.55, libdb 5.3
Oct 19 11:32:39 amavis[58784]: (58784-06) _WARN: [58784] BDB S+N DESTROY INFO (db_nanny): db_close: BDB0061 PANIC: Invalid argument, No such process. at (eval 95) line 49.
Oct 19 11:32:40 amavis[58225]: (!!)TROUBLE in pre_loop_hook: BDB can't create db env. at /var/amavis/db: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery, File exists. at (eval 95) line 314.
Oct 19 11:32:40 amavis[58225]: sd_notify (no socket): STOPPING=1\nSTATUS=TROUBLE in pre_loop_hook: BDB can't create db env. at /var/amavis/db: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery, File exists. at (eval 95) line 314.
Oct 19 11:32:40 amavis[58225]: (!)_DIE: Suicide () TROUBLE in pre_loop_hook: BDB can't create db env. at /var/amavis/db: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery, File exists. at (eval 95) line 314.
Oct 19 03:15:01 amavis[34738]: (34738-16) _WARN: [34738] BDB S+N DESTROY INFO (db_nanny): db_close: BDB0060 PANIC: fatal region error detected; run recovery, . at (eval 95) line 49.
Oct 19 03:15:01 amavis[35274]: (35274-02) _WARN: [35274] BDB S+N DESTROY INFO (db_snmp): db_close: BDB0061 PANIC: Invalid argument, No such process. at (eval 95) line 49.
Oct 19 03:15:01 amavis[35274]: (35274-02) _WARN: [35274] BDB S+N DESTROY INFO (db_nanny): db_close: BDB0060 PANIC: fatal region error detected; run recovery, . at (eval 95) line 49.
Oct 19 03:15:01 amavis[70417]: (!!)TROUBLE in pre_loop_hook: BDB can't create db env. at /var/amavis/db: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery, File exists. at (eval 95) line 314.
Oct 19 03:15:01 amavis[70417]: sd_notify (no socket): STOPPING=1\nSTATUS=TROUBLE in pre_loop_hook: BDB can't create db env. at /var/amavis/db: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery, File exists. at (eval 95) line 314.
Oct 19 03:15:01 amavis[70417]: (!)_DIE: Suicide () TROUBLE in pre_loop_hook: BDB can't create db env. at /var/amavis/db: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery, File exists. at (eval 95) line 314.