p-alik / perl-Gearman


Multiple gearman servers and redundancy #39

Closed: vkatsikaros closed this issue 6 years ago

vkatsikaros commented 6 years ago

Hi

The Gearman documentation says this about multiple servers:

You are able to run multiple job servers and have the clients and workers connect to the first available job server they are configured with. This way if one job server dies, clients and workers automatically fail over to another job server.

When we experimented with stopping one of several gearman servers, dispatching jobs became 100 to 200 times slower than when all gearman servers were running.

I would like to ask if this is the expected behavior and/or if there is something additional we could do to improve the situation:

Results

client.pl

use strict;
use warnings;
use Benchmark;
use Gearman::Client;
use Gearman::Task;
use Data::UUID;
use Storable qw( freeze );
my $i = 1;
my $uuid_gen = Data::UUID->new;
timethis (500, \&gearman_client);

sub gearman_client {
  my $client = Gearman::Client->new;
  $client->job_servers(
    { host => '10.30.5.41', port => 4900 },
    { host => '10.30.6.194', port => 4900 },
    { host => '10.30.6.108', port => 4900 }, # comment out for 2 gearman servers setup
  );
  my $serialized = freeze([ 3, 5 ]);
  my $task = Gearman::Task->new(
    'sum',
    \$serialized,
    {
      uniq => $uuid_gen->create_str,
    }
  );
  $client->dispatch_background($task);
  print "Sum of tasks sent $i\n";
  $i++;
}

worker.pl

use strict;
use warnings;
use Gearman::Worker;
use Storable qw( thaw );
use List::Util qw( sum );
my $i = 1;
my $worker = Gearman::Worker->new;
$worker->job_servers(
  { host => '10.30.5.41', port => 4900 },
  { host => '10.30.6.194', port => 4900 },
  { host => '10.30.6.108', port => 4900 }, # comment out for 2 gearman servers setup
);
$worker->register_function(
  'sum',
  sub {
    # keep the sum so it is the function's return value, not $i++
    my $sum = sum @{ thaw($_[0]->arg) };
    print "Sum of tasks received $i\n";
    $i++;
    return $sum;
  }
);
$worker->work while 1;

Setup

Characteristics of this binary (from libperl):
  Compile-time options:
    HAS_TIMES PERLIO_LAYERS PERL_COPY_ON_WRITE PERL_DONT_CREATE_GVSV
    PERL_MALLOC_WRAP PERL_OP_PARENT PERL_PRESERVE_IVUV USE_64_BIT_ALL
    USE_64_BIT_INT USE_LARGE_FILES USE_LOCALE USE_LOCALE_COLLATE
    USE_LOCALE_CTYPE USE_LOCALE_NUMERIC USE_LOCALE_TIME USE_PERLIO
    USE_PERL_ATOF
  Locally applied patches:
    Devel::PatchPerl 1.48
  Built under linux
  Compiled at May 4 2018 17:12:37
  %ENV:
    PERL5LIB="/a_directory"
  @INC:
    /a_directory
    /opt/perlbrew/perls/perl-5.26.2/lib/site_perl/5.26.2/x86_64-linux
    /opt/perlbrew/perls/perl-5.26.2/lib/site_perl/5.26.2
    /opt/perlbrew/perls/perl-5.26.2/lib/5.26.2/x86_64-linux
    /opt/perlbrew/perls/perl-5.26.2/lib/5.26.2

p-alik commented 6 years ago

I guess the default command_timeout is to blame for the issue.

The worker's register_function also supports setting a timeout.

vkatsikaros commented 6 years ago

Thanks for the comments @p-alik

I ran experiments with the default timeout (30) and with a timeout of 1. However, I noticed that there is a difference (regardless of the timeout) depending on what "server not running" means:

  1. the machine hosting gearmand is turned on and accepts network connections but the gearmand server is not running
  2. the machine hosting gearmand is turned off and the IP is not pointing to another machine

It turns out my previous comment was about the 2nd case. With a timeout of either 1 or 30 I see the same unexpected behavior: the client becomes 100-200 times slower.

Regarding the 1st case, with a timeout of either 1 or 30 I see the same expected behavior: the client runs equally fast.

Any ideas?

Results

timeout 1:

timeout 30:

client.pl

use strict;
use warnings;
use Benchmark;
use Gearman::Client;
use Data::UUID;
use Storable qw( freeze );
my $i = 1;
my $timeout = 1;
my $uuid_gen = Data::UUID->new;
timethis (500, \&gearman_client);

sub gearman_client {
  my $client = Gearman::Client->new(command_timeout => $timeout);
  $client->job_servers(
    { host => '10.30.5.41', port => 4900 },
    { host => '10.30.6.104', port => 4900 },
  );
  my $serialized = freeze([ 3, 5 ]);
  my $task = Gearman::Task->new(
    'sum',
    \$serialized,
    {
      uniq => $uuid_gen->create_str,
    }
  );
  $client->dispatch_background($task);
  print "Sum of tasks sent $i\n";
  $i++;
}

worker.pl

use strict;
use warnings;
use Gearman::Worker;
use Storable qw( thaw );
use List::Util qw( sum );
my $i = 1;
my $timeout = 1;
my $worker = Gearman::Worker->new;
$worker->job_servers(
  { host => '10.30.5.41', port => 4900 },
  { host => '10.30.6.104', port => 4900 },
);
$worker->register_function(
  'sum',
  $timeout,
  sub {
    # keep the sum so it is the function's return value, not $i++
    my $sum = sum @{ thaw($_[0]->arg) };
    print "Sum of tasks received $i\n";
    $i++;
    return $sum;
  }
);
$worker->work while 1;
p-alik commented 6 years ago

> Regarding the 1st case, with a timeout of both 1 and 30 I see the same expected behavior: the client runs equally fast.

I'm sorry for misleading you.

> Any ideas?

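Socket initialisation is much faster if the resource is available. When the host is up but gearmand is not listening, the connection attempt is refused immediately; when the host is down, the connect typically has to wait until it fails or times out.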

client.pl used for testing

use strict;
use warnings;
use Benchmark;
use Gearman::Client;
use Data::Dump "dump";

my $timeout = 1;
my $client = Gearman::Client->new(command_timeout => $timeout);
$client->job_servers(
  { host => 'localhost', port => 4730 },
);

timethis (500, \&gearman_sock);

sub gearman_sock {
  $client->debug(1);
  foreach ($client->job_servers) {
    print dump($_), $/;
    $client->socket($_);
  }
}
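The difference between the two "server not running" cases can be measured without Gearman at all. The sketch below is an assumption-laden illustration, not part of perl-Gearman: the helper name `time_connect` is hypothetical, and it only uses the core modules IO::Socket::INET and Time::HiRes to time a single TCP connect attempt.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use Time::HiRes qw( time );

# Hypothetical helper (not part of perl-Gearman): attempt one TCP connect
# and return (success flag, seconds elapsed). $timeout caps how long the
# connect may wait when nothing answers, e.g. a powered-off host.
sub time_connect {
    my ($host, $port, $timeout) = @_;
    my $start = time;
    my $sock  = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => $timeout,
    );
    my $elapsed = time - $start;
    my $ok      = $sock ? 1 : 0;
    close $sock if $sock;
    return ($ok, $elapsed);
}
```

For example, `my ($ok, $secs) = time_connect('10.30.6.108', 4900, 5);` against a host that is up with gearmand stopped would be expected to fail almost instantly, while a powered-off host would usually take much longer, up to the timeout. A firewall that silently drops packets behaves like the powered-off case.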
vkatsikaros commented 6 years ago

@p-alik thanks for the comments. So, as I understand it, there is no solution within perl-Gearman to improve the situation when the machine running gearmand is down. I guess the only fallback option is to somehow "ping" the machines in my client code and set $client->job_servers dynamically. Any thoughts about a perl-Gearman solution?
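The "ping" idea could be sketched as a TCP probe with a short timeout, passing only the servers that answered to `job_servers`. This is a minimal sketch under stated assumptions: `reachable_servers` is a hypothetical helper, not a perl-Gearman API, and it uses only core IO::Socket::INET.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical helper (not part of perl-Gearman): return only the job
# servers ({ host => ..., port => ... } hashrefs) that accept a TCP
# connection within $timeout seconds.
sub reachable_servers {
    my ($timeout, @servers) = @_;
    my @alive;
    for my $js (@servers) {
        my $sock = IO::Socket::INET->new(
            PeerAddr => $js->{host},
            PeerPort => $js->{port},
            Proto    => 'tcp',
            Timeout  => $timeout,
        ) or next;    # unreachable or refused: leave it out
        close $sock;
        push @alive, $js;
    }
    return @alive;
}
```

Usage would be along the lines of `$client->job_servers(reachable_servers(1, @all_servers));`. A successful connect only shows that something is listening on the port, not that gearmand is healthy, and if every probe fails the list is empty, so the caller has to decide what to do in that case.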

p-alik commented 6 years ago

> Any thoughts about a "perl-Gearman" solution?

Unfortunately I don't have a solution for the issue. In my opinion it's an issue of the application/environment, not of perl-Gearman. There is no issue at all if Gearman::{Client,Worker} is initialised with a checked list of gearmand instances.
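Because client.pl creates a new client for every task, re-checking the server list before each dispatch would still pay one connect timeout per task for a powered-off host. One way around that, sketched here under assumptions, is to remember failed servers and skip them for a while. `live_servers` and `%dead_until` are hypothetical names, not perl-Gearman API, and the cache lives only inside one process.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical helper (not part of perl-Gearman). Servers that fail a
# probe are skipped until $retry_after seconds have passed, so a dead
# server costs at most one connect timeout per interval in this process,
# instead of one per dispatched task.
my %dead_until;    # "host:port" => epoch second until which it is skipped

sub live_servers {
    my ($timeout, $retry_after, @servers) = @_;
    my $now = time;
    my @alive;
    for my $js (@servers) {
        my $key = "$js->{host}:$js->{port}";
        # still inside the back-off window: skip without probing
        next if ($dead_until{$key} // 0) > $now;
        my $sock = IO::Socket::INET->new(
            PeerAddr => $js->{host},
            PeerPort => $js->{port},
            Proto    => 'tcp',
            Timeout  => $timeout,
        );
        if ($sock) {
            close $sock;
            delete $dead_until{$key};
            push @alive, $js;
        }
        else {
            $dead_until{$key} = $now + $retry_after;
        }
    }
    return @alive;
}
```

The trade-off is that a server which comes back is only picked up again after `$retry_after` seconds.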

But if you have any idea to improve perl-Gearman, please share it.

vkatsikaros commented 6 years ago

Thanks @p-alik !