worker's performance loss in v2.004.009

p-alik commented 7 years ago

On 03 November 2017 at 01:33 @esabol notifications@github.com wrote:

Well, I have tested the upstream branch, and I have good news and bad news.

The good news is that it solves the problem with the CPU usage. 10 seconds after starting the workers, ps auxgww shows 0.0% CPU usage.

The bad news is that the workers' response times are at least 25 times slower in my benchmarks. :( This is in comparison to version 2.004.008. (2.004.009 has even worse problems.)

I am using a simple SSL worker that just responds with "pong" when it receives the task "ping". Doing 100 iterations of running "ping" task using the identical client code:
Benchmark: timing 100 iterations of 47300, 47301...
     47300:  1 wallclock secs ( 0.14 usr +  0.02 sys =  0.16 CPU) @ 625.00/s (n=100)
            (warning: too few iterations for a reliable count)
     47301: 30 wallclock secs ( 0.17 usr +  0.02 sys =  0.19 CPU) @ 526.32/s (n=100)
            (warning: too few iterations for a reliable count)
Done.
Port 47300 is running 2.004.008 SSL. Port running 47301 is github/upstream SSL.

If I revert port 47301 to 2.004.008, and re-run my benchmarks:
Benchmark: timing 100 iterations of 47300, 47301...
     47300:  0 wallclock secs ( 0.06 usr +  0.01 sys =  0.07 CPU) @ 1428.57/s (n=100)
            (warning: too few iterations for a reliable count)
     47301:  0 wallclock secs ( 0.05 usr +  0.01 sys =  0.06 CPU) @ 1666.67/s (n=100)
            (warning: too few iterations for a reliable count)
Done.
I would call this an unacceptable performance regression.

p-alik commented 6 years ago

@esabol, f6c26ca8 solves the issue in my test environment. Could you test, please.

esabol commented 6 years ago

Sadly, it does not for me. It won't even complete 100 iterations. At some point a worker just gets frozen and doesn't respond at all.

esabol commented 6 years ago

Also, even when a worker doesn't freeze up, it's still slower than v2.004.008. Response times of upstream branch code range from 0.27083 to 3.527426 seconds, with most over 1.0 seconds. With v2.004.008, response times range from 0.00843 to 0.015321 seconds in a small sample of 10 consecutive iterations of each.

upstream: Response time: 2.148328 s Response time: 0.60157 s Response time: 0.27083 s Response time: 1.282266 s Response time: 0.32943 s Response time: 0.715273 s Response time: 0.296254 s Response time: 3.527426 s Response time: 0.842179 s Response time: 2.35561 s

v2.004.008: Response time: 0.012238 s Response time: 0.00843 s Response time: 0.00872 s Response time: 0.009641 s Response time: 0.013076 s Response time: 0.011655 s Response time: 0.015725 s Response time: 0.011963 s Response time: 0.006766 s Response time: 0.015321 s Response time: 0.012441 s

Response time is fairly critical for our usage in a web service.

What problem does v2.004.008 have that v2.004.009/upstream is trying to solve?

esabol commented 6 years ago

I upgraded to the latest versions of Net:SSLeay, IO::Socket::SSL, and IO::Socket::IP just in case I was encountering some kind of bug with one of those, and I get similar results.

p-alik commented 6 years ago

What problem does v2.004.008 have that v2.004.009/upstream is trying to solve?

Register on reconnect. See #23

p-alik commented 6 years ago

For benchmark purpose I'm using scripts similar to your description without SSL because neither 2.004.008 no upstream have some changes with regard to SSL.

client.pl


use v5.10;
use warnings;
use strict;
use Benchmark qw(:hireswallclock);
use Gearman::Client;

my $f = "foo"; my $c_248 = Gearman::Client->new(job_servers => "localhost:4730"); my $c_upstream = Gearman::Client->new(job_servers => "localhost:4731");

my $r = timethese( -1, { '2.4.8' => sub { $c_248->do_task($f) }, 'upstream' => sub { $c_upstream->do_task($f) }, } );

Benchmark::cmpthese($r);

- **worker.pl**
```perl
use v5.10;
use strict;
use warnings;

use Gearman::Worker;
use Time::HiRes qw(usleep);
# replace job_servers by "localhost:4730" for 2.004.008 worker
my $w = Gearman::Worker->new(job_servers => "localhost:4731");

$w->register_function(
    "foo",
    sub {
        # milisecond
        usleep 1_000;
        return 1;
    },
);

$w->work();

In my environment upstream is always slightly faster.

Benchmark: running 2.4.8, upstream for at least 1 CPU seconds...
     2.4.8: 8.39074 wallclock secs ( 1.03 usr +  0.10 sys =  1.13 CPU) @ 2642.48/s (n=2986)
  upstream: 5.18952 wallclock secs ( 0.99 usr +  0.16 sys =  1.15 CPU) @ 2596.52/s (n=2986)
           Rate upstream    2.4.8
upstream 2597/s       --      -2%
2.4.8    2642/s       2%       --

esabol commented 6 years ago

I'm not sure I understand. What do you mean "upstream always hits 2.4.8"?

Your timings seem to agree with what I'm seeing, but the disparity is not as bad without SSL.

I recommend changing "use Benchmark..." to use Benchmark qw(:hireswallclock); to get hires timings. A little trick I just discovered recently.

p-alik commented 6 years ago

I'm not sure I understand. What do you mean "upstream always hits 2.4.8"?

I meant upstream is faster.

I recommend changing "use Benchmark..." to use Benchmark qw(:hireswallclock); to get hires timings.

I update the client code in my former comment respectively to your suggestion.

esabol commented 6 years ago

I meant upstream is faster.

Are you sure? That's not what your original timings showed, I thought. IIRC, it showed 2 seconds for 2.4.8 and 7 seconds for upstream.

p-alik commented 6 years ago

In Benchmark output upstream is 2% faster.

In upstream I tried to reduce the complexity of initial work sub and to replacing usage of fd's by IO::Select. That's the result.

The only reason for the issue was replacing of default select timeout by 1. f6c26ca8 restores the old until 2.004.008 used timeout value and solves the issue.

esabol commented 6 years ago

That doesn't explain your previous timings, nor does it explain why my timings are vastly worse with upstream+SSL. Anyway, I will try to test without SSL and see if I can reproduce your latest timings.

esabol commented 6 years ago

Also, you might want to redo your timings with Benchmark::Forking instead of Benchmark for more accurate benchmarks. Leave the use Benchmark qw(:hireswallclock); line and put use Benchmark::Forking; on the next line. Maybe it won't make any difference. I don't know. I've been using Benchmark::Forking for what it's worth.

http://www.dagolden.com/index.php/1849/adventures-in-benchmarking-part-1/

p-alik commented 6 years ago

new client:

use v5.10;
use warnings;
use strict;
use Benchmark qw(:hireswallclock);
use Benchmark::Forking qw(cmpthese);
use Gearman::Client;

my ($f, $count) = qw/foo 1000/;
my $c_248      = Gearman::Client->new(job_servers => "jobsched-b0-01.mip-platform.net:4730");
my $c_upstream = Gearman::Client->new(job_servers => "jobsched-b1-00.mip-platform.net:4730");

my $r = cmpthese (
    -1,
    {
        '2.4.8'    => sub { $c_248->do_task($f) },
        'upstream' => sub { $c_upstream->do_task($f) },
    }
);

No significant alteration in benchmark result.

p-alik commented 6 years ago

@esabol with 92c07a7 worker implementation equal to 2.004.008. Only bug fixing of #23 is a part of the next release 2.004.010 I'll put to cpan next days. In my tests new release doesn't have any disadvantage. For testing purpose you coud use the code of master branch.

esabol commented 6 years ago

I still see this a major problem. Can you test with SSL? I haven't been able to find the time to test without SSL, unfortunately.

p-alik commented 6 years ago

Unfortunately SSL tests doesn't work with PR #31. So v2.004.010 was uploaded to CPAN without PR #31

p-alik / perl-Gearman

worker's performance loss in v2.004.009 #29