psf / pyperf

Toolkit to run Python benchmarks
http://pyperf.readthedocs.io/
MIT License

How to specify setup code with Runner.bench_func()? #38

Open parasyte opened 6 years ago

parasyte commented 6 years ago

I am trying to bench some data structures with a large number of items in each structure. I would like to have a single setup that is called once (and only once) for all benchmark runs, because the setup takes several seconds to complete.

While debugging, I discovered some very peculiar behavior: perf reloads the entire main module for each run.

import perf
import time

def run_benchmarks(module):
    runner = perf.Runner()

    def get_doc(attr):
        return getattr(module, attr).__doc__

    bench_names = sorted([ x for x in dir(module) if x.startswith('bench_') ], key=get_doc)
    for bench_name in bench_names:
        bench = getattr(module, bench_name)
        doc = bench.__doc__
        print('Running: %s' % (doc))
        runner.bench_func(doc, bench)

class BenchModule(object):
    def __init__(self):
        print('Creating BenchModule()')
        time.sleep(3)
        print('Created BenchModule()')

    def bench_foo(self):
        '''Benchmark: foo'''
        return list(range(1000))

    def bench_bar(self):
        '''Benchmark: bar'''
        return list(range(1000))

if __name__ == '__main__':
    print('Starting benchmarks')
    module = BenchModule()
    run_benchmarks(module)

When I run this, I see output like the following:

Starting benchmarks
Creating BenchModule()
Created BenchModule()
Running: Benchmark: bar
Starting benchmarks
Creating BenchModule()
Created BenchModule()
Running: Benchmark: bar
Running: Benchmark: foo
.Starting benchmarks
Creating BenchModule()
Created BenchModule()
Running: Benchmark: bar
Running: Benchmark: foo
.Starting benchmarks
Creating BenchModule()
Created BenchModule()
Running: Benchmark: bar
Running: Benchmark: foo
.Starting benchmarks
Creating BenchModule()

[ ... snip ... ]

.Starting benchmarks
Creating BenchModule()
Created BenchModule()
Running: Benchmark: bar
Running: Benchmark: foo
.
Benchmark: foo: Mean +- std dev: 9.83 us +- 0.16 us

... what the ? 😖

Why is it printing my "Starting benchmarks" line more than once? More importantly, how is it doing this? Is there some black magic going on with subprocess(__main__)?

But back to the original issue that led me down this rathole: how do I prevent perf from running my BenchModule constructor on each run? I put a 3-second sleep in there to illustrate why the current behavior is obnoxious. In my real-world benchmark, the setup time is more than 30 seconds, and running the complete test suite takes a few hours.

This is possibly related: https://github.com/vstinner/perf/issues/28

vstinner commented 6 years ago

IMHO your benchmark works as you expect; it's just that the child worker processes are flooding stdout :-)

See the documentation of the perf architecture: http://perf.readthedocs.io/en/latest/run_benchmark.html#perf-architecture

Many processes are spawned. Even though a worker process displays both "Running: Benchmark: bar" and "Running: Benchmark: foo", in practice a worker only runs a single bench_func() call; the other benchmarks are skipped. That's done internally by perf.
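To make the "black magic" less magical: the master doesn't call your functions in-process, it re-executes the same script in fresh interpreters with internal worker options, so everything at module level (the prints, BenchModule(), ...) runs again in every spawned process. The pattern is roughly the following, as an illustrative sketch only (not perf's actual spawning code; the option handling is simplified):

import subprocess
import sys

# Illustrative sketch of the spawn pattern, not perf's real implementation.
# The master re-runs the same script in a new interpreter, adding internal
# options that tell the Runner "you are a worker, run only this benchmark".
def spawn_worker(extra_args):
    cmd = [sys.executable, sys.argv[0], '--worker'] + list(extra_args)
    return subprocess.run(cmd, check=True)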

You may want to skip messages in worker processes. You can do that using:

args = runner.parse_args()
if not args.worker:
    # master process
    print("...")

Maybe I should explain that better in the documentation.

parasyte commented 6 years ago

Nah, it isn't flooding stdout. Only about 100 messages are printed. The problem is "needlessly" running setup code for each sample, and there are at least 60 samples for each benchmark. 30 seconds × 60 samples × 28 benchmarks = way too much time to effectively benchmark code that reports a mean of less than 30 ms with a standard deviation of 1 ms.

I don't want to skip printing messages; I want to "skip" my setup code by reusing existing data structures in memory.

pawarren commented 5 years ago

Is there a solution for this? My setup code takes ~20 seconds, and it looks like every single spawned process runs the setup code again instead of using the objects the parent process has already loaded into memory.

pawarren commented 5 years ago

EDIT: This does not fix the actual original issue. Every single one of my spawned workers still runs the setup code, which slows things down a lot. However, I've significantly shortened how long a single benchmark takes, because each worker no longer runs the entire setup code - just what's needed to go from scratch to a specified benchmark.


I found and fixed the problem.

My working loop looks like this:

import cv2                      # assumed imports for this excerpt
import perf
from functools import partial

# PATHS_TO_IMAGES, LIBRARY and letterbox_resize are defined elsewhere in my script.
runner = perf.Runner(values=3, warmups=1, processes=20)

for path in PATHS_TO_IMAGES:
    filename = path.stem
    image = cv2.imread(str(path))  # a numpy array

    benchmark_name = f'letterbox_resize_{LIBRARY}_{filename}'
    fn = partial(letterbox_resize, image)
    benchmark = runner.bench_func(benchmark_name, fn)

    if benchmark is not None:
        benchmark.dump(f'results/{benchmark_name}.json', compact=False, replace=True)
        if runner.args is not None:
            if runner.args.worker:
                break

I am looping over a list of paths to images. I load an image and benchmark some operations on it. The bench_func() call spawns many child processes. Those processes benchmark the function call as expected... and then continue onward, running the rest of the loop and loading every single image in my list.

They don't actually do anything with those images; Runner seems to know that the process has already benchmarked its function and doesn't call bench_func again. But it wastes a lot of time loading a bunch of images and doing nothing with them. 0.02 s to load an image × 1,000 images × many processes × many iterations = a lot of wasted time.

The solution was using runner.args.worker to check whether I was in a worker process and, if so, breaking out of the loop. But runner.args is only present if you're in a worker process, so first you check that runner.args is not None, then you check that runner.args.worker is True. And because I was dumping results to a JSON file, I had to make the dump() call there, right before breaking out of the loop; any other placement of the .dump() call errors out.

vstinner commented 5 years ago

We can consider adding a new parameter to bench_func() to register a "setup function".

In the meantime, you can work around the issue by writing a new script per benchmark, no? If you want a single JSON file, you can use --append rather than --output.
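For example, each per-benchmark script would do only the setup that its benchmark needs and append its result to a shared JSON file. A rough sketch with made-up names (bench_foo.py, results.json, and the setup/benchmark bodies are placeholders):

# bench_foo.py -- hypothetical one-benchmark-per-script layout
import perf

def expensive_setup():
    # stand-in for the expensive setup; it still runs once per worker
    # process, but only the setup this particular benchmark needs
    return list(range(1_000_000))

def bench_foo(data):
    return sum(data)

if __name__ == '__main__':
    runner = perf.Runner()
    data = expensive_setup()
    # extra positional arguments are passed through to the benchmarked function
    runner.bench_func('Benchmark: foo', bench_foo, data)

# Accumulate all results into one file:
#   python bench_foo.py --append results.json
#   python bench_bar.py --append results.json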

tyteen4a03 commented 2 years ago

Any updates on this? I'm scratching my head trying to write benchmark code in pyperf (which includes a bunch of imports).