psf / pyperf

Toolkit to run Python benchmarks
http://pyperf.readthedocs.io/
MIT License

How to programmatically get the output for timeit() or bench_func()? #167

Closed: rayluo closed this issue 11 months ago

rayluo commented 1 year ago

Hi @vstinner, thanks for the pyperf project. Presumably thanks to its sophisticated architecture, a command line such as pyperf timeit -s "...define my_func..." "my_func()" prints a human-readable summary to the terminal, such as "Mean +- std dev: 38.3 us +- 2.8 us". I love that its std dev is noticeably smaller than that of some other benchmark tools, and that its mean is more consistent.

Now, how do I programmatically get that reliable mean value? I tried the following experiments, but could not get what I wanted.

import pyperf
runner = pyperf.Runner()
return_value = runner.timeit("Times a function", stmt="locals()")
print(return_value)  # This is always None

benchmark = runner.bench_func("bench_func", locals)
print(benchmark)
if benchmark:  # This check is somehow necessary, probably due to the multiprocess architecture
    print(benchmark.get_values())
    # It is still unclear how to get benchmark.mean()
    # It throws exception: statistics.StatisticsError: mean requires at least one data point

BTW, I suspect #165 was about the same use case.

rayluo commented 1 year ago

I did more experimenting, which brought me further, but I still ended up at a dead end.

import pyperf

runner = pyperf.Runner()  # "Only one instance of Runner must be created. Use the same instance to run all benchmarks."

def timeit(stmt, *args):
    """Benchmark a callable.

    pyperf will spawn 20+ subprocesses. In the main process this returns the
    median time; in the subprocesses it returns None. Do NOT run any workload
    on the None code path.

    :param Callable stmt: The callable to benchmark.
    :param args: Positional arguments for the stmt callable.
    """
    # TODO: str() could yield different addresses for the same function in
    # different subprocesses, so the generated name may not be stable.
    name = getattr(stmt, "__name__", str(stmt))
    benchmark = runner.bench_func(name + str(args), stmt, *args)
    if benchmark and benchmark.get_nrun() > 1:  # Then all sub-processes finished
        # PyPerf will already show the mean and stdev on stdout
        return benchmark.median()  # Or we could return mean()
    # Unfortunately, the subprocesses will still return None here; the caller needs to ignore those results.

if __name__ == "__main__":
    print("Expensive setup")
    result = timeit(globals)
    if result:
        print(result)

In the snippet above, I can get the time for the test subject (globals() in my case).

But the line "Expensive setup" is also printed 20+ times, which makes this approach unusable in a bigger project that needs an expensive setup.

@vstinner, is PyPerf meant to support those programmatic use cases?

bluenote10 commented 12 months ago

I was wondering the same and found a solution.

Note that pyperf's architecture is based on re-spawning the script multiple times for the worker processes. You can see this by printing sys.argv in the script: the whole script gets executed many times, and the --worker and --worker-task=<index> arguments are how pyperf decides exactly what to do in each invocation.
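
A quick way to observe this re-spawning yourself is a minimal sketch like the one below (the benchmarked statement is an arbitrary placeholder):

import sys

import pyperf

# Printed once per process: the parent invocation plus every spawned worker,
# whose argv contains --worker / --worker-task=<index>.
print("argv:", sys.argv)

runner = pyperf.Runner()
runner.timeit("sorted", stmt="sorted(range(100))")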

For this reason I would avoid doing anything else inside the benchmark script itself, because every side effect (like printing) will be executed many times. Instead, I would run the entire benchmark via a subprocess call and pass in -o some_dump_path. You can then load the written dump file in your main process, which itself is not subject to re-running.

To illustrate, I'm using something like this in my actual benchmark suite:

import subprocess
from pathlib import Path

import pyperf


def main():
    # In my case I have a bunch of benchmark "script snippets" in another folder,
    # which contain the actual benchmark code, e.g., some call like:
    # pyperf.Runner().bench_time_func(name, func)
    bench_files = list((Path(__file__).parent / "benchmarks").glob("*.py"))

    for bench_file in bench_files:
        name = bench_file.stem
        print(f"Benchmarking: {name}")

        dump_path = Path(f"/tmp/bench_results/{name}.json")
        dump_path.parent.mkdir(exist_ok=True, parents=True)
        dump_path.unlink(missing_ok=True)

        subprocess.check_call(
            ["python", bench_file, "-o", dump_path], cwd=bench_file.parent
        )

        with dump_path.open() as f:
            benchmarks = pyperf.BenchmarkSuite.load(f).get_benchmarks()

        # now you can programmatically read the benchmark results here...

vstinner commented 12 months ago

Once you have a BenchmarkSuite object, you can use the documented API:

Would you mind elaborating on your question?

Examples of code loading JSON files: https://pyperf.readthedocs.io/en/latest/examples.html#hist-scipy-script
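
For instance, a minimal sketch using that API, assuming a benchmark was already dumped to a file named bench.json (the filename is just an example):

import pyperf

# Load the JSON dump and print summary statistics for each benchmark in it.
suite = pyperf.BenchmarkSuite.load("bench.json")
for bench in suite.get_benchmarks():
    print(f"{bench.get_name()}: mean={bench.mean():.6g} s, stdev={bench.stdev():.2g} s")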

rayluo commented 12 months ago

Would you mind elaborating on your question?

It seems that pyperf's multi-process architecture dictates its usage pattern: command line as input, JSON file as output. This means that if we want to programmatically run multiple test cases and analyze their results, it cannot be done inside a benchmark script. Kudos to @bluenote10, who found a feasible way to organize such a multi-benchmark project with one main driver script. Overall, though, this seems difficult to incorporate into an existing pytest-powered test suite.

Shameless plug: I ended up developing perf_baseline, a thin wrapper built on top of Python's timeit whose accuracy is adequate for my purposes. I also added some handy behaviors that I needed for my "perf regression detection" project, and it fits my needs well.

vstinner commented 11 months ago

I have proposed multiple times to add an option to disable fork. Results may be less reliable, but apparently using fork causes trouble and pyperf cannot be used in some cases. So far nobody has really asked for that feature, so it hasn't been implemented.

It seems that pyperf's multi-process architecture dictates its usage pattern: command line as input, JSON file as output. This means that if we want to programmatically run multiple test cases and analyze their results, it cannot be done inside a benchmark script.

You can write a second script which runs the benchmark suite and analyzes the results.

rayluo commented 11 months ago

Fair enough. Closing this issue, because we have a workaround (and I have an alternative).

vstinner commented 11 months ago

As I wrote, I would be fine with an option to not spawn worker processes, but run all benchmarks in a single process.

The main process, which runs all the benchmark worker processes, gets Benchmark objects; that is already part of the API ;-)

rayluo commented 11 months ago

I have proposed multiple times to add an option to disable fork. Results may be less reliable, but apparently using fork causes trouble and pyperf cannot be used in some cases. So far nobody has really asked for that feature, so it hasn't been implemented.

To your point, I suppose we do not need to change pyperf's multi-process (i.e. fork) nature, especially since that architecture is considered the reason it is more reliable.

What some people need, at least initially, is an old-school, function-style, timeit-like API, such as output = func(input). So perhaps pyperf could provide a higher-level API that wraps all those subprocess calls and returns the content of the JSON output.
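
Roughly, such a wrapper might look like the purely hypothetical sketch below (nothing here exists in pyperf today; the function name and layout are made up for illustration):

import subprocess
import sys
import tempfile
from pathlib import Path

import pyperf

def run_benchmark_script(script):
    """Run a pyperf benchmark script in a subprocess (which spawns its own
    workers) and return the Benchmark objects parsed from its JSON dump."""
    with tempfile.TemporaryDirectory() as tmp:
        dump = Path(tmp) / "result.json"
        subprocess.check_call([sys.executable, str(script), "-o", str(dump)])
        with dump.open() as f:
            return pyperf.BenchmarkSuite.load(f).get_benchmarks()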

vstinner commented 11 months ago

To your point, I suppose we do not need to change pyperf's multi-process (i.e. fork) nature, especially since that architecture is considered the reason it is more reliable.

In terms of API, maybe pyperf can provide an API which runs the main process, and that main process spawns the worker processes. The API would just return objects directly, hiding the inner complexity.

But here I'm talking about an API which does everything in a single process.

I'm not sure it always matters to spawn worker processes. "It depends" :-) That's the beauty of benchmarking.