microsoft / knossos-ksc

Compiler with automatic differentiation

Don't raise exception if we can't achieve min_rounds within duration #907

Closed · toelli-msft closed this 3 years ago

toelli-msft commented 3 years ago

(Just continue anyway, however long it takes.)

This PR is part question, part issue, part proposed solution. I don't understand why there's an exception if we can't achieve min_rounds in the given duration. The very next line chooses the greater of rounds and benchmark._min_rounds, which is redundant given the exception: if we get past the check, rounds >= benchmark._min_rounds already, so the max is a no-op.

I want to be able to override min_rounds from the command line and this exception is preventing me from doing that. I suggest just removing the exception. Does that make sense?
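
Concretely, a minimal sketch of what I mean, based on the logic visible in the traceback further down (not an actual patch):

    # inside benchmark_semi_pedantic, after calibrating the timer:
    rounds = int(ceil(benchmark._max_time / duration))
    # current code raises an Exception here if rounds < benchmark._min_rounds;
    # the proposal is to drop that check, since the next line already enforces min_rounds
    rounds = max(rounds, benchmark._min_rounds)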

cgravill commented 3 years ago

We don't want to run forever, or anything approaching that, particularly on unattended machines.

The original code actually behaves the way it sounds like you want: it just runs for as long as it takes. That's also why there's redundancy; the exception was a focused change layered onto the original benchmarking logic.

What I think we want is: run for a while, and if an individual benchmark (method against configuration) can't complete in a "reasonable time", skip it and note that. Unfortunately, within the tool itself we don't have a clean way to record that a benchmark timed out.

I think in the short term what you should do is increase max_time or reduce min_rounds until it fits.

https://pytest-benchmark.readthedocs.io/en/latest/usage.html#commandline-options
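
For example, something like the following, using the options from the page linked above (the values are purely illustrative):

    python3 -m pytest src/bench/ --benchmark-max-time=10.0 --benchmark-min-rounds=1000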

How long are you trying to run for?

There's a work item for the round number: https://msrcambridge.visualstudio.com/Knossos/_backlogs/backlog/Knossos%20Team/Epics/?workitem=19865

toelli-msft commented 3 years ago

For longer-running benchmarks I've found that 1 second is not enough to get accurate estimates, because that might amount to only a few hundred rounds, or even tens. Therefore I'd like to run each benchmark for (say) 10,000 rounds or 1 second, whichever is larger. How should I do that? I was hoping I could get the desired result by adding --benchmark-min-rounds=10000 to the command line. However, any benchmark whose 10,000 rounds won't fit in 1 second errors out with "Duration(...) and min_rounds(10000) can't be completed within max_time(1.0)" (see below).

What should I be doing? It seems that increasing --benchmark-max-time will make some benchmarks run for more than 10,000 rounds. I don't want that! Should I be setting --benchmark-min-time to 1 second?

Example: On master (6095eb21de6702712fc4d1c151dae8b707acca97) if I run

time python3 -m pytest src/bench/ --benchmark-min-rounds=10000 --benchmark-columns=median,mean,iqr,rounds,iterations --benchmark-sort=name --benchmark-group-by=group,func --modulepath=examples/dl-activations/relu3 --benchmarkname=vrelu3 --benchmark-autosave

...
============================================================================================================ test session starts ============================================================================================================
platform linux -- Python 3.8.5, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
benchmark: 3.4.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=10000 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/toelli/knossos-ksc
plugins: bench-0.3.0, benchmark-3.4.1
collected 24 items

src/bench/test_run_bench.py F.F....FFF....F.FFFF....                                                                                                                                                                                  [100%]
Saved benchmark data in: /home/toelli/knossos-ksc/.benchmarks/Linux-CPython-3.8-64bit/0182_6095eb21de6702712fc4d1c151dae8b707acca97_20210701_135424.json

================================================================================================================= FAILURES ==================================================================================================================
__________________________________________________________________________________________ test_inference[vrelu3_pytorch-Knossos-torch.Size([4])] ___________________________________________________________________________________________

benchmark = <pytest_benchmark.fixture.BenchmarkFixture object at 0x7f9fef99baf0>, reference_func = <function vrelu3_pytorch at 0x7f9fbc7d84c0>
func = BenchmarkFunction(name='Knossos', func=<built-in method apply of FunctionMeta object at 0x3fb7440>, device=device(type='cpu')), config = tensor([ 0.9612, -1.0313, -0.9790, -1.0235])

    def test_inference(benchmark, reference_func, func, config):
        config_on_func_device = func.to_device(config)
        with torch.no_grad():
>           result = benchmark_semi_pedantic(
                benchmark, func.func, config_on_func_device
            ).to(cpu_device)

src/bench/test_run_bench.py:11:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

benchmark = <pytest_benchmark.fixture.BenchmarkFixture object at 0x7f9fef99baf0>, function_to_benchmark = <built-in method apply of FunctionMeta object at 0x3fb7440>, setup = None, args = (tensor([ 0.9612, -1.0313, -0.9790, -1.0235]),)
kwargs = {}, make_arguments = <function benchmark_semi_pedantic.<locals>.make_arguments at 0x7f9fef964ca0>, runner_args = (tensor([ 0.9612, -1.0313, -0.9790, -1.0235]),), runner_kwargs = {}
runner = <function BenchmarkFixture._make_runner.<locals>.runner at 0x7f9fefa81f70>, duration = 0.00040600006468594074, iterations = 1, _ = range(0, 1), rounds = 2464

    def benchmark_semi_pedantic(
        benchmark, function_to_benchmark, *args, setup=None, **kwargs
    ):

        has_args = bool(args or kwargs)

        def make_arguments(args=args, kwargs=kwargs):
            if setup:
                maybe_args = setup()
                if maybe_args:
                    if has_args:
                        raise TypeError(
                            "Can't use `args` or `kwargs` if `setup` returns the arguments."
                        )
                    args, kwargs = maybe_args
            return args, kwargs

        runner_args, runner_kwargs = make_arguments()

        # using internal functions to match pytest-benchmark.
        # we might want to just go our own direction in terms of logic
        # https://github.com/ionelmc/pytest-benchmark/blob/996dbe519b5bcc9b103ea0e4aeb232c58b71fc8c/src/pytest_benchmark/fixture.py#L147-L154
        runner = benchmark._make_runner(
            function_to_benchmark, args=runner_args, kwargs=runner_kwargs
        )

        duration, iterations, _ = benchmark._calibrate_timer(runner)

        # Choose how many time we must repeat the test
        rounds = int(ceil(benchmark._max_time / duration))
        if rounds < benchmark._min_rounds:
>           raise Exception(
                f"""Duration({duration}) and min_rounds({benchmark._min_rounds}) can't be completed within max_time({benchmark._max_time})"""
            )
E           Exception: Duration(0.00040600006468594074) and min_rounds(10000) can't be completed within max_time(1.0)
...
------------------------------------------------------------ benchmark 'torch.Size([16]) test_backwards': 2 tests -----------------------------------------------------------
Name (time in us)                                                                                   Median               Mean               IQR            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_backwards[vrelu3_pytorch-Embedded ks_checkpointed_map_handwritten_relu3-torch.Size([16])]     30.6000 (1.0)      33.8685 (1.0)      2.1001 (1.0)       15433           1
test_backwards[vrelu3_pytorch-PyTorch-torch.Size([16])]                                            32.8999 (1.08)     34.8913 (1.03)     2.4999 (1.19)      14642           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------ benchmark 'torch.Size([16]) test_forward': 3 tests -----------------------------------------------------------
Name (time in us)                                                                                 Median               Mean               IQR            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_forward[vrelu3_pytorch-Embedded ks_checkpointed_map-torch.Size([16])]                        8.1002 (1.0)       8.5685 (1.0)      0.6997 (1.0)       14993           1
test_forward[vrelu3_pytorch-Embedded ks_checkpointed_map_handwritten_relu3-torch.Size([16])]      8.1998 (1.01)      9.6886 (1.13)     0.8000 (1.14)      14837           1
test_forward[vrelu3_pytorch-PyTorch-torch.Size([16])]                                            23.5999 (2.91)     27.7398 (3.24)     2.4000 (3.43)      10846           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------- benchmark 'torch.Size([16]) test_inference': 3 tests ----------------------------------------------------------
Name (time in us)                                                                                  Median              Mean               IQR            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_inference[vrelu3_pytorch-Embedded ks_checkpointed_map-torch.Size([16])]                       7.2001 (1.01)     7.6074 (1.01)     0.5998 (1.20)      20326           1
test_inference[vrelu3_pytorch-Embedded ks_checkpointed_map_handwritten_relu3-torch.Size([16])]     7.0999 (1.0)      7.5146 (1.0)      0.4999 (1.0)       17483           1
test_inference[vrelu3_pytorch-Knossos-torch.Size([16])]                                            7.6001 (1.07)     9.7642 (1.30)     1.0000 (2.00)      10616           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------- benchmark 'torch.Size([4]) test_backwards': 2 tests ------------------------------------------------------------
Name (time in us)                                                                                  Median               Mean               IQR            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_backwards[vrelu3_pytorch-Embedded ks_checkpointed_map_handwritten_relu3-torch.Size([4])]     30.3001 (1.0)      31.5083 (1.0)      1.8002 (1.0)       10194           1
test_backwards[vrelu3_pytorch-PyTorch-torch.Size([4])]                                            32.6999 (1.08)     37.0993 (1.18)     2.8000 (1.56)      15016           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------- benchmark 'torch.Size([4]) test_forward': 2 tests -----------------------------------------------------------
Name (time in us)                                                                               Median               Mean               IQR            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_forward[vrelu3_pytorch-Embedded ks_checkpointed_map-torch.Size([4])]                       8.2999 (1.04)     10.5078 (1.06)     1.0000 (1.11)      11014           1
test_forward[vrelu3_pytorch-Embedded ks_checkpointed_map_handwritten_relu3-torch.Size([4])]     8.0001 (1.0)       9.9324 (1.0)      0.8999 (1.0)       15244           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------------------- benchmark 'torch.Size([4]) test_inference': 2 tests ------------------------------------------------------------
Name (time in us)                                                                                  Median               Mean               IQR            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_inference[vrelu3_pytorch-Embedded ks_checkpointed_map_handwritten_relu3-torch.Size([4])]      7.0001 (1.0)       7.4700 (1.0)      0.5998 (1.0)       13794           1
test_inference[vrelu3_pytorch-PyTorch-torch.Size([4])]                                            17.9000 (2.56)     21.0766 (2.82)     1.5998 (2.67)      11849           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean
========================================================================================================== short test summary info ==========================================================================================================
FAILED src/bench/test_run_bench.py::test_inference[vrelu3_pytorch-Knossos-torch.Size([4])] - Exception: Duration(0.00040600006468594074) and min_rounds(10000) can't be completed within max_time(1.0)
FAILED src/bench/test_run_bench.py::test_inference[vrelu3_pytorch-Embedded ks_checkpointed_map-torch.Size([4])] - Exception: Duration(0.00010189996100962162) and min_rounds(10000) can't be completed within max_time(1.0)
FAILED src/bench/test_run_bench.py::test_inference[vrelu3_pytorch-PyTorch-torch.Size([16])] - Exception: Duration(0.0001767000649124384) and min_rounds(10000) can't be completed within max_time(1.0)
FAILED src/bench/test_run_bench.py::test_forward[vrelu3_pytorch-Knossos-torch.Size([4])] - Exception: Duration(0.00016980012878775597) and min_rounds(10000) can't be completed within max_time(1.0)
FAILED src/bench/test_run_bench.py::test_forward[vrelu3_pytorch-Knossos-torch.Size([16])] - Exception: Duration(0.00013209995813667774) and min_rounds(10000) can't be completed within max_time(1.0)
FAILED src/bench/test_run_bench.py::test_forward[vrelu3_pytorch-PyTorch-torch.Size([4])] - Exception: Duration(0.0001642000861465931) and min_rounds(10000) can't be completed within max_time(1.0)
FAILED src/bench/test_run_bench.py::test_backwards[vrelu3_pytorch-Knossos-torch.Size([4])] - Exception: Duration(0.000698799965903163) and min_rounds(10000) can't be completed within max_time(1.0)
FAILED src/bench/test_run_bench.py::test_backwards[vrelu3_pytorch-Knossos-torch.Size([16])] - Exception: Duration(0.0002180999144911766) and min_rounds(10000) can't be completed within max_time(1.0)
FAILED src/bench/test_run_bench.py::test_backwards[vrelu3_pytorch-Embedded ks_checkpointed_map-torch.Size([4])] - Exception: Duration(0.00014970009215176105) and min_rounds(10000) can't be completed within max_time(1.0)
FAILED src/bench/test_run_bench.py::test_backwards[vrelu3_pytorch-Embedded ks_checkpointed_map-torch.Size([16])] - Exception: Duration(0.00015329988673329353) and min_rounds(10000) can't be completed within max_time(1.0)
================================================================================================ 10 failed, 14 passed, 10 warnings in 8.69s =================================================================================================
ksc.utils.generate_cpp_from_ks: Deleting /tmp/tmp5b8dqkc2.ks /tmp/tmphrrauoe0.cpp /tmp/tmp8qpc6d5h.kso
ksc.utils.generate_cpp_from_ks: Deleting /tmp/tmpf1zj06t3.ks /tmp/tmpthx14sf3.cpp /tmp/tmp2386zlv0.kso
ksc.utils.generate_cpp_from_ks: Deleting /tmp/tmpqozz7pxz.ks /tmp/tmppig_yf1g.cpp /tmp/tmpvh9kg6lk.kso
python3 -m pytest src/bench/ --benchmark-min-rounds=10000        76.93s user 4.05s system 101% cpu 1:19.90 total
cgravill commented 3 years ago

...Therefore I'd like to run each benchmark for (say) 10,000 rounds or 1 second, whichever is larger. How should I do that?...

It's not possible to achieve that with the crude limit I put in place.

It's also not possible with the tool running in its default mode, e.g.:

If you set min_rounds to 10,000 and those rounds take longer than 1 second, it'll run for as long as it takes. If you don't set min_rounds to 10,000, then it'll do as many rounds as it can in 1 second.

(There are various caveats in terms of timer precision and the tool deciding to do more or fewer rounds.)

Given the range of our benchmarks, I think we might have to pass in our own configuration; I don't think the library's configuration settings are well thought through.

In the short term, what we could do is change the check so that the exception (reworded) is thrown when the "probe round" time alone exceeds max_time, rather than when probe time * min_rounds does. That would still mitigate super-long runs, while allowing you to do something closer to what you very reasonably want.
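
Roughly, a sketch of that reworded check (reusing the names from the traceback above; not the actual change):

    duration, iterations, _ = benchmark._calibrate_timer(runner)
    # only give up if a single probe round already exceeds max_time,
    # rather than because duration * min_rounds would exceed it
    if duration > benchmark._max_time:
        raise Exception(
            f"A single round ({duration}s) already exceeds max_time({benchmark._max_time})"
        )
    rounds = max(int(ceil(benchmark._max_time / duration)), benchmark._min_rounds)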

cgravill commented 3 years ago

By the way, on master max_time has already been moved up to 5.0 https://github.com/microsoft/knossos-ksc/blob/6095eb21de6702712fc4d1c151dae8b707acca97/src/bench/run-all-pytest-bench.sh#L2

toelli-msft commented 3 years ago

...Therefore I'd like to run each benchmark for (say) 10,000 rounds or 1 second, whichever is larger. How should I do that?...

It's not possible to achieve that with the crude limit I put in place.

Fair enough. A workaround for this would be nice but it's not super urgent.