Open awf opened 3 years ago
Proposed looping:
for MODEL in vrelu3, sqrl, ...:
  for TOOL in pt, ptnice, knossos, knossos_cuda, ptcuda, ...:
    with timer:
      MODEL_TOOL_compile()  # -> e.g. nvcc. Let's keep track of time (RLO -> 100,000 machine hrs?)
    for config in configs:
      for TASK in inference, fwd, bwd:
        with timer:
          MODEL_TOOL_example_prep(x)  # -> e.g. todevice (is allowed to recompile) -- need to keep track of overhead here too
        with timer:
          MODEL_TOOL_TASK_example_run(x, N_ITERS)
        MODEL_TOOL_example_done(x)  # -> e.g. free gpu memory
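For reference, here is one way the proposed loop could be realized in plain Python with a context-manager timer. This is only a sketch: the `Benchmark` hooks (`compile`, `example_prep`, `run`, `example_done`) and the `run_all` driver are hypothetical stand-ins for whatever each MODEL/TOOL pair actually exposes, not existing code in the repo.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

timings: Dict[str, List[float]] = {}

@contextmanager
def timer(label: str):
    """Record wall-clock time for one step under the given label."""
    start = time.perf_counter()
    yield
    timings.setdefault(label, []).append(time.perf_counter() - start)

@dataclass
class Benchmark:
    """Hypothetical bundle of the per-MODEL/TOOL hooks named in the sketch above."""
    compile: Callable[[], None]
    example_prep: Callable[[dict], object]
    run: Callable[[str, object, int], None]
    example_done: Callable[[object], None]
    configs: List[dict] = field(default_factory=lambda: [{}])

def run_all(benchmarks: Dict[Tuple[str, str], Benchmark], n_iters: int = 1000) -> None:
    for (model, tool), bench in benchmarks.items():
        with timer(f"{model}/{tool}/compile"):
            bench.compile()                          # e.g. invokes nvcc
        for config in bench.configs:
            for task in ("inference", "fwd", "bwd"):
                with timer(f"{model}/{tool}/{task}/prep"):
                    x = bench.example_prep(config)   # e.g. move inputs to device
                with timer(f"{model}/{tool}/{task}/run"):
                    bench.run(task, x, n_iters)
                bench.example_done(x)                # e.g. free GPU memory
```

Keeping compile, prep, and run under separate labels is what lets us report the per-step overheads the sketch asks for, rather than one aggregate number.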
@awf is the idea that this is a sketch? I've added some comments inline.
Proposed looping:
for MODEL in vrelu3, sqrl, ...:  #CG Currently different runs of benchmarking
  for TOOL in pt, ptnice, knossos, knossos_cuda, ptcuda, ...:  #CG handled by probing for method name, then applying processing e.g. TS2K
    with timer:
      MODEL_TOOL_compile()  # -> e.g. nvcc. Let's keep track of time (RLO -> 100,000 machine hrs?)  #CG We can add a separate benchmark, but within pytest-benchmark I'm not aware of being able to track timing within steps. We can profile as an option.
    for config in configs:
      for TASK in inference, fwd, bwd:  #CG tests for each of these exist
        with timer:
          MODEL_TOOL_example_prep(x)  # -> e.g. todevice (is allowed to recompile) -- need to keep track of overhead here too  #CG similar to above, we can benchmark the setup but I'm not aware of being able to time steps
        with timer:
          MODEL_TOOL_TASK_example_run(x, N_ITERS)
        MODEL_TOOL_example_done(x)  # -> e.g. free gpu memory  #CG freeing GPU memory / triggering GC between steps does need some thought/investigation
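On the prep-vs-run split: one option might be pytest-benchmark's pedantic mode, where a `setup` callable runs untimed before each measured round. A minimal sketch, in which the model, inputs, and test name are placeholders rather than the repo's actual benchmark declarations:

```python
import torch

def _prep():
    # Untimed: build inputs (and, on GPU, this is where .to(device) would go).
    x = torch.randn(1024, 1024)
    return (x,), {}   # (args, kwargs) passed to the timed target

def test_example_inference(benchmark):
    model = torch.nn.Linear(1024, 1024)   # placeholder for a MODEL/TOOL pair

    def run(x):
        with torch.no_grad():
            return model(x)

    # Only `run` is timed; `_prep` executes once per round outside the measurement.
    benchmark.pedantic(run, setup=_prep, rounds=20, warmup_rounds=2)
```

This only keeps prep out of the measured region; it does not give nested per-step timings, so compile time would still need a separate benchmark or profiling, as noted above.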
We would like to be able to measure speed improvements build-over-build in KSC. To do so we need an evaluation framework with these features:

- A benchmark is declared by providing a function `foo` in python module `a/b/d/bar.py`. That benchmark function comes with PyTorch reference implementations, and sample inputs. (#645) A hypothetical sketch of such a module is given after this list.
- A script `run-benchmarks`, which manages
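For illustration only, a module like `a/b/d/bar.py` might look as follows. The naming convention (`foo_pytorch`, `foo_inputs`), the particular function, and the input sizes are assumptions made for this sketch, not the framework's confirmed API.

```python
# a/b/d/bar.py -- hypothetical benchmark declaration module
import torch

def foo(x: torch.Tensor) -> torch.Tensor:
    """Candidate implementation under benchmark (in the real repo this could be,
    e.g., a Knossos-compiled kernel)."""
    return x.relu().pow(3)

def foo_pytorch(x: torch.Tensor) -> torch.Tensor:
    """Plain-PyTorch reference implementation, used to check correctness and as a
    timing baseline."""
    return torch.relu(x) ** 3

def foo_inputs():
    """Sample inputs, e.g. a range of sizes the harness iterates over."""
    return [(torch.randn(n),) for n in (1_000, 100_000, 1_000_000)]
```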