[Roadmap WIP] Standardize and increase coverage for TorchBench

yanbing-j commented 1 year ago

Motivation

TorchBench is a collection of open-source benchmarks used to evaluate PyTorch performance. It provides a standardized API for benchmark drivers, both for evaluation (eager/jit) and training. Plenty of popular models are involved in TorchBench. Users are convenient to debug and profile.

In order to standardize the performance evluation and increase coverage, TorchBench can be enhanced in the following 3 aspects in CPU:

Fit for typical user scenarios
Well integrate new features of PyTorch
Increase benchmark coverage

Detailed proposal

Fit for typical user scenarios (especially in userbenchmark)

add a new userbenchmark with CPU runtime configuration options, enable those configurations into test.py/run.py also for sanity check or debugging

[x] Add core binding option, may leverage torch launcher
[x] Add gomp/iomp option
[x] Add memory allocator option

support performance metrics in the new CPU userbenchmark

[x] Add throughput: Samples / Total time
[x] Add latency: Total time / samples
[x] Add fps-like report

Well integrate new features of PyTorch

[x] Enable bf16 datatype support both for inference and training
[x] Fully support channels_last both for inference and training
[x] Extend a complier option to support Dynamo
[x] Support JIT tracing and cover more models with JIT support
[x] Enable quantization support

Increase benchmark coverage

Increase model coverage

[ ] Add models from community with popularity (e.g, RNN-T)
[ ] Add models from real customers (Multi-Band MelGAN, ViT and Wav2vec)
[ ] Fix some models not implemented in CPU (e.g, DALLE2_pytorch, moco, pytorch_struct, tacotron2, timm_efficientdet, vision_maskrcnn)
[x] Add typical GNN workloads

Port OpBench to TorchBench

[ ] Increase OpBench coverage
[ ] Complete support of dtypes, memory-format and inplace version for ops

xuzhao9 commented 1 year ago

To better help with different stakeholders, we are migrating away from test_bench.py to the new "userbenchmark" approach. In benchmark/userbenchmark (https://github.com/pytorch/benchmark/tree/main/userbenchmark), we encourage users to develop their customized benchmarks with TorchBench models and use the "run_benchmark.py" driver to drive their benchmark.

Benefits of using TorchBench userbenchmark:

We can decouple benchmarks with their infrastructures, and run different benchmarks on different machines. For example, CPU benchmarks don't need GPU machines, and GPU benchmarks don't need too powerful CPU.
We are also decoupling the benchmark model code with the experiments we are running on them. We believe this design is much clearer and can easily attribute the code to the correct owners.
We can easily support specific userbenchmark on the PyTorch PR level. For example, specify "RUN_TORCHBENCH: " can run a userbenchmark in a PyTorch PR for A/B testing and get result visualization on PyTorch HUD (e.g., https://hud.pytorch.org/userbenchmark_view?url=https:%2F%2Fossci-metrics.s3.amazonaws.com%2Ftorchbench-pr-test%2Fpr84626%2Fresult.csv)
In the future we can also bisect userbenchmark metrics to pinpoint problematic commits.

Therefore, I believe the above section "Fit for typical user scenarios" should be developed as a new TorchBench userbenchmark, instead of modifying test_bench.py.

We are still keeping test.py for unit testing purpose for now.

I am happy to answer any questions about TorchBench userbenchmarks from Intel, please feel free to reach out on Slack, or here at GitHub.

yanbing-j commented 1 year ago

@xuzhao9 Thanks for the information! We will look into userbenchmark and update this roadmap. I have some questions, will userbenchmark replace test_bench completely to guarantee the PR quality in the future? And how about torchbenchmark? Any userbenchmark support CPU at present? Thanks!

xuzhao9 commented 1 year ago

@yanbing-j My answers:

Yes, the plan is to use userbenchmark to replace test_bench.py. However, the PR quality will still be guaranteed with test.py, which is what we are doing right now. We will keep test.py but deprecate test_bench.py.
torchbenchmark describes the benchmark model code, and we won't change that.
The "release-test" (https://github.com/pytorch/benchmark/tree/main/userbenchmark/release-test) userbenchmark tests both CPU and GPU performance on a couple of models in pytorch/examples. We are also working to deliver more CPU userbenchmarks soon. The next one will be measuring the stableness of both CPU and GPU tests across torchbench.

chuanqi129 commented 1 year ago

Hi @xuzhao9, thanks for the future plan sharing, may I know is there any guideline or document to demonstrate how to enable a new benchmark under userbenchmark?

We are also working to deliver more CPU userbenchmarks soon.

And could you also share the roughly timeline for this CPU userbenchmarks deliver plan? And in your perspective, do you want @yanbing-j and I work based on this CPU userbenchmark in nearly future or we can define a new one?

xuzhao9 commented 1 year ago

@chuanqi129 We plan to deliver the first CPU userbenchmark within a month, it will be about the stableness of CPU latency across all TorchBench models. I suggest Intel can work on a their own userbenchmark (a new one).

I created a userbenchmark doc here: https://github.com/pytorch/benchmark/pull/1328

chuanqi129 commented 1 year ago

Thanks @xuzhao9 for the update, I will check it.

xuzhao9 commented 1 year ago

Hi @chuanqi129 , I tried running your userbenchmark on our CI runner, and it failed with error: https://github.com/pytorch/benchmark/actions/runs/4885488240

Also, please let me know about which runner you would like to deploy your benchmark on.

chuanqi129 commented 1 year ago

Hi @chuanqi129 , I tried running your userbenchmark on our CI runner, and it failed with error: https://github.com/pytorch/benchmark/actions/runs/4885488240

Also, please let me know about which runner you would like to deploy your benchmark on.

Thanks @xuzhao9 for your great support for the cpu userbenchmark. I'm sorry about the late reply due to the out of office for labor holiday and AL. I will focus on it recent days and fix the CI runner failures.

Some reply for comments in #1559 as below

For on-demand AWS instances, can you check any of the AWS instances in https://github.com/pytorch/test-infra/blob/main/.github/scale-config.yml can be used? It is the preferred approach.

I have double checked the instance types in Pytorch ci node pool, c5 instance based on 2nd Generation Intel® Xeon® Scalable Processors (CLX), which can support fp32 and int8, but no bf16 datatype support. We can try to use linux.24xlarge instance for initial test.

If not (for example, we can't reach a reasonable low noise level), we can use the AWS metal instance, which is AWS g4dn.metal, with Intel(R) Xeon(R) Platinum 8259CL CPU. Does it support fp32/int8?

The 8259CL also belongs to 2nd Generation Intel® Xeon® Scalable Processors (CLX). So it also support fp32/int8. We can use linux.24xlarge instance firstly, if it has large noise, we can try this metal one later.

BTW, ideally, it will be great if we can deploy the cpu benchmark on c6i.16xlarge instance, because it will same with our dyanmo cpu dashboard used instance. (Nice to have)

xuzhao9 commented 1 year ago

@chuanqi129 I am wondering does the dynamo cpu dashboard work on GitHub Actions? Can I have the GitHub Actions workflow file?

chuanqi129 commented 1 year ago

@chuanqi129 I am wondering does the dynamo cpu dashboard work on GitHub Actions? Can I have the GitHub Actions workflow file?

No, the dynamo cpu dashboard maintained by our side. Attached this test used Dockerfile and scripts. In this test, all needed components are built from source code. And I also think it will be great if we can integrate this dynamo cpu dashboard test into Pytorch Github Action, but it needs c6i instance.

chuanqi129 commented 1 year ago

Hi @xuzhao9 , I have noticed that there are several GPT series models have been added into torchbench, how about adding the GPT-J-6B also? Seems it will be added into mlperf benchmark.

pytorch / benchmark