gfursin commented 2 years ago

Motivation

This project aims to decompose the MLPerf inference benchmark into a database of reusable, portable, customizable and deterministic scripts with a unified CLI, a common Python API and extensible JSON/YAML meta descriptions, using the 2nd generation of the CK framework.

The first goal is to simplify the development of this benchmark, make it easier to extend and run it across continuously changing ML tasks, models, data sets, engines, software and hardware, and automate all the manual steps of the submission process.

The second goal is to enable automatic and continuous design space exploration of ML systems across all ML tasks, models, data sets, engines, libraries and platforms based on MLPerf loadgen, and to select Pareto-optimal configurations based on user constraints (latency, throughput, accuracy, energy, model size, memory usage, device cost, etc.).
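Selecting Pareto-optimal configurations under user constraints can be illustrated with a small Python sketch. It assumes each benchmarked configuration has already been reduced to a few numbers; the `Result` fields, the three objectives and the measurements are hypothetical and chosen only for illustration, and this is not the planned CM Design Space Explorer. Hard constraints filter the candidates first, then any candidate dominated by another feasible one is dropped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One measured configuration; the fields are an illustrative subset of objectives."""
    name: str
    latency_ms: float  # lower is better
    accuracy: float    # higher is better
    energy_j: float    # lower is better


def feasible(r, max_latency_ms, min_accuracy):
    """Hard user constraints are applied before any Pareto comparison."""
    return r.latency_ms <= max_latency_ms and r.accuracy >= min_accuracy


def dominates(a, b):
    """a dominates b if it is no worse on every objective and strictly better on one."""
    no_worse = (a.latency_ms <= b.latency_ms and a.accuracy >= b.accuracy
                and a.energy_j <= b.energy_j)
    strictly_better = (a.latency_ms < b.latency_ms or a.accuracy > b.accuracy
                       or a.energy_j < b.energy_j)
    return no_worse and strictly_better


def pareto_front(results, max_latency_ms, min_accuracy):
    """Keep the feasible configurations that no other feasible one dominates."""
    candidates = [r for r in results if feasible(r, max_latency_ms, min_accuracy)]
    return [r for r in candidates
            if not any(dominates(other, r) for other in candidates)]


# Made-up measurements, for illustration only.
results = [
    Result("onnx-fp32", latency_ms=12.0, accuracy=0.761, energy_j=5.0),
    Result("tvm-fp32", latency_ms=9.0, accuracy=0.761, energy_j=4.5),
    Result("onnx-int8", latency_ms=6.0, accuracy=0.750, energy_j=3.0),
    Result("tiny-model", latency_ms=2.0, accuracy=0.600, energy_j=1.0),
]
front = pareto_front(results, max_latency_ms=20.0, min_accuracy=0.70)
print([r.name for r in front])  # ['tvm-fp32', 'onnx-int8']
```

Here "onnx-fp32" is dropped because "tvm-fp32" is at least as good on every objective, while "tvm-fp32" and "onnx-int8" trade accuracy against latency and energy, so both stay on the front for the user to choose from.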

The third goal is to show researchers and engineers that they can reuse portable ML scripts (to detect, download and install models, data sets, engines, libraries and tools) in their own research projects, avoiding reinventing the wheel and building on the solid MLPerf benchmarking methodology.

Technology

This project is based on the CK2 automation framework and on our practical experience reproducing 150+ ML and systems papers and automating MLPerf inference submissions:

The CM framework (the 2nd generation of the CK framework, aka CK2) is used to organize ML projects as a database of reusable and portable components (tasks, models, datasets, engines, libraries, hardware descriptions): GitHub, motivation paper.
A CM automation called "script" wraps native scripts with a unified CLI, a Python API and JSON/YAML meta descriptions (a unique ID, a list of tags, dependencies on other CM scripts and any other information required to make an ad-hoc script reusable, portable, customizable and deterministic): Python automation code
CM scripts automate the detection, download, installation and pre/post-processing of all ML artifacts (models, data sets, engines, libraries, tools ...) required to run any ML task on any platform, natively or inside containers: GitHub with current scripts (under community development)
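To make the role of tags and dependencies in the meta descriptions concrete, here is a minimal in-memory Python sketch. The metadata keys (`uid`, `alias`, `tags`, `deps`), the script names and the `find_script`/`resolve` helpers are hypothetical illustrations, not the actual CM schema or the CM implementation. They only show how a tag query like the one given to `cm run script --tags=...` could select a single script and order its dependencies so that each one runs before the scripts that need it.

```python
import json

# Hypothetical script metadata loosely modelled on the JSON/YAML meta described
# above; the keys (uid, alias, tags, deps) are illustrative, not the exact CM schema.
META_JSON = """
[
  {"uid": "a1", "alias": "get-python", "tags": ["get", "python"], "deps": []},
  {"uid": "b2", "alias": "get-loadgen", "tags": ["get", "mlperf", "loadgen"],
   "deps": [{"tags": "get,python"}]},
  {"uid": "c3", "alias": "app-image-classification",
   "tags": ["app", "image-classification", "onnx", "python"],
   "deps": [{"tags": "get,python"}, {"tags": "get,mlperf,loadgen"}]}
]
"""


def find_script(scripts, tags):
    """Return the only script whose tags include every requested tag."""
    wanted = set(tags.split(","))
    matches = [s for s in scripts if wanted <= set(s["tags"])]
    if len(matches) != 1:
        raise LookupError(f"tags {tags!r} matched {len(matches)} scripts")
    return matches[0]


def resolve(scripts, tags, order=None, visiting=None):
    """Depth-first resolution: every dependency is placed before its dependents."""
    order = [] if order is None else order
    visiting = set() if visiting is None else visiting
    script = find_script(scripts, tags)
    if any(s["uid"] == script["uid"] for s in order):
        return order  # already scheduled by another dependent
    if script["uid"] in visiting:
        raise ValueError(f"dependency cycle through {script['alias']}")
    visiting.add(script["uid"])
    for dep in script["deps"]:
        resolve(scripts, dep["tags"], order, visiting)
    visiting.discard(script["uid"])
    order.append(script)
    return order


scripts = json.loads(META_JSON)
plan = resolve(scripts, "app,image-classification,onnx,python")
print([s["alias"] for s in plan])
```

Matching on a subset of tags, rather than on a fixed name, is what lets a dependency such as "get,python" be satisfied by whichever script carries those tags.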

See the CM tutorials to learn more about reusable CM scripts and the CM database format for ML projects.

This is part of our CM (CK2) development roadmap for 2022.

People

Developers

Arjun Suresh (OctoML / MLCommons)
Thomas Zhu (Oxford University / OctoML intern)
Grigori Fursin (OctoML / cTuning foundation / MLCommons)

Feedback

David Kanter
Peter Mattson
Vijay Janapa Reddi
Thierry Moreau (@tmoreau89)
Please add yourself or get in touch if you would like to provide your feedback!

Tasks and timeline

Q3 2022

[x] Develop and stabilize CM core to treat R&D projects as a database of components and automations
[x] Develop CM scripts to detect and/or install all ML artifacts (platform description, OS scripts, ML frameworks, models, data sets, libraries, pre-/post-processing scripts, benchmarks, etc.): List
[ ] Replicate the CK-based MLPerf inference v1.1 submission (out-of-the-box image classification with ImageNet and ONNX on a cloud platform) using CM scripts: original study
- [x] Prepare GCP n2-standard-80 platform that we used for previous submission
- [x] Test the out-of-the-box CK2 (CM) workflow to run image classification: cm run script --tags=app,image-classification,onnx,python --quiet
- [x] Organize SSH access for @arjunsuresh and @hanwenzhu.
- [x] Convert the outdated CK automation for preparing and submitting MLPerf inference benchmark results to CM scripts: GitHub with CK automations
- [ ] Prepare a placeholder for the reproducibility report, similar to the one above
- [ ] Describe how to install CK with minimal system dependencies
- [ ] Add CM scripts to detect host and target platforms
- [x] Add CM scripts to install system dependencies
- [x] Add CM scripts to prepare Python virtual env
- [x] Add CM scripts to activate Python virtual env
- [x] Add CM scripts to get and build MLPerf inference src
- [x] Add CM scripts to get and build MLPerf loadgen
- [x] Add CM scripts to get MLPerf inference v2.1 submission repo
- [x] Add CM scripts to detect/install ImageNet
- [x] Add CM scripts to install ResNet50
- [x] Add CM scripts to run Offline image classification scenario
- [x] Accuracy
- [x] Performance
- [ ] Add CM scripts to describe submitter
- [x] Add CM scripts to describe platform
- [x] Add CM scripts to run full benchmark
- [x] Add CM scripts to validate submission
- [x] Add CM scripts to truncate results
- [x] Add CM scripts to pack results
- [x] Add CM scripts to run 3 other scenarios
[ ] Replicate the CK-based MLPerf inference v1.1 submission (out-of-the-box image classification with ImageNet and TVM on a cloud platform) using CM scripts: original study
- [x] Add/test CM script to build LLVM with dependencies
- [x] Add/test CM script to build DNNL
- [x] Add/test CM script to build TVM with required LLVM and DNNL
- [x] Add/test TVM backend to MLPerf image classification
- [ ] Reproduce results from MLPerf v1.1 and test with MLPerf v2.1
[x] Prepare test submission to MLPerf inference v2.1 to evaluate the use of the CM (CK2) automation framework for the MLPerf inference benchmark
[ ] Convert the outdated CK-based MLPerf inference benchmark v1.1 automations (developed during Thomas Zhu's first internship last year) to CM: CK program templates -> CM scripts, CK MLPerf program workflows (mlperf-inference-bench-*) -> CM scripts
[ ] Prepare a CK2 (CM)-based tutorial to modularize and automate MLPerf: dummy
[ ] Remove outdated CK tutorials and add CK2 (CM)-based tutorials to the MLPerf inference docs: GitHub
[x] Archive legacy CK in "ck1" directory and move "cm" to the root
[ ] Prepare universal ML benchmarking with loadgen and CM scripts for different models, data sets, engines, platforms: GitHub issue
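The Offline Accuracy/Performance items and the "3 other scenarios" in the checklist above amount to a matrix of loadgen runs. The Python sketch below enumerates that matrix and gives each run an output directory. The helper name `run_matrix`, the `my-org`/`my-system` placeholders and the directory layout are assumptions for illustration (check the layout against the MLPerf inference submission rules); this is not the code of the CM scripts listed above.

```python
from itertools import product
from pathlib import PurePosixPath

# Loadgen scenario names. Which ones a submission needs depends on the system
# category (edge vs datacenter), so this tuple is only a default.
SCENARIOS = ("Offline", "SingleStream", "MultiStream", "Server")

# Loadgen mode -> results sub-directory; this layout is an assumption to be
# checked against the MLPerf inference submission rules.
MODES = {"AccuracyOnly": "accuracy", "PerformanceOnly": "performance/run_1"}


def run_matrix(division, submitter, system, benchmark, scenarios=SCENARIOS):
    """Plan one loadgen run per (scenario, mode) pair, with its output directory."""
    runs = []
    for scenario, mode in product(scenarios, MODES):
        out = PurePosixPath(division, submitter, "results", system,
                            benchmark, scenario, MODES[mode])
        runs.append({"scenario": scenario, "mode": mode, "output_dir": str(out)})
    return runs


# Hypothetical names for the submitter and the system description.
plan = run_matrix("closed", "my-org", "my-system", "resnet50")
for run in plan[:2]:
    print(run["mode"], run["output_dir"])
```

Keeping the scenario list as a parameter lets the same plan cover an edge system (without Server) or a datacenter system (without SingleStream/MultiStream) before the results are truncated, validated and packed.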

Q4 2022 / Q1 2023

[ ] Get feedback from the MLCommons community and organize collaborative development with interested members through our MLPerf education workgroup
[ ] Implement CM-based "experiment" automation (record/reproduce/compare/visualize)
[ ] Implement CM-based MLPerf Design Space Explorer (including NAS)
[ ] Convert all MLPerf inference reference benchmarks to CM scripts
[ ] Generate containers and MLCubes for CM-based MLPerf inference reference benchmarks

gfursin commented 2 years ago

We prepared a demo submission for MLPerf inference v2.1 to show that we can automate all steps of the MLPerf submission. We will continue community developments and plan the next release in September.

gfursin commented 2 years ago

Following the successful validation of CK2 for MLPerf at the Student Cluster Competition, we are closing this ticket and will follow the new roadmap here: https://github.com/mlcommons/ck/issues/536

mlcommons / ck

[MLPerf project] modularize MLPerf inference benchmark and automate submission #261
