Following the successful testing of the CM end-to-end benchmarking and submission workflow for modular MLPerf benchmarks at the Student Cluster Competition at SuperComputing'22, we have prepared a new list of pending tasks for the MLCommons taskforce on education and reproducibility. The goal is to help the community automate their submissions for MLPerf v3.0 and to continue modularizing ML systems and automating their benchmarking, optimization and design space exploration:

Community discussions (see the notes from weekly conf-calls)

[x] #581
[ ] Discuss how to automate iterative/autotuning experiments and collaborative design space exploration using CM meta-framework
- [ ] Discuss how to record all the provenance during experiments (dependencies and their versions)
- [x] Discuss how to visualize all past MLPerf results as well as results from CM experiments during optimization
- [ ] Discuss how to report a table of tested/working/failed combinations of ML tasks, models, engines, datasets, OS, CPU and other dependencies
- [ ] Discuss how to reproduce best performance/accuracy results from closed/open submissions via CM
- [ ] Discuss how to create a universal performance benchmark with CM and loadgen to plug in ANY model (without accuracy - sync with Guenther)
[x] (GF) Recreate/reuse CK mailing list for the taskforce (as suggested by users)
[x] (GF) Provide update about CM automation, SCC experience and the next steps (DSE) to MLPerf inference WG and our taskforce
[x] (GF) Prepare CM automation presentation for MedPerf WG (20221212)
[ ] Discuss universal, modular, portable and reproducible benchmarking (interest from MLCommons mobile and general community)
- [ ] Discuss universal benchmarking with mobile MLPerf WG
- [x] Add DSE and NAS to CM MLPerf workflow
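
One possible starting point for the provenance item above (recording dependencies and their versions during experiments) is a small Python helper. This is only a sketch under assumptions: the function name `collect_provenance` and the record layout are hypothetical and are not part of CM; it relies only on the Python standard library.

```python
import json
import platform
from importlib import metadata


def collect_provenance(packages):
    """Return a dict describing the host and the installed versions of `packages`."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": platform.python_version(),
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "packages": versions,
    }


if __name__ == "__main__":
    # The package list is illustrative; use whatever the experiment depends on.
    record = collect_provenance(["onnxruntime", "torch", "torchvision", "tensorflow"])
    print(json.dumps(record, indent=2))
```

A record like this could be saved next to each experiment result so that a later run can be compared against the exact software stack that produced it.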

Finish testing our end-to-end CM MLPerf submission workflow (small dataset)

RetinaNet

[x] (GF) C++ MLPerf with RetinaNet FP32, ONNX and CPU (test and document)
[x] (GF) Ref Python MLPerf with RetinaNet FP32, ONNX and CUDA (check that it works with both CPU and GPU)
[x] (AS) Ref Python MLPerf with RetinaNet FP32, PyTorch and CPU (should work with older torchvision and num_threads=1)
[x] (GF) C++ MLPerf with RetinaNet FP32, ONNX and CUDA (test and document)
[x] Update tutorial and add a stable hash
[x] Update run-mlperf-inference-app README
[x] Update modular Docker with the above app
[x] Create a reproducibility matrix with all tested or failed choices (see https://github.com/mlcommons/ck/tree/master/cm-mlops/script/run-mlperf-inference-app)
[x] #563

ResNet50

[x] (GF) Ref Python MLPerf with ResNet50 FP32, ONNX and CPU (test and document - check GitHub action)
[x] (AS) Ref Python MLPerf with ResNet50 FP32, TVM and CPU (test and document - build stable TVM)
[x] Ref Python MLPerf with ResNet50 FP32, ONNX and CUDA
[x] Ref Python MLPerf with ResNet50 FP32, PyTorch and CPU
[x] Ref Python MLPerf with ResNet50 FP32, TF and CPU
[x] C++ MLPerf with ResNet50 FP32, ONNX and CPU
[x] C++ MLPerf with ResNet50 FP32, ONNX and CUDA
[ ] C++ MLPerf with ResNet50 INT8, ONNX and CUDA
[ ] C++ MLPerf with ResNet50 FP32, PyTorch and CUDA
[ ] C++ MLPerf with ResNet50 INT8, PyTorch and CUDA

Compare the C++ implementation with the best performance (numbers need to be validated):

INT8: offline: 40000 images/sec -> CPU
INT8: offline: 16000 images/sec -> CPU 96-core PyTorch
FP32: offline: 40 images/sec -> CPU 2..4 cores ONNX Runtime

BERT

[x] Ref Python MLPerf with BERT FP32, ONNX and CPU
[x] Ref Python MLPerf with BERT FP32, TensorFlow and CPU
[x] Ref Python MLPerf with BERT FP32, PyTorch and CPU
[x] Ref Python MLPerf with BERT INT8, ONNX and CPU
[x] Ref Python MLPerf with BERT FP32, ONNX and CUDA
[x] Ref Python MLPerf with BERT FP32, TensorFlow and CUDA
[x] Ref Python MLPerf with BERT FP32, PyTorch and CUDA
[x] Ref Python MLPerf with BERT INT8, ONNX and CUDA

All other reference MLPerf implementations

[x] Add all other reference MLPerf implementations to CM (maybe with the help of the community and students)
[x] Add tests of all reference MLPerf implementations to the MLPerf inference repo: https://github.com/mlcommons/inference/tree/master/.github/workflows
[ ] Add modular Docker containers for all reference MLPerf implementations:
- [x] Prototype 1
- [x] Prototype 2

Test and document how to run and tune other MLPerf scenarios

[x] SingleStream
[x] MultiStream
[x] Server

Add Power measurements to the CM MLPerf workflow

[x] #559
[x] Document how to measure power
[x] Turn on power daemon, collect results and unify output in the CM workflow

Finish testing our end-to-end MLPerf submission workflow (full dataset)

[x] Python MLPerf with RetinaNet FP32, ONNX and CPU (test and document)
[x] Python MLPerf with ResNet50 FP32, ONNX and CPU (test and document)
[x] Python MLPerf with BERT FP32, ONNX and CPU (test and document)

Design Space Exploration and testing

[ ] Automate exploration and testing of all design choices of ML systems using CM and MLPerf with the help of the community. Record all dependency versions for the dashboard, including versions of engines, compilers, torchvision, etc.
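
The exploration task above could be approached, in its simplest form, as an exhaustive sweep over the design choices, recording which combinations work and which fail. The following is a minimal sketch under assumptions: the dimension values are taken from names used in this roadmap, while `DIMENSIONS`, `enumerate_design_points`, `explore` and the `run_one` callback are hypothetical names for illustration and do not correspond to an existing CM API. In practice `run_one` would launch the actual benchmark for one design point.

```python
import itertools

# Hypothetical design dimensions, using names that appear in this roadmap.
DIMENSIONS = {
    "model": ["retinanet", "resnet50", "bert"],
    "implementation": ["reference-python", "cpp"],
    "backend": ["onnxruntime", "pytorch", "tf", "tvm"],
    "device": ["cpu", "cuda"],
    "precision": ["fp32", "int8"],
}


def enumerate_design_points(dimensions):
    """Yield one dict per combination of values across all dimensions."""
    keys = list(dimensions)
    for values in itertools.product(*(dimensions[k] for k in keys)):
        yield dict(zip(keys, values))


def explore(dimensions, run_one):
    """Call run_one(point) for every design point and record success or failure."""
    results = []
    for point in enumerate_design_points(dimensions):
        try:
            outcome = {"status": "ok", "result": run_one(point)}
        except Exception as exc:  # a failed combination is a result too
            outcome = {"status": "failed", "error": str(exc)}
        results.append({**point, **outcome})
    return results
```

The list returned by `explore` doubles as the raw data for a tested/working/failed matrix. An exhaustive sweep grows multiplicatively with every added dimension (96 points for the values above), so a real workflow would likely need sampling or autotuning rather than a full grid.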

Misc

[x] #537
[x] #555
[x] Add new Docker containers for all MLPerf inference examples from SCC tutorial
[x] (GF) Check get-tvm building with CUDA and test with image classification example
[x] (AS) Check how to sign MLCommons power agreement and access power repo

Documentation

[x] Remove outdated CK notes from MLPerf inference repository
[ ] Update READMEs for app-mlperf-inference, app-mlperf-inference-cpp, run-mlperf-inference-app (including tutorial for SCC'22); add API with all the optimization/DSE dimensions!
[ ] Add links to the above READMEs to the MLPerf inference repository
[ ] Add extension projects (including for students)
Update main CM documentation:
- [x] Basics (including CLI + all objects as DB + Python API)
- [x] Scripts (latest API + flow)
- [ ] https://docs.google.com/spreadsheets/d/1T5-c7Eb8CfUxgwYgxQND-UsWWJY-I4lDNqK5DZM9uV4/edit#gid=0
- [x] image classification tutorial
- [x] MLPerf tutorial
- [ ] Docker
[ ] (GF+AS) Create MLPerf inference SCC end-to-end MLPerf video tutorial
[ ] (GF+AS) Video tutorial about CM

Add non-reference (optimized) implementations

[x] NVIDIA MLPerf with RetinaNet FP32, ONNX and CPU (test and document)
- [x] https://github.com/mlcommons/inference_results_v2.1/issues/6
- [x] Sync with Ethan Cheng about modularization
[ ] NVIDIA MLPerf with ResNet50 FP32, ONNX and CPU (test and document)
[ ] NVIDIA MLPerf with BERT FP32, ONNX and CPU (test and document)
[ ] TFLite with MobileNets (reproduce open division submissions using CM)
[x] NeuralMagic implementation with pruning (arrange a hackathon)
[ ] Qualcomm AI100 implementation with quantization
[ ] Intel implementation

Improve testing and documentation of individual CM scripts:

[x] Automatically generate README.md from meta and "docs" directory with manually prepared READMEs?
[ ] Add tests/matrix.yaml for CMD tests?
[ ] Add stable Dockerfiles?

Add support for Android

[x] CM script to detect Android SDK
[x] CM script to detect Android NDK
[ ] CM script to build and run a simple app on Android (image corner detection)
[ ] Discuss universal benchmarking with mobile MLPerf WG

Enhancement projects (ideas)

[ ] Work with the community to reproduce MLPerf inference v2.1 submissions, modularize them using CM, add them to our universal benchmarking workflow with a modular Docker container and automate submission of Pareto-optimal results to MLPerf inference v3.0
[ ] Add CM support for Android benchmarking (collaborate with MLPerf mobile WG)
[ ] C++ MLPerf with RetinaNet FP32, PyTorch and CPU (add backend and optimize)
[ ] C++ MLPerf with RetinaNet FP32, TVM and CPU (add backend and optimize)
[ ] Ref Python MLPerf with RetinaNet FP32, TVM and CPU (optimize)
[ ] Create a fun app with webcam and object detection using MLPerf RetinaNet and CM
[ ] Discuss projects with VJ and MLPerf research WG
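
The first idea in this list mentions submitting Pareto-optimal results. As a rough illustration of what selecting such results could look like, here is a minimal sketch that filters a list of measurements down to the points not dominated on two metrics where higher is better. The field names `throughput` and `accuracy` and the function `pareto_front` are assumptions made for this example and are not taken from CM or MLPerf tooling.

```python
def pareto_front(points):
    """Return the points that no other point dominates.

    A point q dominates p when q is at least as good as p on both
    metrics and strictly better on at least one of them.
    """
    front = []
    for p in points:
        dominated = any(
            q["throughput"] >= p["throughput"]
            and q["accuracy"] >= p["accuracy"]
            and (q["throughput"] > p["throughput"] or q["accuracy"] > p["accuracy"])
            for q in points
        )
        if not dominated:
            front.append(p)
    return front
```

This quadratic scan is fine for a few hundred results. For metrics where lower is better (for example latency or power), the comparison for that metric would need to be flipped.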

Upcoming presentations

[x] MLCommons MedPerf WG

mlcommons / ck

Roadmap for CM, MLPerf and ML/SW/HW DSE: 20221116 #536