mlcommons / training

Reference implementations of MLPerf™ training benchmarks
https://mlcommons.org/en/groups/training
Apache License 2.0

Single Stage Detection with MLCube™ [request for feedback] #465

Closed sergey-serebryakov closed 1 year ago

sergey-serebryakov commented 3 years ago

Introduction

MLCommons™ Best Practices WG is working towards simplifying the process of running ML workloads, including MLCommons reference training and inference benchmarks. We have developed a prototype of a library that we call MLCube™.

The goal of this PR is to show how MLCube can be used to run MLCommons training and inference workloads, and to gather feedback.

Vision

This does not work yet! It requires a new MLCube release. This section describes our vision; the next section (Current implementation) shows a working example.

This section presents one possible way of interacting with MLCubes. To run an ML workload, a user would only need the following steps:

Install MLCube:

virtualenv -p python3 ./mlcube_env
source ./mlcube_env/bin/activate
pip install mlcube

Get the MLCommons SSD reference benchmark:

mlcube pull https://github.com/mlcommons/training --project single_stage_detector
cd ./single_stage_detector

Explore what tasks SSD MLCube supports:

mlcube describe
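To make the task layout concrete, here is a small illustrative Python sketch of the three tasks and their configurable parameters as described in this PR. The dictionary structure and the `describe` helper are assumptions for illustration only, not MLCube's actual API.

```python
# Illustrative sketch only: task and parameter names come from this PR;
# the dictionary layout is an assumption, not MLCube's actual API.
SSD_TASKS = {
    "download_data": {"cache_dir": "./workspace/cache", "data_dir": "./workspace/data"},
    "download_model": {"data_dir": "./workspace/data"},
    "train": {
        "data_dir": "./workspace/data",
        "pretrained_backbone": None,   # path to ResNet34 weights
        "parameters_file": None,       # path to training hyperparameters
    },
}

def describe(tasks=SSD_TASKS):
    """Return a describe-style summary: one line per task with its parameters."""
    return [f"{name}: {', '.join(params)}" for name, params in tasks.items()]
```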

Run SSD benchmark using local Docker runtime:

# Download SSD dataset (~20 GB, ~40 GB space required)
mlcube run --task download_data --platform docker

# Download ResNet34 feature extractor
mlcube run --task download_model --platform docker

# Run benchmark
mlcube run --task train --platform docker
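Since `download_data` needs roughly 40 GB of free space, a pre-flight check can save a failed multi-hour download. A minimal sketch, using only the standard library (a hypothetical helper, not part of MLCube):

```python
import shutil

def has_space_for_dataset(path=".", required_gb=40):
    """True if the filesystem holding `path` has at least `required_gb` free.

    Hypothetical pre-flight helper for the ~40 GB SSD dataset download;
    not part of MLCube.
    """
    free_gb = shutil.disk_usage(path).free / 1024 ** 3
    return free_gb >= required_gb
```

Run it against the MLCube workspace directory before `mlcube run --task download_data`.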

Current implementation

We'll be updating this section as we merge MLCube PRs and make new MLCube releases.

# Create Python environment 
virtualenv -p python3 ./env && source ./env/bin/activate

# Install MLCube and MLCube docker runner from GitHub repository (normally, users will just run `pip install mlcube mlcube_docker`)
git clone https://github.com/mlcommons/mlcube && cd ./mlcube
cd ./mlcube && python setup.py bdist_wheel  && pip install --force-reinstall ./dist/mlcube-* && cd ..
cd ./runners/mlcube_docker && python setup.py bdist_wheel  && pip install --force-reinstall --no-deps ./dist/mlcube_docker-* && cd ../../..

# Fetch the SSD workload
git clone https://github.com/mlcommons/training && cd ./training
git fetch origin pull/465/head:feature/mlcube-ssd && git checkout feature/mlcube-ssd
cd ./single_stage_detector

# Build MLCube docker image. We'll find a better way of integrating existing workloads
# with MLCube, so that MLCube runs this by itself (it can actually do it now, but in order
# to enable this, we would have to introduce more changes to the SSD repo).
docker build --build-arg http_proxy="${http_proxy}" --build-arg https_proxy="${https_proxy}" . -t mlcommons/train_ssd:0.0.1 -f Dockerfile.mlcube

# Show tasks implemented in this MLCube.
cd ./mlcube && mlcube describe

# Download SSD dataset (~20 GB, ~40 GB space required). Default paths = ./workspace/cache and ./workspace/data
# To override them, use --cache_dir=CACHE_DIR and --data_dir=DATA_DIR
mlcube run --task download_data --platform docker

# Download ResNet34 feature extractor. Default path = ./workspace/data
# To override, use: --data_dir=DATA_DIR
mlcube run --task download_model --platform docker

# Run benchmark. Default paths = ./workspace/data
# Parameters to override: --data_dir=DATA_DIR, --pretrained_backbone=PATH_TO_RESNET34_WEIGHTS, --parameters_file=PATH_TO_TRAINING_PARAMS
mlcube run --task train --platform docker
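The tasks above all follow the same convention: each parameter has a workspace default that a CLI flag can override. A tiny illustrative sketch of that resolution logic (an assumption about the convention, not MLCube's actual implementation):

```python
from pathlib import Path

# Workspace defaults as documented in the comments above.
DEFAULTS = {
    "data_dir": "./workspace/data",
    "cache_dir": "./workspace/cache",
}

def resolve(param, override=None, defaults=DEFAULTS):
    """Return the CLI override for `param` when given, else the workspace default.

    Illustrative only: shows the default/override convention, not MLCube code.
    """
    return Path(override) if override else Path(defaults[param])
```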

github-actions[bot] commented 3 years ago

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

matthew-frank commented 1 year ago

This PR refers to the old, retired ssd-v1 benchmark, which was replaced by the Retinanet benchmark.

In an effort to do a better job maintaining this repo, we're closing PRs for retired benchmarks. The old benchmark code still exists, but has been moved to https://github.com/mlcommons/training/tree/master/retired_benchmarks/ssd-v1/.

If you think there is useful cleanup to be done to the retired_benchmarks subtree, please submit a new PR.