mlcommons / training

Reference implementations of MLPerf™ training benchmarks
https://mlcommons.org/en/groups/training
Apache License 2.0

Single Stage Detection with MLCube™ [request for feedback] #465

Closed sergey-serebryakov closed 1 year ago

sergey-serebryakov commented 3 years ago

Introduction

MLCommons™ Best Practices WG is working towards simplifying the process of running ML workloads, including MLCommons reference training and inference benchmarks. We have developed a prototype of a library that we call MLCube™.

The goal of this PR is to show how MLCube can be used to run MLCommons training and inference workloads, and to gather feedback.

Vision

This does not work yet! It requires a new MLCube release. This section describes our vision; the next section (Current implementation) shows a working example.

This section presents one possible way of interacting with MLCubes. To run an ML workload, a user would only need the following steps:

Install MLCube:

virtualenv -p python3 ./mlcube_env
source ./mlcube_env/bin/activate
pip install mlcube

Get the MLCommons SSD reference benchmark:

mlcube pull https://github.com/mlcommons/training --project single_stage_detector
cd ./single_stage_detector

Explore what tasks SSD MLCube supports:

mlcube describe
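To make the task layout concrete, here is a small illustrative Python sketch of the three tasks and their configurable parameters as described in this PR. The dictionary structure and the `describe` helper are assumptions for illustration only, not MLCube's actual API.

```python
# Illustrative sketch only: task and parameter names come from this PR;
# the dictionary layout is an assumption, not MLCube's actual API.
SSD_TASKS = {
    "download_data": {"cache_dir": "./workspace/cache", "data_dir": "./workspace/data"},
    "download_model": {"data_dir": "./workspace/data"},
    "train": {
        "data_dir": "./workspace/data",
        "pretrained_backbone": None,   # path to ResNet34 weights
        "parameters_file": None,       # path to training hyperparameters
    },
}

def describe(tasks=SSD_TASKS):
    """Return a describe-style summary: one line per task with its parameters."""
    return [f"{name}: {', '.join(params)}" for name, params in tasks.items()]
```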

Run SSD benchmark using local Docker runtime:

# Download SSD dataset (~20 GB, ~40 GB space required)
mlcube run --task download_data --platform docker

# Download ResNet34 feature extractor
mlcube run --task download_model --platform docker

# Run benchmark
mlcube run --task train --platform docker
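Since `download_data` needs roughly 40 GB of free space, a pre-flight check can save a failed multi-hour download. A minimal sketch, using only the standard library (a hypothetical helper, not part of MLCube):

```python
import shutil

def has_space_for_dataset(path=".", required_gb=40):
    """True if the filesystem holding `path` has at least `required_gb` free.

    Hypothetical pre-flight helper for the ~40 GB SSD dataset download;
    not part of MLCube.
    """
    free_gb = shutil.disk_usage(path).free / 1024 ** 3
    return free_gb >= required_gb
```

Run it against the MLCube workspace directory before `mlcube run --task download_data`.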

Current implementation

We'll be updating this section as we merge MLCube PRs and make new MLCube releases.

# Create Python environment 
virtualenv -p python3 ./env && source ./env/bin/activate

# Install MLCube and MLCube docker runner from GitHub repository (normally, users will just run `pip install mlcube mlcube_docker`)
git clone https://github.com/mlcommons/mlcube && cd ./mlcube
cd ./mlcube && python setup.py bdist_wheel  && pip install --force-reinstall ./dist/mlcube-* && cd ..
cd ./runners/mlcube_docker && python setup.py bdist_wheel  && pip install --force-reinstall --no-deps ./dist/mlcube_docker-* && cd ../../..

# Fetch the SSD workload
git clone https://github.com/mlcommons/training && cd ./training
git fetch origin pull/465/head:feature/mlcube-ssd && git checkout feature/mlcube-ssd
cd ./single_stage_detector

# Build MLCube docker image. We'll find a better way of integrating existing workloads
# with MLCube, so that MLCube runs this by itself (it can actually do it now, but in order
# to enable this, we would have to introduce more changes to the SSD repo).
docker build --build-arg http_proxy="${http_proxy}" --build-arg https_proxy="${https_proxy}" . -t mlcommons/train_ssd:0.0.1 -f Dockerfile.mlcube

# Show tasks implemented in this MLCube.
cd ./mlcube && mlcube describe

# Download SSD dataset (~20 GB, ~40 GB space required). Default paths = ./workspace/cache and ./workspace/data
# To override them, use --cache_dir=CACHE_DIR and --data_dir=DATA_DIR
mlcube run --task download_data --platform docker

# Download ResNet34 feature extractor. Default path = ./workspace/data
# To override, use: --data_dir=DATA_DIR
mlcube run --task download_model --platform docker

# Run benchmark. Default paths = ./workspace/data
# Parameters to override: --data_dir=DATA_DIR, --pretrained_backbone=PATH_TO_RESNET34_WEIGHTS, --parameters_file=PATH_TO_TRAINING_PARAMS
mlcube run --task train --platform docker
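The tasks above all follow the same convention: each parameter has a workspace default that a CLI flag can override. A tiny illustrative sketch of that resolution logic (an assumption about the convention, not MLCube's actual implementation):

```python
from pathlib import Path

# Workspace defaults as documented in the comments above.
DEFAULTS = {
    "data_dir": "./workspace/data",
    "cache_dir": "./workspace/cache",
}

def resolve(param, override=None, defaults=DEFAULTS):
    """Return the CLI override for `param` when given, else the workspace default.

    Illustrative only: shows the default/override convention, not MLCube code.
    """
    return Path(override) if override else Path(defaults[param])
```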

github-actions[bot] commented 3 years ago

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

matthew-frank commented 1 year ago

This PR refers to the old, retired ssd-v1 benchmark, which was replaced by the Retinanet benchmark.

In an effort to do a better job maintaining this repo, we're closing PRs for retired benchmarks. The old benchmark code still exists, but has been moved to https://github.com/mlcommons/training/tree/master/retired_benchmarks/ssd-v1/.

If you think there is useful cleanup to be done to the retired_benchmarks subtree, please submit a new PR.