mlcommons / training

Reference implementations of MLPerf™ training benchmarks
https://mlcommons.org/en/groups/training
Apache License 2.0

Add MLCube support for RNN speech recognition #491

Open davidjurado opened 3 years ago

davidjurado commented 3 years ago

Used PR #465 as reference.

Current implementation

We'll be updating this section as we merge MLCube PRs and make new MLCube releases.

Project setup

# Create a Python environment and install the MLCube Docker runner
virtualenv -p python3 ./env && source ./env/bin/activate && pip install mlcube-docker

# Fetch the RNN speech recognition workload
git clone https://github.com/mlcommons/training && cd ./training
git fetch origin pull/491/head:feature/rnnt_mlcube && git checkout feature/rnnt_mlcube
cd ./rnn_speech_recognition/mlcube

Dataset

The LibriSpeech dataset is downloaded, extracted, and preprocessed. Approximate size of the dataset after each step:

| Dataset step                    | MLCube task     | Format     | Size    |
|---------------------------------|-----------------|------------|---------|
| Download (compressed dataset)   | download_data   | Tar files  | ~62 GB  |
| Extract (uncompressed dataset)  | download_data   | Flac files | ~64 GB  |
| Preprocess (processed dataset)  | preprocess_data | Wav files  | ~114 GB |
| Total (after all tasks)         | All             |            | ~240 GB |
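
Since all tasks together need roughly 240 GB, it may be worth checking free disk space before starting the download. The following is a minimal sketch, not part of this PR: has_free_gb is a hypothetical helper, it treats the sizes above as GiB, and it assumes the directory passed in already exists (for example the parent of the workspace).

```shell
# Sketch: check that the filesystem holding a directory has enough free space.
# has_free_gb is a hypothetical helper, not part of MLCube.
has_free_gb() {
    dir="$1"
    required_gb="$2"
    # df -P prints one POSIX-format line per filesystem; -k reports 1024-byte blocks.
    # Column 4 of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    required_kb=$((required_gb * 1024 * 1024))
    [ "$avail_kb" -ge "$required_kb" ]
}

# Example: has_free_gb . 240 || echo "Less than 240 GB free here" >&2
```

The function only returns a status and prints nothing, so it can gate the download step in a script without adding noise to the output.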

Tasks execution

# Download the LibriSpeech dataset. Default path = ./workspace/data
# To override it, pass data_dir=DATA_DIR
mlcube run --task download_data

# Preprocess the LibriSpeech dataset: converts the .flac audio files to .wav format
# Uses the DATA_DIR path defined in the previous step
mlcube run --task preprocess_data

# Run benchmark. Default paths = ./workspace/data
# Parameters to override: data_dir=DATA_DIR, output_dir=OUTPUT_DIR, parameters_file=PATH_TO_TRAINING_PARAMS
mlcube run --task train
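
The overrides named in the comments above are presumably passed as key=value arguments after the task name. The sketch below assumes that syntax and has not been verified against this PR. Both paths are hypothetical placeholders, and train_cmd is an illustrative helper that only prints the command so it can be inspected before running.

```shell
# Hypothetical locations; replace with real directories.
DATA_DIR="${DATA_DIR:-/mnt/librispeech}"
OUTPUT_DIR="${OUTPUT_DIR:-/mnt/rnnt_output}"

train_cmd() {
    # Assumption: task parameters are overridden as key=value pairs on the command line.
    # This prints the command instead of executing it.
    printf 'mlcube run --task train data_dir=%s output_dir=%s\n' "$DATA_DIR" "$OUTPUT_DIR"
}

# Example: train_cmd          # print the command
#          eval "$(train_cmd)" # run it once the paths look right
```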

We are targeting pull-type installation, so the MLCube images should be available on Docker Hub. If they are not, force a local image build:

mlcube run ... -Pdocker.build_strategy=always

github-actions[bot] commented 3 years ago

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

mwawrzos commented 1 year ago

Hello @davidjurado! I tried to follow the task execution steps, but the last step failed with the following error:

$ mlcube run --task train
Usage: mlcube.py train [OPTIONS]
Try 'mlcube.py train --help' for help.

Error: Missing option '--output_dir'.
2023-05-19 09:35:17 [...]

Your description says:

# Run benchmark. Default paths = ./workspace/data
# Parameters to override: data_dir=DATA_DIR, output_dir=OUTPUT_DIR, parameters_file=PATH_TO_TRAINING_PARAMS
mlcube run --task train

How do I override output_dir?

nv-rborkar commented 4 months ago

@davidjurado, can you answer @mwawrzos's question? We can then merge this accordingly.