davidjurado commented 2 years ago

Benchmark execution with MLCube

Project setup

# Create Python environment and install MLCube Docker runner 
virtualenv -p python3 ./env && source ./env/bin/activate && pip install mlcube-docker

# Fetch the recommendation workload
git clone https://github.com/mlcommons/training && cd ./training
git fetch origin pull/510/head:feature/mlcube_recommendation && git checkout feature/mlcube_recommendation
cd ./recommendation/mlcube

Dataset

The MovieLens dataset will be downloaded and processed. Dataset size at each step:

Dataset Step	MLCube Task	Format	Size
Download (raw dataset)	download_data	.tar	~3.1 GB
Extract (extracted dataset)	download_data	*.npz	~3.1 GB
Total	(After all tasks)	All	~6.2 GB
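
Since the tasks need roughly 6.2 GB in total, it may be worth checking free disk space before starting the download. A minimal sketch, assuming a POSIX `df`; the `has_free_space` helper and the 7 GB threshold (the table total rounded up) are illustrative and not part of the PR:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MLCube): succeeds if DIR has at least
# REQUIRED_KB kilobytes available.
has_free_space() {
  local dir="$1" required_kb="$2" avail_kb
  # df -Pk prints POSIX-format output in 1024-byte blocks;
  # column 4 of the second line is the available space.
  avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
  [ "$avail_kb" -ge "$required_kb" ]
}

# ~6.2 GB total from the table above, rounded up to 7 GB (assumed margin).
REQUIRED_KB=$((7 * 1024 * 1024))

if has_free_space . "$REQUIRED_KB"; then
  echo "Enough free space for the dataset"
else
  echo "Less than 7 GB free in $(pwd)"
fi
```

Run it from the directory that will hold the workspace, or pass the intended data directory instead of `.`.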

Tasks execution

# Download the MovieLens dataset. Default path = mlcube/workspace/data
# To override it, use data_dir=DATA_DIR
mlcube run --task download_data

# Preprocess the MovieLens dataset
# It will use a subdirectory from the DATA_DIR path defined in the previous step
mlcube run --task preprocess_data

# Run the benchmark. Default path: input_dir = mlcube/workspace/processed_data
# Parameters to override: input_dir=DATA_DIR, output_dir=OUTPUT_DIR, parameters_file=PATH_TO_TRAINING_PARAMS
mlcube run --task train
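
The three tasks can be chained with the overrides mentioned in the comments above. A rough sketch, assuming the `name=value` override form shown in those comments is accepted by the installed MLCube version (check `mlcube run --help` if unsure). The `run_task` wrapper, the `DRY_RUN` switch, and the `/tmp/recommendation/...` paths are illustrative and not part of the PR:

```shell
#!/usr/bin/env bash
# Hypothetical locations; replace with real paths.
DATA_DIR="${DATA_DIR:-/tmp/recommendation/data}"
OUTPUT_DIR="${OUTPUT_DIR:-/tmp/recommendation/output}"
# DRY_RUN=1 (default here) only prints the commands; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical wrapper: run (or print) one MLCube task with optional overrides.
run_task() {
  local task="$1"
  shift
  if [ "$DRY_RUN" = "1" ]; then
    # ${*:+ $*} appends the overrides only when some were given.
    echo "mlcube run --task $task${*:+ $*}"
  else
    mlcube run --task "$task" "$@"
  fi
}

run_task download_data "data_dir=$DATA_DIR"
run_task preprocess_data
run_task train "input_dir=$DATA_DIR" "output_dir=$OUTPUT_DIR"
```

With the defaults this prints the three commands without calling `mlcube`, which makes it easy to verify the paths before starting a multi-gigabyte download.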

We are targeting pull-type installation, so the MLCube images should be available on Docker Hub. If they are not, try this:

mlcube run ... -Pdocker.build_strategy=always

github-actions[bot] commented 2 years ago

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

johntran-nv commented 1 year ago

@davidjurado I'm tempted to close this since we're going to replace DLRM in v3.0. Is that ok with you?

matthew-frank commented 1 year ago

This seems to be a modification to the long-retired NCF benchmark, rather than the current DLRM version of the recommendation benchmark.

In an effort to do a better job maintaining this repo, we're closing PRs for retired benchmarks. The old benchmark code still exists, but has been moved to https://github.com/mlcommons/training/tree/master/retired_benchmarks/ncf.

If you think there is useful cleanup to be done to the retired_benchmarks subtree, please submit a new PR.

mlcommons / training

MLCube: Recommendation benchmark #510
