mlcommons / training_results_v0.6

This repository contains the results and code for the MLPerf™ Training v0.6 benchmark.
https://mlcommons.org/en/training-normal-06/
Apache License 2.0

NVIDIA datasets broken/unavailable #16

Open blue-orc opened 4 years ago

blue-orc commented 4 years ago

I'm trying to run the benchmarks under the NVIDIA folder, but I'm running into a lot of issues acquiring and setting up the datasets properly.

The COCO dataset links here are all broken: https://github.com/mlperf/training_results_v0.6/blob/master/NVIDIA/benchmarks/maskrcnn/implementations/download_dataset.sh

I was able to download the dataset from http://cocodataset.org/ but I'm not sure where to get the weights file.

Also, the ImageNet dataset for the ResNet benchmark is unavailable for direct download. I was able to acquire the dataset, but ran into issues when running the actual training. The error occurred at line 163 of this file: https://github.com/mlperf/training_results_v0.6/blob/master/NVIDIA/benchmarks/resnet/implementations/mxnet/train_imagenet.py#L163

I didn't copy the error, but it reported a problem with file mapping. My guess is that the dataset isn't set up exactly as the script expects, since I had to piece it together myself.

Is there any updated way to acquire the exact datasets and ensure they are consistent with the published run results?
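One way to check that a locally assembled file matches a reference copy is to compare checksums. The following is only a minimal sketch: it assumes reference SHA-256 digests can be obtained from whoever distributes the dataset (none are given in this thread), and the helper names `sha256_of` and `matches` are made up for illustration.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's digest with a reference digest, ignoring case and surrounding whitespace.

    `expected_hex` is an assumption: it has to come from a trusted source.
    """
    return sha256_of(path) == expected_hex.strip().lower()
```

A matching digest only shows that the bytes are identical to the reference file. It says nothing about whether the directory layout is what the training script expects.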

blue-orc commented 4 years ago

Also, the directory structure of the minigo bucket has changed: https://console.cloud.google.com/storage/browser/minigo-pub/ml_perf/?pli=1

The provided configuration expects that the checkpoint can be found at ml_perf/checkpoint/9, whereas /ml_perf/0.6/checkpoint seems to be the correct location.
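If the only change is the added version directory, the old path can be rewritten mechanically. This is a rough sketch under that assumption: the helper name `versioned_path` is hypothetical, and whether the `9` subdirectory still exists under the new location has not been verified here.

```python
import posixpath

OLD_CHECKPOINT = "ml_perf/checkpoint/9"  # path the provided configuration expects
VERSION = "0.6"                          # version directory reported in this thread


def versioned_path(old_path, version=VERSION, root="ml_perf"):
    """Insert a version directory right after `root`.

    Assumption: only the version segment was added and the rest of the
    layout is unchanged.
    """
    parts = old_path.strip("/").split("/")
    if parts[0] != root:
        raise ValueError(f"expected a path under {root!r}, got {old_path!r}")
    # Bucket object names use forward slashes, so join with posixpath.
    return posixpath.join(root, version, *parts[1:])
```

Here `versioned_path(OLD_CHECKPOINT)` gives `ml_perf/0.6/checkpoint/9`, which would still need to be checked against the actual bucket contents before updating the configuration.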

dumaaan commented 4 years ago

I second this issue. I tried to run the maskrcnn implementation, but couldn't find the weights file anywhere.

dumaaan commented 4 years ago

I solved this by updating the link in the download_weights.sh file:

Try using https://dl.fbaipublicfiles.com/detectron/ImageNetPretrained/MSRA/R-50.pkl instead of the link in the original script.
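For anyone who would rather fetch the file outside the shell script, here is a minimal Python sketch using only the standard library. The function names and the destination directory are assumptions for illustration, not what download_weights.sh does; only the URL comes from this thread.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlparse

# URL suggested in this thread as a working replacement.
WEIGHTS_URL = "https://dl.fbaipublicfiles.com/detectron/ImageNetPretrained/MSRA/R-50.pkl"


def filename_from_url(url):
    """Return the last path component of a URL."""
    return os.path.basename(urlparse(url).path)


def download(url, dest_dir=".", opener=urllib.request.urlopen):
    """Stream `url` into `dest_dir` and return the local file path.

    `opener` is injectable so the function can be exercised without
    network access; by default it performs a real HTTP request.
    """
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, filename_from_url(url))
    # Copy the response body to disk in chunks instead of loading it all into memory.
    with opener(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```

Calling `download(WEIGHTS_URL, "weights")` would save the file as `weights/R-50.pkl`. The `weights` directory is a placeholder, so the file still has to be moved to wherever the maskrcnn scripts look for it.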