pytorch / benchmark

TorchBench is a collection of open source benchmarks used to evaluate PyTorch performance.
BSD 3-Clause "New" or "Revised" License

test_train[tacotron2-cpu-eager]: test passes but no perf metrics/data #1770

Closed ghost closed 11 months ago

ghost commented 11 months ago

Problem statement: Hi, while running TorchBench on CPU, for some of the models we don't get the perf metrics even though the test shows PASSED.

Reproduce steps:

Step 1: Install the model dependencies:

python3 install.py --continue_on_fail

Step 2: Run the command below:

python3 -m pytest test_bench.py -v -k "test_train[tacotron2-cpu-eager]" --ignore_machine_config --cpu_only

You will get the message below:

============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.2.0 -- /usr/bin/python3
cachedir: .pytest_cache
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /root/torchbench
plugins: hydra-core-1.1.2, benchmark-4.0.0, anyio-3.7.1
collected 375 items / 374 deselected / 1 selected

test_bench.py::TestBenchNetwork::test_train[tacotron2-cpu-eager] PASSED [100%]

However, if you run the command below:

python3 -m pytest test_bench.py -v -k "test_train[hf_Longformer-cpu-eager]" --ignore_machine_config --cpu_only

the model passes and we do get the perf metrics; please check the log below:

platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.2.0 -- /usr/bin/python3
cachedir: .pytest_cache
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /root/torchbench
plugins: hydra-core-1.1.2, benchmark-4.0.0, anyio-3.7.1
collected 375 items / 374 deselected / 1 selected

test_bench.py::TestBenchNetwork::test_train[hf_Longformer-cpu-eager] PASSED [100%]

--------------------------------------------------- benchmark 'hub': 1 tests ---------------------------------------------------
Name (time in s)                         Min      Max     Mean  StdDev   Median     IQR  Outliers     OPS  Rounds  Iterations

test_train[hf_Longformer-cpu-eager]  35.2101  40.8139  37.7402  2.0789  37.1864  2.5295       2;0  0.0265       5           1

Legend:
Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
OPS: Operations Per Second, computed as 1 / Mean
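The OPS column in the table above can be verified directly from the legend's formula, OPS = 1 / Mean. A minimal check, using the Mean value reported for hf_Longformer:

```python
# Verify pytest-benchmark's OPS column: OPS = 1 / Mean,
# where Mean is the mean round time in seconds.
mean_s = 37.7402          # Mean from the hf_Longformer row above
ops = 1 / mean_s
print(round(ops, 4))      # matches the reported OPS of 0.0265
```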

So why do some of the passed models have perf numbers while others don't?

xuzhao9 commented 11 months ago

This is because the tacotron2 model doesn't support training on CPU: https://github.com/pytorch/benchmark/blob/main/torchbenchmark/models/tacotron2/__init__.py#L28
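The mechanism behind this can be sketched roughly as follows (a hypothetical simplification, not the actual TorchBench code): the model's constructor bails out for an unsupported device, so the test harness reports the test as passed/skipped without ever running benchmark rounds, and no perf metrics appear.

```python
class Tacotron2Benchmark:
    """Hypothetical sketch of a TorchBench model wrapper."""

    def __init__(self, device="cuda"):
        # Models that cannot train on CPU bail out early; the test
        # harness treats this as "not supported" rather than a failure,
        # so no benchmark timings are collected for this config.
        if device == "cpu":
            raise NotImplementedError("tacotron2 does not support CPU training")
        self.device = device


try:
    Tacotron2Benchmark(device="cpu")
    supported = True
except NotImplementedError:
    supported = False
print(supported)  # False: no benchmark rounds are run, hence no metrics
```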

ghost commented 11 months ago

Okay, fine. Then what about the model test_train[densenet121-cuda-eager]? Why is it not showing perf metrics?

xuzhao9 commented 11 months ago


Could you please run the command below for debugging?

python run.py densenet121 -d cuda -t train

This model requires a large amount of GPU memory, so it may not work in your local environment.

xuzhao9 commented 11 months ago

Closed as there has been no update from the issue reporter for 2 weeks. Feel free to reopen it if you still have issues.