zou3519 opened 1 year ago
On A100s, seeing 48ms to 50ms, ~4% regression
On AWS V100s, I'm seeing 53ms on 0.1.1, 50ms on 0.2.1, and 52ms on 1.13
~4% regression from 0.2.1
I redid the experiment with actual V100s on the FAIR cluster; the numbers are 75ms (0.1.1) -> 74ms (0.2.1) -> 72ms (1.13), which is not a regression.
On that note, I'm curious why the V100s on different systems have different performance -- maybe a difference in the CPUs? (Or CUDA version? My experiments were done with the PyTorch CUDA 10.2 binaries.)
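For what it's worth, a minimal sketch of how wall-clock numbers like these can be collected consistently across machines (the `workload` stand-in below is hypothetical; a real repro would time the actual model and call `torch.cuda.synchronize()` around each reading so kernel completion, not just launch, is measured):

```python
import statistics
import time

def benchmark(fn, warmup=5, iters=30):
    """Return the median wall-clock time of fn() in milliseconds.

    For GPU work, fn should synchronize (e.g. torch.cuda.synchronize())
    so the timer sees completed kernels rather than async launches.
    """
    for _ in range(warmup):  # warm up caches / JIT / cudnn autotuning
        fn()
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        times.append((time.perf_counter() - start) * 1e3)
    return statistics.median(times)

# Hypothetical CPU stand-in workload; the real repro would run the model here.
ms = benchmark(lambda: sum(i * i for i in range(10000)))
print(f"{ms:.2f}ms")
```

Using the median over many iterations (rather than a single run) helps rule out machine noise when comparing small deltas like 2-4ms across clusters.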
To repro:
On my machine with a ~~V100~~ P100 GPU, the runtime goes from 72ms to 83ms