ngimel closed this 1 year ago
What hardware is the regression observed on?
A100, CUDA 11.6; my cuDNN is a bit old, 8.3 I think. Is it better with the newer one?
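(For reference, a quick way to check which cuDNN and CUDA versions PyTorch actually picked up, as a minimal sketch:)

```python
import torch

# cuDNN version loaded by PyTorch, e.g. 8302 for 8.3.2
print(torch.backends.cudnn.version())
# CUDA toolkit version PyTorch was built against, e.g. '11.6'
print(torch.version.cuda)
```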
@williamwen42 can you look at adding absolute latency numbers to the dashboard?
Fix for the inductor time here: https://github.com/pytorch/pytorch/pull/88534
For convnext, the cuDNN v8 API breaks something big time. With cuDNN v8 off I'm getting an eager time of 0.189 for the following command:
TORCH_CUDNN_V8_API_DISABLED=1 python benchmarks/dynamo/timm_models.py --training --inductor --only convnext_base --devices=cuda --float16 --batch_size 128 --performance --disable-cudagraphs
With cuDNN v8 on, the eager time is 0.52!
And this is not fixed even when I set torch.backends.cudnn.benchmark=True.
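For anyone who wants to poke at this outside the full timm benchmark harness, here is a minimal sketch of the kind of isolated repro I mean. The TORCH_CUDNN_V8_API_DISABLED env var and the cudnn.benchmark flag are the ones discussed above; the layer shapes (a ConvNeXt-style 7x7 depthwise conv, fp16, batch 128) are just assumptions for illustration, not the exact op that regressed:

```python
import os
# Toggle the v8 API before importing torch so there are no ordering surprises.
os.environ["TORCH_CUDNN_V8_API_DISABLED"] = "1"

import torch

torch.backends.cudnn.benchmark = True  # the autotuning flag mentioned above

# ConvNeXt-style depthwise 7x7 conv in fp16 (illustrative shapes)
conv = torch.nn.Conv2d(128, 128, kernel_size=7, padding=3, groups=128).cuda().half()
x = torch.randn(128, 128, 56, 56, device="cuda", dtype=torch.float16, requires_grad=True)

# Warm up so lazy init and cudnn.benchmark autotuning don't pollute the timing
for _ in range(10):
    conv(x).sum().backward()

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
for _ in range(100):
    conv(x).sum().backward()
end.record()
torch.cuda.synchronize()
print(f"avg fwd+bwd: {start.elapsed_time(end) / 100:.3f} ms")
```

Running it once with the env var set and once without should show whether the slowdown comes from cuDNN kernel selection alone.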
@eqy I think we have to revert this PR.
Sure, will look into this regression as well. Previous CoAtNet regression has already been forwarded to cuDNN.
https://github.com/pytorch/pytorch/pull/88699 seems to also address the CoAtNet regression.
https://github.com/pytorch/pytorch/pull/87650 regressed inductor time 0.118 -> 0.14 (speedup 1.6x -> 1.35x).
https://github.com/pytorch/pytorch/pull/87669 regressed both eager and inductor time (eager 0.19 -> 0.22, inductor 0.14 -> 0.17).
cc @eellison, @eqy. Command line to repro is:
python benchmarks/dynamo/timm_models.py --training --performance --device cuda --inductor --float32 --only=coat_lite_mini
@anijain2305 are we reporting absolute times on the dashboard already? Those would be very useful. @desertfire we really should start some perf testing in CI to avoid such regressions. It would possibly be noisy, but it should be possible to catch relatively large changes like these.
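Re: absolute times, a minimal sketch of the kind of per-model latency measurement that could back absolute numbers (illustrative only; `measure_latency_ms` is a made-up helper here, not something that exists in the benchmark suite):

```python
import torch

def measure_latency_ms(fn, warmup=5, iters=20):
    """Return the median latency of fn() in milliseconds, measured with CUDA events."""
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(iters):
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        fn()
        end.record()
        torch.cuda.synchronize()
        times.append(start.elapsed_time(end))
    times.sort()
    return times[len(times) // 2]

# Hypothetical usage:
# eager_ms = measure_latency_ms(lambda: model(inp))
# print(f"eager: {eager_ms:.2f} ms")
```

A pair of such numbers per model (eager and inductor) next to the speedup ratio would make regressions like 0.118 -> 0.14 visible directly instead of only as a shrinking speedup.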