zou3519 opened 1 year ago
On A100s, seeing 48ms to 50ms, ~4% regression
On AWS V100s, I'm seeing 53ms on 0.1.1, 50ms on 0.2.1, and 52ms on 1.13
~4% regression from 0.2.1
I redid the experiment with actual V100s on the FAIR cluster; the numbers are 75ms (0.1.1) -> 74ms (0.2.1) -> 72ms (1.13), which is not a regression.
On that note, I'm curious why the V100s on different systems have different performance -- maybe a difference in the CPUs? (Or CUDA version? My experiments were done with the PyTorch CUDA 10.2 binaries.)
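For what it's worth, a minimal sketch of how wall-clock numbers like these can be collected consistently across machines (the `workload` stand-in below is hypothetical; a real repro would time the actual model and call `torch.cuda.synchronize()` around each reading so kernel completion, not just launch, is measured):

```python
import statistics
import time

def benchmark(fn, warmup=5, iters=30):
    """Return the median wall-clock time of fn() in milliseconds.

    For GPU work, fn should synchronize (e.g. torch.cuda.synchronize())
    so the timer sees completed kernels rather than async launches.
    """
    for _ in range(warmup):  # warm up caches / JIT / cudnn autotuning
        fn()
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        times.append((time.perf_counter() - start) * 1e3)
    return statistics.median(times)

# Hypothetical CPU stand-in workload; the real repro would run the model here.
ms = benchmark(lambda: sum(i * i for i in range(10000)))
print(f"{ms:.2f}ms")
```

Using the median over many iterations (rather than a single run) helps rule out machine noise when comparing small deltas like 2-4ms across clusters.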
To repro:
On my machine with a ~~V100~~ P100 GPU, the runtime goes from 72ms to 83ms