Open PoonKinWang opened 4 years ago
Have you tried running a lot of cases and reporting an average value for benchmarking purposes? BTW, it would be great if you could provide more code and details so we can reproduce the results.
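Averaging over many timed runs, as suggested above, reduces run-to-run noise. A minimal sketch using only the standard library (the lambda workload is a stand-in for the actual model forward pass, which is not shown in this thread):

```python
import time

def benchmark(fn, warmup=10, iters=100):
    """Time fn over many iterations and return the mean seconds per call."""
    for _ in range(warmup):
        fn()  # warm-up runs are discarded
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

# stand-in workload; replace with the real forward pass
mean_s = benchmark(lambda: sum(i * i for i in range(10_000)))
print(f"{mean_s:.6f} s per call")
```

For a CUDA model, call `torch.cuda.synchronize()` before reading the clock on each side of the loop; otherwise the asynchronous kernel launches make wall-clock timings meaningless.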
Thanks for the reply. The test demo is as follows:
There are too many moving parts here. To begin, can you please compile the 1.3.1 source tree with the same cuDNN version as the pre-compiled binaries (7.6.4) and compare the result with the precompiled binaries? That would verify that your build environment is OK. Then we can see whether any regressions were introduced by the different cuDNN version or by PyTorch 1.4.
Other than the obvious, like your compiler and the actual code version being different, I'd say the third-party libraries used may differ. One big one is CUDA 10.2, which the prebuilt binaries do not use.
I compiled torch (v1.4.0, cudatoolkit=10.2, cudnn=7.6.5) from source with the command "python setup.py install". I then loaded the ShuffleNet V2 0.5 model with the compiled library (batch size = 8); the speed is 0.0083. But when I load the same model with torch (v1.3.1, cudatoolkit=10.0, cudnn=7.6.4) built from the official binaries via conda, the speed is 0.0075. Why is the binary build faster? My GPU is a 2080 Ti. Relevant config:
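The gap between 0.0083 and 0.0075 is roughly 10%, so it helps to collect many per-run samples and look at mean and standard deviation to see whether the two builds actually differ beyond noise. A sketch using the standard library, with hypothetical sample values standing in for real measurements:

```python
import statistics

def summarize(samples):
    """Return (mean, stdev) for a list of per-run timings in seconds."""
    return statistics.mean(samples), statistics.stdev(samples)

# hypothetical per-batch timings for the two builds (seconds)
source_build = [0.0083, 0.0085, 0.0082, 0.0084]
binary_build = [0.0075, 0.0076, 0.0074, 0.0075]

for name, samples in [("source", source_build), ("binary", binary_build)]:
    mean, sd = summarize(samples)
    print(f"{name}: {mean:.4f} +/- {sd:.4f} s")
```

If the two means differ by several standard deviations, the slowdown is likely real rather than measurement jitter.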
cmakelog.log
cc @ezyang @VitalyFedyunin @ngimel @mruberry