rezanazari opened this issue 4 years ago
I figured out that the performance degradation is not due to using static links. The issue I observe now is that taking the backward of the Jacobian is very fast on a P100 (~20 seconds in my test program), but on a V100 it takes much longer (~300 seconds). With the prebuilt PyTorch binaries, these times are ~40 seconds, which is reasonable.
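For context, the workload I am timing is essentially a double backward: first-order gradients are computed with create_graph=true and then backpropagated through again. Below is a simplified sketch of that kind of test, not my exact program; the tanh/matmul model, tensor sizes, and iteration count are just placeholders.

```cpp
#include <torch/torch.h>
#include <chrono>
#include <iostream>

int main() {
  torch::Device device(torch::kCUDA);
  auto x = torch::randn({1024, 1024}, torch::requires_grad().device(device));
  auto w = torch::randn({1024, 1024}, torch::requires_grad().device(device));

  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < 100; ++i) {
    auto y = torch::tanh(torch::mm(x, w)).sum();

    // First-order gradient, keeping the graph so it can be differentiated again.
    auto grads = torch::autograd::grad(
        {y}, {x}, /*grad_outputs=*/{},
        /*retain_graph=*/true, /*create_graph=*/true);

    // Second backward pass through the gradient (the "backward of the Jacobian").
    auto loss = grads[0].pow(2).sum();
    loss.backward();

    // Clear accumulated gradients between iterations.
    if (x.grad().defined()) x.grad().zero_();
    if (w.grad().defined()) w.grad().zero_();
  }
  torch::cuda::synchronize();
  auto end = std::chrono::steady_clock::now();

  std::cout << "elapsed: "
            << std::chrono::duration<double>(end - start).count() << " s\n";
  return 0;
}
```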
I would appreciate it if someone could provide instructions on how the libtorch binaries are created.
I am wondering if there is a docker image that can be used to build the standard libtorch binaries. I am using nvidia/cuda:10.2-cudnn8-devel-centos7 with GCC 7.4, but the performance of my build is much slower than that of the prebuilt libtorch.
Also, when I enable static linking with

    export TH_BINARY_BUILD=1
    export USE_STATIC_CUDNN=1
    export USE_STATIC_NCCL=1
    export ATEN_STATIC_CUDA=1
    export USE_CUDA_STATIC_LINK=1

I see further performance degradation. It would be great if you could provide a docker image that can be used to reproduce the standard binaries.
Thanks