Closed by oscarriddle 5 years ago
What problems does changing cudnnHandle to UcudnnHandle cause?
Hi @tbennun, @oyamay,
After further investigation, I succeeded in integrating ucudnn into TensorFlow.
TensorFlow uses a void* to point to the cudnnHandle_t object and creates it at the beginning. Every time it calls a cuDNN API, TensorFlow first static_casts this void* to cudnnHandle_t (via the function ToHandle(); this works because cudnnHandle_t is itself a pointer type), then uses the cast handle to call the API.
ucudnn, on the other hand, provides constructors from either a void* or a UcudnnHandle_t object. If I just change cudnnHandle_t to UcudnnHandle_t, every static_cast of the void* to UcudnnHandle_t calls the void* constructor. This is not what we want, because ucudnn performs no copy in that constructor, so the private member handle is never constructed. Oops.
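To make the pitfall concrete, here is a minimal sketch that needs neither cuDNN nor ucudnn. `FakeContext`, `FakeCudnnHandle`, `FakeWrapperHandle`, and `CastLikeToHandle` are hypothetical stand-ins invented for illustration; they only mimic the shape described above (a pointer-typed handle versus a class with a `void*` constructor) and are not the real ucudnn definitions.

```cpp
#include <cassert>

// Hypothetical stand-in for cudnnHandle_t: an opaque pointer type.
struct FakeContext { int id; };
using FakeCudnnHandle = FakeContext*;

// Hypothetical stand-in for UcudnnHandle_t: a class that wraps the raw handle.
class FakeWrapperHandle {
 public:
  FakeWrapperHandle() : handle_(nullptr), from_void_(false) {}
  // Converting constructor from void*. Like the constructor described
  // above, it copies nothing, so the private handle stays unset.
  FakeWrapperHandle(void*) : handle_(nullptr), from_void_(true) {}

  void set_raw(FakeCudnnHandle h) { handle_ = h; }
  FakeCudnnHandle raw() const { return handle_; }
  bool built_from_void() const { return from_void_; }

 private:
  FakeCudnnHandle handle_;
  bool from_void_;
};

// Mimics a ToHandle() whose target type was switched to the wrapper class.
// With a class type as the target, static_cast does not reinterpret the
// pointer; it constructs a new object through FakeWrapperHandle(void*).
inline FakeWrapperHandle CastLikeToHandle(void* opaque) {
  return static_cast<FakeWrapperHandle>(opaque);
}
```

Calling `CastLikeToHandle(&some_wrapper)` on a wrapper whose raw handle was set returns a fresh object whose `raw()` is still null, which matches the failure described above. Casting the same `void*` to the pointer type `FakeCudnnHandle` would simply return the original address.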
I expected the static_cast to call the copy constructor, so I replaced the ToHandle implementation with another way of getting the UcudnnHandle_t object, and I pass this object to every API call. Maybe I can fix the TensorFlow patch and open a PR later.
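The exact replacement is not shown here, so the following is only one possible shape for it, under the assumption that the `void*` stores the address of a long-lived wrapper object. `MockContext`, `MockWrapperHandle`, and `ToWrapperHandle` are hypothetical names, not TensorFlow or ucudnn APIs.

```cpp
#include <cassert>

// Hypothetical stand-ins; not the real cuDNN or ucudnn types.
struct MockContext { int id; };

class MockWrapperHandle {
 public:
  MockWrapperHandle() : handle_(nullptr) {}
  MockWrapperHandle(void*) : handle_(nullptr) {}  // copies nothing
  void set_raw(MockContext* h) { handle_ = h; }
  MockContext* raw() const { return handle_; }

 private:
  MockContext* handle_;
};

// Hypothetical ToHandle() replacement. Casting to a pointer-to-wrapper
// (not to the wrapper class itself) recovers the stored object, so no
// constructor runs and the existing private handle is preserved.
inline MockWrapperHandle& ToWrapperHandle(void* opaque) {
  return *static_cast<MockWrapperHandle*>(opaque);
}
```

Because the function returns a reference, every call site sees the same wrapper object that was created once at startup, and the wrapper can be passed on to the API calls without being rebuilt.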
The conversion operator is only exercised when cudnnSetStream() and cudnnGetStream() are called, because ucudnn doesn't override them; that should be fine.
Thanks,
System Info: CentOS7, TF1.6, CUDA9.0, cuDNN7.5.0, TitanXP 12GB
Description: I tried to integrate ucudnn into TensorFlow 1.6 and noticed that the way TensorFlow declares and uses cudnnHandle_t causes the problem. In TensorFlow, the cuDNN handle, named dnnhandle in cuda_dnn.cc, is declared as void*; it is reinterpret_cast and created by wrap::cudnnCreate() only once.
Every time the DoConvolveImpl() function is called, it first static_casts the dnnhandle to cudnnHandle_t and uses it to call cuDNN APIs. But simply changing cudnnHandle_t to UcudnnHandle_t causes problems.
As a temporary workaround, instead of reusing dnnhandle on every call to DoConvolveImpl(), I declare a UcudnnHandle_t, create it at the start of each DoConvolveImpl() call, and destroy it before returning (the same for DoBatchNormalizationForwardImpl()). This causes a huge performance drop, but TensorFlow can complete several iterations of session run normally.
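The create-per-call workaround can be sketched roughly as a scope guard. Everything below is a mock: `StubHandle`, `StubCreate`, `StubDestroy`, `ScopedStubHandle`, and `ConvolveOnceWithFreshHandle` are hypothetical names standing in for the handle type, its create and destroy calls, and DoConvolveImpl(); the counters exist only to show that each call pays for one create and one destroy.

```cpp
#include <cassert>

// Hypothetical stand-in for the handle and its create/destroy calls.
struct StubHandle { bool live = false; };

static int stub_create_count = 0;
static int stub_destroy_count = 0;

inline void StubCreate(StubHandle* h) { h->live = true; ++stub_create_count; }
inline void StubDestroy(StubHandle* h) { h->live = false; ++stub_destroy_count; }

// Scope guard: creates the handle on entry and destroys it on exit.
class ScopedStubHandle {
 public:
  ScopedStubHandle() { StubCreate(&handle_); }
  ~ScopedStubHandle() { StubDestroy(&handle_); }
  ScopedStubHandle(const ScopedStubHandle&) = delete;
  ScopedStubHandle& operator=(const ScopedStubHandle&) = delete;
  StubHandle& get() { return handle_; }

 private:
  StubHandle handle_;
};

// Stand-in for DoConvolveImpl() under the workaround: a fresh handle is
// created for every call and released before the function returns.
inline bool ConvolveOnceWithFreshHandle() {
  ScopedStubHandle handle;
  return handle.get().live;  // the real code would call the API here
}
```

With a real handle, the create and destroy steps are where the per-call cost comes from, which is consistent with the performance drop reported above. Keeping one handle alive across calls avoids that cost.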
This indicates that the way TensorFlow manages the cuDNN handle is the key point.