Charlie-zhang1406 closed this issue 3 years ago
I found that the problem occurs only when Apex is active: when I disable Apex and run the computation in fp32, the problem goes away. I still do not know the reason. Maybe Apex has some version requirement for cuDNN. I will probably keep following this issue until I find out the reason.
Thanks for trying out the code! I am glad that you at least got it working without automatic mixed-precision, which should be perfectly fine if you have enough GPU memory (it will not significantly affect the results or compatibility).
Unfortunately, I could not reproduce this issue. I am using CUDA 10.1 and cuDNN 8.0.3. As a sanity check, make sure you have NVIDIA Apex installed properly (with CUDA extensions), as mentioned in its README.
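For anyone comparing their setup against the versions above, here is a minimal sketch of such a sanity check. It is not part of the repository; the helper names `module_available` and `describe_environment` are made up for illustration. It also assumes that an Apex build with CUDA extensions exposes a compiled module named `amp_C`, which may differ between Apex versions, so treat that flag as a hint rather than proof.

```python
import importlib
import importlib.util


def module_available(name):
    """Return True if a top-level module can be located, without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def describe_environment():
    """Collect the versions that matter when debugging Apex / cuDNN problems."""
    info = {
        "torch": None,   # PyTorch version string, if PyTorch is importable
        "cuda": None,    # CUDA version PyTorch was built against
        "cudnn": None,   # cuDNN version as an integer, e.g. 8003 for 8.0.3
        "apex": module_available("apex"),
        # Assumption: Apex built with CUDA extensions ships a module named `amp_C`.
        "apex_cuda_ext": module_available("amp_C"),
    }
    if module_available("torch"):
        try:
            torch = importlib.import_module("torch")
        except (ImportError, OSError):
            return info  # PyTorch is present but broken; report what we have
        info["torch"] = torch.__version__
        info["cuda"] = torch.version.cuda
        info["cudnn"] = torch.backends.cudnn.version()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `apex` is reported as available but `apex_cuda_ext` is not, Apex was probably installed as a Python-only build, and reinstalling it with CUDA extensions as described in its README is worth trying.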
Thank you for your reply. I have reinstalled Apex and it worked.
Awesome, glad it worked for you!
Sorry to bother you, but I ran into this problem and cannot find a way to fix it. It happens when I train the base virtex model. I have updated cuDNN from 7.6.5 to 8.0.3, and both versions give this error.