talmo / leap

LEAP is now deprecated -- check out its successor SLEAP!
https://sleap.ai
Apache License 2.0

need help with installation (cuDNN issue) #16

Closed kbakhurin closed 5 years ago

kbakhurin commented 5 years ago

Hi Talmo,

I am trying to set up LEAP on a new machine. I'm SUPER close, but I am having trouble after the fast training step. The issue seems to be related to cuDNN. I've attached the output. Do you have any idea what it could be or what I could try? I downloaded cuDNN v7.0.5 and copied the files into the correct folders, but it doesn't seem to be enough.

name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.645
pciBusID: 0000:65:00.0
totalMemory: 11.00GiB freeMemory: 9.08GiB
2019-02-21 12:37:32.191564: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1312] Adding visible gpu devices: 0
2019-02-21 12:37:35.326578: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8794 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:65:00.0, compute capability: 6.1)
2019-02-21 12:37:36.966472: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2019-02-21 12:37:36.967063: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_dnn.cc:389] error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
2019-02-21 12:37:36.968154: F C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\kernels\conv_ops.cc:717] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms)

Thank you!!

Konstantin

p.s. you can close that previous issue that is still open.

talmo commented 5 years ago

Hey Konstantin,

This error can happen under several circumstances but the one I've seen most often is due to having an older GPU driver version. Try updating it from the NVIDIA website and giving it another go.
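To confirm which driver is installed before and after updating, something like the following may help. It is only a sketch: it assumes the `nvidia-smi` tool that ships with the NVIDIA driver can be found on PATH (on some Windows installs it lives in a separate NVIDIA folder and is not), and the helper name `nvidia_driver_version` is made up for this example.

```python
import shutil
import subprocess


def nvidia_driver_version():
    """Return the NVIDIA driver version reported by nvidia-smi, or None.

    Hypothetical helper: returns None when nvidia-smi is not on PATH
    or when the query fails for any reason.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    try:
        result = subprocess.run(
            [exe, "--query-gpu=driver_version", "--format=csv,noheader"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            timeout=10,
            check=True,
        )
    except (subprocess.SubprocessError, OSError):
        return None
    # One line is printed per GPU; the driver version is the same for all.
    lines = [ln.strip() for ln in result.stdout.splitlines() if ln.strip()]
    return lines[0] if lines else None


if __name__ == "__main__":
    print(nvidia_driver_version() or "nvidia-smi not found or query failed")
```

The printed version can then be compared against the minimum driver listed in NVIDIA's release notes for the CUDA toolkit in use.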

Talmo

kbakhurin commented 5 years ago

Hi Talmo,

Thanks. I updated the GPU driver and now I have a new error.

name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.645
pciBusID: 0000:65:00.0
totalMemory: 11.00GiB freeMemory: 9.11GiB
2019-02-24 16:19:04.258147: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1312] Adding visible gpu devices: 0
2019-02-24 16:21:06.767107: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8818 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:65:00.0, compute capability: 6.1)
2019-02-24 16:21:10.888156: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_dnn.cc:378] Loaded runtime CuDNN library: 7102 (compatibility version 7100) but source was compiled with 7003 (compatibility version 7000). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
2019-02-24 16:21:10.890164: F C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\kernels\conv_ops.cc:717] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms)

From reading around on the internet, it seems like TensorFlow is loading a 7.1 version of cuDNN at runtime. The library that I recently downloaded (and whose files I copied into the CUDA folders) is version 7.0.5.
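The numbers in the log (7102, 7003) are packed version codes. A minimal sketch for reading them, assuming the cuDNN 7 convention from `cudnn.h` of major * 1000 + minor * 100 + patch level; the function name `decode_cudnn_version` is hypothetical:

```python
def decode_cudnn_version(code):
    """Split a packed cuDNN 7-style version integer into (major, minor, patch).

    Assumes the encoding major * 1000 + minor * 100 + patch.
    """
    major, rest = divmod(code, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


if __name__ == "__main__":
    for label, code in [("loaded at runtime", 7102), ("compiled against", 7003)]:
        print(label, "%d.%d.%d" % decode_cudnn_version(code))
    # loaded at runtime 7.1.2
    # compiled against 7.0.3
```

Read this way, the log says a cuDNN 7.1.2 library is being loaded while the TensorFlow binary was built against 7.0.3, so the 7.0.5 files that were copied in do not appear to be the ones actually picked up.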

Is it OK that I upgraded to the newest driver for my GPU, but am trying to use CUDA v9?

Thanks, Konstantin

talmo commented 5 years ago

Hey Konstantin,

Sorry for the delay -- did you figure this out? The exact versions I've confirmed to work are:

CUDA 9.0

cuDNN v7.3.1 (Sept 28, 2018), for CUDA 9.0
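After installing these versions, it can be worth checking whether more than one copy of the cuDNN DLL is reachable, since a stale copy earlier on PATH may be loaded instead of the new one. The sketch below rests on assumptions: it uses the Windows file name `cudnn64_7.dll` for cuDNN 7, the helper `find_on_path` is made up for this example, and PATH is only one part of the Windows DLL search order, so treat the result as a hint rather than proof.

```python
import os


def find_on_path(filename, path=None):
    """Return every existing <dir>/<filename> along PATH, in search order.

    Hypothetical helper; `path` defaults to the PATH environment variable.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    seen = set()
    for directory in path.split(os.pathsep):
        directory = directory.strip('"')
        if not directory:
            continue
        candidate = os.path.join(directory, filename)
        # Skip directories that are listed on PATH more than once.
        key = os.path.normcase(os.path.abspath(candidate))
        if key in seen:
            continue
        seen.add(key)
        if os.path.isfile(candidate):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    # Assumed DLL name for cuDNN 7 on Windows.
    for hit in find_on_path("cudnn64_7.dll"):
        print(hit)
```

If more than one path is printed, the first one is the most likely candidate to be loaded, and the others are worth removing or replacing so that only the intended version remains.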

Let me know if you're still having issues.

Talmo