talmo / leap

LEAP is now deprecated -- check out its successor SLEAP!
https://sleap.ai
Apache License 2.0

cannot find compatible tensorflow device/ cudnn library not match? #17

Closed haofanglee closed 5 years ago

haofanglee commented 5 years ago

Hi! I came across a strange issue today when running the fast train network. Training fails to initialize after the parallel pool starts, and it produces the output below:

2019-03-27 23:47:18.653402: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-03-27 23:47:18.836497: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1212] Found device 0 with properties: name: GeForce RTX 2080 major: 7 minor: 5 memoryClockRate(GHz): 1.71 pciBusID: 0000:01:00.0 totalMemory: 8.00GiB freeMemory: 6.53GiB
2019-03-27 23:47:18.836729: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1312] Adding visible gpu devices: 0
2019-03-27 23:47:19.333976: I C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6295 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080, pci bus id: 0000:01:00.0, compute capability: 7.5)
2019-03-27 23:47:22.064939: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_dnn.cc:378] Loaded runtime CuDNN library: 7102 (compatibility version 7100) but source was compiled with 7003 (compatibility version 7000). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
2019-03-27 23:47:22.066605: F C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\kernels\conv_ops.cc:717] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo(), &algorithms)

I used this program on this computer just fine a couple of days ago, and I don't remember changing anything before it suddenly stopped working. Do you know what may have caused the issue?

I really appreciate your help and look forward to your reply!

talmo commented 5 years ago

Hey there,

That's very odd that it started happening without any changes. It looks like it's a CUDA/CuDNN/Tensorflow/Keras version mismatch.
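For reference, the two integers in the error line above can be decoded by hand. This is a minimal sketch, assuming the cuDNN 7 encoding of major * 1000 + minor * 100 + patch, and assuming that the "compatibility version" printed in the log is just the version rounded down to the nearest hundred (which is what the numbers in the log suggest). The function names are made up for illustration and are not part of TensorFlow or LEAP.

```python
def decode_cudnn_version(v):
    """Split a cuDNN 7-style version integer into (major, minor, patch)."""
    major, rest = divmod(v, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def compat_version(v):
    """Round down to the nearest hundred (assumed meaning of 'compatibility version')."""
    return v // 100 * 100


# Values copied from the error line in the log.
loaded, compiled = 7102, 7003

print(decode_cudnn_version(loaded))    # (7, 1, 2)
print(decode_cudnn_version(compiled))  # (7, 0, 3)
print(compat_version(loaded) == compat_version(compiled))  # False
```

Read this way, the log says that cuDNN 7.1.2 was loaded at runtime while the TensorFlow binary was built against cuDNN 7.0.3.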

I've just added some steps for troubleshooting on this page: https://github.com/talmo/leap/wiki/CUDA-Troubleshooting

Give those a go. If they don't work, let me know and we'll come up with another strategy.

Talmo

kbakhurin commented 5 years ago

Hey Talmo,

Haofang and I are in the same lab and are both running into this CuDNN incompatibility, even though we downloaded the version 7.0.5 files from the archive and manually copied them to the right folders. The error messages say that the system is finding a 7.1 version on the computer.

The troubleshooting guide you wrote up is helpful. Interestingly, when I type the !conda list cuda and !conda list cudnn commands into MATLAB, I get no information back:

image

When I run the same commands in Python, I get the same result:

image

Should I start by trying to upgrade to tensorflow-1.12, as you suggested in the troubleshooting guide?
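One possible reason a 7.1 library is still picked up after copying the 7.0.5 files is that more than one copy of the cuDNN DLL is on the machine. The sketch below lists every match found in the directories on PATH, in search order. It assumes the Windows cuDNN 7 file name follows the pattern `cudnn64_*.dll`, and `find_cudnn_dlls` is a hypothetical helper, not something from LEAP or TensorFlow. PATH is only one of the places Windows looks for DLLs, so an empty result does not rule out another copy elsewhere.

```python
import glob
import os


def find_cudnn_dlls(path_value=None, pattern="cudnn64_*.dll"):
    """Return cuDNN DLLs found in the PATH directories, in search order."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for directory in path_value.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        hits.extend(sorted(glob.glob(os.path.join(directory, pattern))))
    return hits


if __name__ == "__main__":
    # Print each candidate; more than one line suggests duplicate installs.
    for dll in find_cudnn_dlls():
        print(dll)
```

If several paths are printed, the copy in the earliest PATH directory is the likeliest candidate for the one being loaded, so that is the one to compare against the 7.0.5 files.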

kbakhurin commented 5 years ago

Hi Talmo,

Following your instructions in the troubleshooting guide helped. I was able to get it going again on one computer.

We'll keep you posted on the other computer!

BTW, I just realized that your paper came out in Nature Methods. Congratulations! I hope many people realize how useful and easy to use it is (once installed, haha)!

talmo commented 5 years ago

Thanks Konstantin! Sorry I missed your earlier reply (I'm out at a conference right now).

I'll close the issue but feel free to reopen if you run into new problems on the other computer.

Cheers!

Talmo