talmo / leap

LEAP is now deprecated -- check out its successor SLEAP!
https://sleap.ai
Apache License 2.0

tensorflow import issue #7

Closed heejaeyunajang closed 6 years ago

heejaeyunajang commented 6 years ago

I got the following error when trying to run fast training. I believe I installed the cuDNN and TensorFlow packages correctly, but I'm still not able to troubleshoot it... Thanks Talmo!

python "C:\Users\hjang\Desktop\leap\leap\training.py" "C:\Users\hjang\AppData\Local\Temp\tp313aabc9_9c58_4a9b_a52f_d2fa18aa092a.h5" --base-output-path="C:\Users\hjang\Desktop\models" --run-name="180801_143608-n=1120" --net-name="leap_cnn" --filters=32 --rotate-angle=5 --val-size=0.10000 --epochs=15 --batch-size=50 --batches-per-epoch=50 --val-batches-per-epoch=10 --reduce-lr-factor=0.1000000000 --reduce-lr-patience=2 --reduce-lr-cooldown=0 --reduce-lr-min-delta=0.0000100000 --reduce-lr-min-lr=0.0000000001 --upsampling-layers --amsgrad

C:\Users\hjang\Anaconda3\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
Traceback (most recent call last):
  File "C:\Users\hjang\Anaconda3\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 18, in swig_import_helper
    return importlib.import_module(mname)
  File "C:\Users\hjang\Anaconda3\lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 658, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 922, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: DLL load failed: The specified module could not be found.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\hjang\Anaconda3\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "C:\Users\hjang\Anaconda3\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 21, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "C:\Users\hjang\Anaconda3\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 20, in swig_import_helper
    return importlib.import_module('_pywrap_tensorflow_internal')
  File "C:\Users\hjang\Anaconda3\lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ModuleNotFoundError: No module named '_pywrap_tensorflow_internal'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\hjang\Desktop\leap\leap\training.py", line 10, in <module>
    import keras
  File "C:\Users\hjang\Anaconda3\lib\site-packages\keras\__init__.py", line 3, in <module>
    from . import utils
  File "C:\Users\hjang\Anaconda3\lib\site-packages\keras\utils\__init__.py", line 6, in <module>
    from . import conv_utils
  File "C:\Users\hjang\Anaconda3\lib\site-packages\keras\utils\conv_utils.py", line 9, in <module>
    from .. import backend as K
  File "C:\Users\hjang\Anaconda3\lib\site-packages\keras\backend\__init__.py", line 84, in <module>
    from .tensorflow_backend import *
  File "C:\Users\hjang\Anaconda3\lib\site-packages\keras\backend\tensorflow_backend.py", line 5, in <module>
    import tensorflow as tf
  File "C:\Users\hjang\Anaconda3\lib\site-packages\tensorflow\__init__.py", line 24, in <module>
    from tensorflow.python import *
  File "C:\Users\hjang\Anaconda3\lib\site-packages\tensorflow\python\__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "C:\Users\hjang\Anaconda3\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 74, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "C:\Users\hjang\Anaconda3\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 18, in swig_import_helper
    return importlib.import_module(mname)
  File "C:\Users\hjang\Anaconda3\lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 658, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 922, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: DLL load failed: The specified module could not be found.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\hjang\Anaconda3\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "C:\Users\hjang\Anaconda3\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 21, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "C:\Users\hjang\Anaconda3\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 20, in swig_import_helper
    return importlib.import_module('_pywrap_tensorflow_internal')
  File "C:\Users\hjang\Anaconda3\lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ModuleNotFoundError: No module named '_pywrap_tensorflow_internal'

Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/install_sources#common_installation_problems

for some common reasons and solutions. Include the entire stack trace above this error message when asking for help.
IdleTimeout has been reached.
Parallel pool using the 'local' profile is shutting down.

talmo commented 6 years ago

Hi Heejae!

I'm guessing it still has to do with the CUDA installation. Sorry that it's such a pain... Most likely, either:

(a) You installed a version of cuDNN that doesn't match the CUDA Toolkit version or the TensorFlow version. First, check which TensorFlow version you have installed by running this on the command line: pip show tensorflow-gpu

Here's what I get:

λ pip show tensorflow-gpu
Name: tensorflow-gpu
Version: 1.6.0
Summary: TensorFlow helps the tensors flow
Home-page: https://www.tensorflow.org/
Author: Google Inc.
Author-email: opensource@google.com
License: Apache 2.0
Location: /tigress/tdp/anaconda3/lib/python3.6/site-packages
Requires: gast, termcolor, six, protobuf, tensorboard, wheel, absl-py, numpy, astor, grpcio
Required-by:

Then, make sure you follow the install instructions that match the TensorFlow version you have installed (or take your chances and update to the latest by running pip install --upgrade tensorflow-gpu). On the latest docs it says that it supports:

You may need to dig those versions out of the NVIDIA website under the archives.

You can also Google or search the TensorFlow issues for your CUDA/cuDNN version to see if other people are having problems with your particular combination, e.g., search for "cuda 9.0" or "cudnn 7.0".

In the end, the easiest solution might just be to downgrade to those versions.
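The version check in (a) can also be done from Python. This is only a rough sketch: it reads the installed package metadata instead of importing TensorFlow, so it should still work when the import itself fails with the DLL error. It assumes Python 3.8 or newer for importlib.metadata (the Anaconda install in this thread is Python 3.6, where pip show is the way to go), and the helper names installed_version and report_versions are made up for illustration.

```python
from importlib import metadata  # standard library in Python 3.8+


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report_versions(packages=("tensorflow-gpu", "tensorflow", "keras")):
    """Map each package name to its installed version (None if not installed)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    # Only metadata is read here; TensorFlow itself is never imported.
    for name, version in report_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Compare the reported TensorFlow version against the CUDA Toolkit and cuDNN versions listed in the install instructions for that release.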

(b) You installed the correct version but TensorFlow can't find it. This is also a common issue. You need to make sure that the CUDA toolkit binaries are in the Windows environment PATH variable. You can check what's in your PATH variable by following these steps:

  1. Open the Start menu

  2. Type "View advanced system settings" and click the result. It should open the System Properties window.

  3. Click Environment Variables...

  4. Under the System variables table (NOT User variables, which is above it), find the Path variable.

  5. Click Edit... and make sure that the following entry is in the list: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\bin
     Also make sure that's the correct folder on your system by checking that you can open it in Windows Explorer.

  6. Restart MATLAB and try running test_leap or this command: !python -c "import tensorflow"
     If you don't get any errors, you should be good to go!
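If clicking through the dialogs is tedious, the same PATH check can be sketched in Python. This is a minimal example, not part of LEAP: it assumes the CUDA v9.0 folder from step 5, and path_has_entry is a hypothetical helper name. It compares entries using Windows path rules (case-insensitive, trailing backslash ignored) via the standard ntpath module.

```python
import ntpath  # Windows path semantics, importable on any OS
import os

# Assumed install location, taken from step 5 above; adjust to your system.
CUDA_BIN = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\bin"


def _normalize(entry):
    """Normalize one PATH entry: drop quotes, trailing slashes, and case."""
    return ntpath.normcase(ntpath.normpath(entry.strip().strip('"')))


def path_has_entry(path_value, wanted):
    """Return True if the ';'-separated `path_value` contains the folder `wanted`."""
    entries = [e for e in path_value.split(";") if e.strip()]
    return _normalize(wanted) in {_normalize(e) for e in entries}


if __name__ == "__main__":
    # This inspects the PATH seen by the current process, which is why
    # MATLAB (or the terminal) must be restarted after editing the variable.
    found = path_has_entry(os.environ.get("PATH", ""), CUDA_BIN)
    print("CUDA bin folder is on PATH" if found else "CUDA bin folder is NOT on PATH")
```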

If the CUDA Toolkit bin folder is in your PATH and TensorFlow still can't find it, make sure that the cuDNN files are in the CUDA Toolkit folders. Specifically, these files should be present:

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\bin\cudnn64_7.dll
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\include\cudnn.h
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\lib\x64\cudnn.lib
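A quick way to confirm those three files exist is a small script like the one below. Treat it as a sketch under assumptions: the CUDA root folder and file names are copied from the list above (cuDNN 7 with CUDA v9.0), and missing_cudnn_files is an illustrative helper name rather than anything shipped with LEAP or TensorFlow.

```python
from pathlib import Path

# Assumed CUDA Toolkit root, matching the paths listed above.
CUDA_ROOT = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0"

# cuDNN files expected under the CUDA root, as relative path parts.
CUDNN_RELATIVE_FILES = (
    ("bin", "cudnn64_7.dll"),
    ("include", "cudnn.h"),
    ("lib", "x64", "cudnn.lib"),
)


def missing_cudnn_files(cuda_root):
    """Return the relative paths of expected cuDNN files not found under `cuda_root`."""
    root = Path(cuda_root)
    return [
        str(Path(*parts))
        for parts in CUDNN_RELATIVE_FILES
        if not root.joinpath(*parts).is_file()
    ]


if __name__ == "__main__":
    missing = missing_cudnn_files(CUDA_ROOT)
    if missing:
        print("Missing cuDNN files:")
        for relative_path in missing:
            print("  " + relative_path)
    else:
        print("All expected cuDNN files are present.")
```

If anything is reported missing, copy the corresponding files from the cuDNN download into the matching CUDA Toolkit subfolders.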

Give these steps a go and let me know if you're still having issues!

Talmo

heejaeyunajang commented 6 years ago

Thank you Talmo for such a detailed response! :) I think it was a version incompatibility issue!