ronghanghu / snmn

Code release for Hu et al., "Explainable Neural Computation via Stack Neural Module Networks", ECCV 2018
http://ronghanghu.com/snmn/
BSD 2-Clause "Simplified" License
72 stars, 6 forks

FileNotFoundError: 'snmn/exp_clevr_snmn/data/resnet101_c4/train/CLEVR_train_000000.npy' #1

Closed: monajalal closed this issue 5 years ago.

monajalal commented 5 years ago

I followed all the steps and got an error here. Do you know how I could fix it?

[jalal@goku snmn]$ python exp_clevr_snmn/train_net_vqa.py --cfg exp_clevr_snmn/cfgs/vqa_gt_layout.yaml
/scratch/sjn/anaconda/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
  return f(*args, **kwds)
2018-09-28 15:59:46.158093: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-09-28 15:59:46.348097: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6705
pciBusID: 0000:05:00.0
totalMemory: 10.92GiB freeMemory: 5.96GiB
2018-09-28 15:59:46.348160: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:05:00.0, compute capability: 6.1)
Loading imdb from ./exp_clevr_snmn/data/imdb/imdb_train.npy
Done
imdb does not contain bounding boxes
Traceback (most recent call last):
  File "exp_clevr_snmn/train_net_vqa.py", line 33, in <module>
    vocab_layout_file=cfg.VOCAB_LAYOUT_FILE, T_decoder=cfg.MODEL.T_CTRL)
  File "/snmn/util/clevr_train/data_reader.py", line 136, in __init__
    self.batch_loader = BatchLoaderClevr(self.imdb, self.data_params)
  File "/snmn/util/clevr_train/data_reader.py", line 45, in __init__
    feats = np.load(self.imdb[0]['feature_path'])
  File "/scratch/sjn/anaconda/lib/python3.6/site-packages/numpy/lib/npyio.py", line 370, in load
    fid = open(file, "rb")
FileNotFoundError: [Errno 2] No such file or directory: '/snmn/exp_clevr_snmn/data/resnet101_c4/train/CLEVR_train_000000.npy'
ronghanghu commented 5 years ago

Hi, did you run the feature extraction step?

cd ./exp_clevr_snmn/data/
python extract_resnet101_c4.py

If yes, is there any error from running this file?

Also, could you check whether the CLEVR dataset is correctly linked (so that there are files like exp_clevr_snmn/clevr_dataset/images/train/CLEVR_train_000000.png)?
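Both checks can be scripted. A minimal sketch, assuming it is run from the snmn repo root: the two file paths are the ones quoted in this thread, `check_paths` and `missing_feature_paths` are hypothetical helpers (not part of snmn), and the only imdb detail relied on is the `feature_path` key visible in the traceback above.

```python
import os

# File paths quoted in this thread, relative to the snmn repo root.
EXPECTED_FILES = [
    "exp_clevr_snmn/clevr_dataset/images/train/CLEVR_train_000000.png",
    "exp_clevr_snmn/data/resnet101_c4/train/CLEVR_train_000000.npy",
]


def check_paths(paths, root="."):
    """Map each path to True/False depending on whether it exists under root."""
    return {p: os.path.exists(os.path.join(root, p)) for p in paths}


def missing_feature_paths(imdb):
    """Return the 'feature_path' values of imdb entries (dicts) that do not exist."""
    return [e["feature_path"] for e in imdb if not os.path.exists(e["feature_path"])]


if __name__ == "__main__":
    for path, ok in check_paths(EXPECTED_FILES).items():
        print("ok     " if ok else "MISSING", path)
```

To scan every imdb entry, something like `missing_feature_paths(np.load("./exp_clevr_snmn/data/imdb/imdb_train.npy", allow_pickle=True))` should work, assuming the imdb is saved as an object array of dicts; `allow_pickle=True` is required on NumPy 1.16.3 and later.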

monajalal commented 5 years ago

Yes, I did that step and got that error.

[jalal@goku snmn]$ ls exp_clevr_snmn/clevr_dataset/images/train/CLEVR_train_000000.png
exp_clevr_snmn/clevr_dataset/images/train/CLEVR_train_000000.png

ronghanghu commented 5 years ago

Hmm... Somehow the absolute paths of the feature files are incorrect. Judging from the error above (/snmn/exp_clevr_snmn/data/resnet101_c4/train/CLEVR_train_000000.npy), os.path.abspath in https://github.com/ronghanghu/snmn/blob/master/exp_clevr_snmn/data/build_clevr_imdb.py#L18 seems to think the repo is at the root of the file system (/snmn), which shouldn't happen in most cases.
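Some background on why this can happen: `os.path.abspath` does not look at where the script lives. For a relative path it joins the path onto the process's current working directory and normalizes the result, so the absolute paths stored in the imdb depend on the directory build_clevr_imdb.py was launched from. A minimal standard-library illustration; the relative template is borrowed from the replacement line suggested below (the repo's own `feature_dir` value may differ), and `resolve_from` is a hypothetical helper for the demo:

```python
import os

# Relative template taken from the suggested replacement line (relative to the repo root).
feature_dir = "./exp_clevr_snmn/data/resnet101_c4/%s/"


def resolve_from(cwd, relative_path):
    """What os.path.abspath(relative_path) returns when the working directory is cwd."""
    # For a relative path, abspath is normpath(join(os.getcwd(), path)).
    return os.path.normpath(os.path.join(cwd, relative_path))


print(os.path.abspath(feature_dir % "train"))  # depends on where this is run from
print(resolve_from("/snmn", feature_dir % "train"))
# -> /snmn/exp_clevr_snmn/data/resnet101_c4/train  (on POSIX)
```

Note that a relative path stored in the imdb defers the same resolution to training time, so train_net_vqa.py still has to be launched from the repo root, as in the commands in this thread.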

If the file exp_clevr_snmn/data/resnet101_c4/train/CLEVR_train_000000.npy exists, then maybe you can try changing the line https://github.com/ronghanghu/snmn/blob/master/exp_clevr_snmn/data/build_clevr_imdb.py#L18 from

    abs_feature_dir = os.path.abspath(feature_dir % image_set)

to

    abs_feature_dir = './exp_clevr_snmn/data/resnet101_c4/%s/' % image_set

Then re-run build_clevr_imdb.py and see whether it solves the problem.
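A variant that avoids depending on the working directory altogether is to build the path from the location of build_clevr_imdb.py itself. This is only a sketch of an alternative, not the author's fix: `feature_dir_for` is hypothetical, and it assumes the script sits in exp_clevr_snmn/data/ next to the resnet101_c4/ folder, as the paths in this thread imply.

```python
import os


def feature_dir_for(image_set, script_path):
    """Feature directory for image_set, resolved next to the script instead of the cwd.

    Assumes script_path is build_clevr_imdb.py inside exp_clevr_snmn/data/.
    """
    script_dir = os.path.dirname(os.path.abspath(script_path))
    return os.path.join(script_dir, "resnet101_c4", image_set)


# Inside build_clevr_imdb.py this would be called as feature_dir_for(image_set, __file__).
# The path below is a hypothetical repo location.
print(feature_dir_for("train", "/home/user/snmn/exp_clevr_snmn/data/build_clevr_imdb.py"))
# -> /home/user/snmn/exp_clevr_snmn/data/resnet101_c4/train  (on POSIX)
```

If the repo is reached through a symlink or bind mount, `os.path.abspath` keeps whatever path the script was invoked by; `os.path.realpath` would resolve the link instead.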

monajalal commented 5 years ago

Thank you. I changed line 18 as you suggested and got this new error. Do I need specific versions of the tools?


[jalal@goku snmn]$ vi exp_clevr_snmn/data/build_clevr_imdb.py 
[jalal@goku snmn]$ python exp_clevr_snmn/train_net_vqa.py --cfg exp_clevr_snmn/cfgs/vqa_gt_layout.yaml
/scratch/sjn/anaconda/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
  return f(*args, **kwds)
2018-09-28 17:56:01.104008: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-09-28 17:56:01.267412: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6705
pciBusID: 0000:05:00.0
totalMemory: 10.92GiB freeMemory: 6.08GiB
2018-09-28 17:56:01.267476: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:05:00.0, compute capability: 6.1)
Loading imdb from ./exp_clevr_snmn/data/imdb/imdb_train.npy
Done
imdb does not contain bounding boxes
Traceback (most recent call last):
  File "exp_clevr_snmn/train_net_vqa.py", line 45, in <module>
    num_choices=num_choices, module_names=module_names, is_training=True)
  File "/scratch2/nlp_cs591/assignment1/snmn/models_clevr_snmn/model.py", line 33, in __init__
    lstm_seq, q_encoding, embed_seq, seq_length_batch, num_module)
  File "/scratch2/nlp_cs591/assignment1/snmn/models_clevr_snmn/controller.py", line 85, in __init__
    module_prob = tf.nn.softmax(module_logit, axis=1)
TypeError: softmax() got an unexpected keyword argument 'axis'
monajalal commented 5 years ago

Can you please add a requirements.txt file?

I upgraded TensorFlow and ran into another problem:


[jalal@goku snmn]$ pip install --upgrade tensorflow 
Collecting tensorflow
  Downloading https://files.pythonhosted.org/packages/ce/d5/38cd4543401708e64c9ee6afa664b936860f4630dd93a49ab863f9998cd2/tensorflow-1.11.0-cp36-cp36m-manylinux1_x86_64.whl (63.0MB)
    100% |████████████████████████████████| 63.0MB 46kB/s 
Collecting keras-applications>=1.0.5 (from tensorflow)
  Downloading https://files.pythonhosted.org/packages/3f/c4/2ff40221029f7098d58f8d7fb99b97e8100f3293f9856f0fb5834bef100b/Keras_Applications-1.0.6-py2.py3-none-any.whl (44kB)
    100% |████████████████████████████████| 51kB 631kB/s 
Requirement already satisfied, skipping upgrade: six>=1.10.0 in /scratch/sjn/anaconda/lib/python3.6/site-packages (from tensorflow) (1.11.0)
Collecting tensorboard<1.12.0,>=1.11.0 (from tensorflow)
  Downloading https://files.pythonhosted.org/packages/9b/2f/4d788919b1feef04624d63ed6ea45a49d1d1c834199ec50716edb5d310f4/tensorboard-1.11.0-py3-none-any.whl (3.0MB)
    100% |████████████████████████████████| 3.0MB 300kB/s 
Requirement already satisfied, skipping upgrade: gast>=0.2.0 in /scratch/sjn/anaconda/lib/python3.6/site-packages (from tensorflow) (0.2.0)
Requirement already satisfied, skipping upgrade: termcolor>=1.1.0 in /scratch/sjn/anaconda/lib/python3.6/site-packages (from tensorflow) (1.1.0)
Requirement already satisfied, skipping upgrade: grpcio>=1.8.6 in /scratch/sjn/anaconda/lib/python3.6/site-packages (from tensorflow) (1.10.0)
Requirement already satisfied, skipping upgrade: numpy>=1.13.3 in /scratch/sjn/anaconda/lib/python3.6/site-packages (from tensorflow) (1.14.2)
Requirement already satisfied, skipping upgrade: astor>=0.6.0 in /scratch/sjn/anaconda/lib/python3.6/site-packages (from tensorflow) (0.6.2)
Collecting keras-preprocessing>=1.0.3 (from tensorflow)
  Downloading https://files.pythonhosted.org/packages/fc/94/74e0fa783d3fc07e41715973435dd051ca89c550881b3454233c39c73e69/Keras_Preprocessing-1.0.5-py2.py3-none-any.whl
Requirement already satisfied, skipping upgrade: absl-py>=0.1.6 in /scratch/sjn/anaconda/lib/python3.6/site-packages (from tensorflow) (0.1.10)
Requirement already satisfied, skipping upgrade: setuptools<=39.1.0 in /scratch/sjn/anaconda/lib/python3.6/site-packages (from tensorflow) (39.0.1)
Collecting protobuf>=3.6.0 (from tensorflow)
  Downloading https://files.pythonhosted.org/packages/c2/f9/28787754923612ca9bfdffc588daa05580ed70698add063a5629d1a4209d/protobuf-3.6.1-cp36-cp36m-manylinux1_x86_64.whl (1.1MB)
    100% |████████████████████████████████| 1.1MB 305kB/s 
Requirement already satisfied, skipping upgrade: wheel>=0.26 in /scratch/sjn/anaconda/lib/python3.6/site-packages (from tensorflow) (0.31.0)
Requirement already satisfied, skipping upgrade: h5py in /scratch/sjn/anaconda/lib/python3.6/site-packages (from keras-applications>=1.0.5->tensorflow) (2.7.1)
Requirement already satisfied, skipping upgrade: werkzeug>=0.11.10 in /scratch/sjn/anaconda/lib/python3.6/site-packages (from tensorboard<1.12.0,>=1.11.0->tensorflow) (0.14.1)
Requirement already satisfied, skipping upgrade: markdown>=2.6.8 in /scratch/sjn/anaconda/lib/python3.6/site-packages (from tensorboard<1.12.0,>=1.11.0->tensorflow) (2.6.11)
tensorflow-tensorboard 0.4.0 has requirement html5lib==0.9999999, but you'll have html5lib 1.0b8 which is incompatible.
Installing collected packages: keras-applications, protobuf, tensorboard, keras-preprocessing, tensorflow
  Found existing installation: protobuf 3.5.2.post1
    Uninstalling protobuf-3.5.2.post1:
      Successfully uninstalled protobuf-3.5.2.post1
  Found existing installation: tensorboard 1.7.0
    Uninstalling tensorboard-1.7.0:
      Successfully uninstalled tensorboard-1.7.0
Successfully installed keras-applications-1.0.6 keras-preprocessing-1.0.5 protobuf-3.6.1 tensorboard-1.11.0 tensorflow-1.11.0
[jalal@goku snmn]$ python exp_clevr_snmn/train_net_vqa.py --cfg exp_clevr_snmn/cfgs/vqa_gt_layout.yaml
Traceback (most recent call last):
  File "exp_clevr_snmn/train_net_vqa.py", line 4, in <module>
    import tensorflow as tf
  File "/scratch/sjn/anaconda/lib/python3.6/site-packages/tensorflow/__init__.py", line 22, in <module>
    from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
  File "/scratch/sjn/anaconda/lib/python3.6/site-packages/tensorflow/python/__init__.py", line 81, in <module>
    from tensorflow.python import keras
  File "/scratch/sjn/anaconda/lib/python3.6/site-packages/tensorflow/python/keras/__init__.py", line 24, in <module>
    from tensorflow.python.keras import activations
  File "/scratch/sjn/anaconda/lib/python3.6/site-packages/tensorflow/python/keras/activations/__init__.py", line 22, in <module>
    from tensorflow.python.keras._impl.keras.activations import elu
  File "/scratch/sjn/anaconda/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/__init__.py", line 21, in <module>
    from tensorflow.python.keras._impl.keras import activations
  File "/scratch/sjn/anaconda/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/activations.py", line 23, in <module>
    from tensorflow.python.keras._impl.keras import backend as K
  File "/scratch/sjn/anaconda/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/backend.py", line 36, in <module>
    from tensorflow.python.layers import base as tf_base_layers
  File "/scratch/sjn/anaconda/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 25, in <module>
    from tensorflow.python.keras.engine import base_layer
  File "/scratch/sjn/anaconda/lib/python3.6/site-packages/tensorflow/python/keras/engine/__init__.py", line 23, in <module>
    from tensorflow.python.keras.engine.base_layer import InputSpec
  File "/scratch/sjn/anaconda/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 34, in <module>
    from tensorflow.python.keras import backend
  File "/scratch/sjn/anaconda/lib/python3.6/site-packages/tensorflow/python/keras/backend/__init__.py", line 22, in <module>
    from tensorflow.python.keras._impl.keras.backend import abs
ImportError: cannot import name 'abs'
ronghanghu commented 5 years ago

This seems to be a protobuf issue that occurs when updating TensorFlow. It appears to be fixable by uninstalling tensorflow, uninstalling protobuf, and then reinstalling tensorflow, which should pull in the correct protobuf version (from https://github.com/tensorflow/probability/issues/46#issuecomment-390683203).

I used TensorFlow 1.5.0 in my experiments, which can be installed with:

pip install tensorflow-gpu==1.5.0
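The earlier `TypeError: softmax() got an unexpected keyword argument 'axis'` points the same way: it suggests a TensorFlow older than the 1.5.0 used here (earlier 1.x releases named that argument `dim`). A minimal sketch for checking what pip actually installed; it assumes Python 3.8+ for `importlib.metadata` (the Python 3.6 environment in this thread would need `pkg_resources` instead), and the package names and the 1.5.0 pin simply mirror this thread:

```python
from importlib import metadata

# Distributions mentioned in this thread; tensorflow and tensorflow-gpu both
# install the same `tensorflow` module, so having both can cause conflicts.
PACKAGES = ("tensorflow", "tensorflow-gpu", "protobuf")


def installed_version(dist_name):
    """Installed version string of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Leading numeric release parts: '1.5.0' -> (1, 5, 0), '3.5.2.post1' -> (3, 5, 2)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


if __name__ == "__main__":
    for name in PACKAGES:
        print(name, installed_version(name))
    tf_gpu = installed_version("tensorflow-gpu")
    if tf_gpu is not None and version_tuple(tf_gpu) != (1, 5, 0):
        print("tensorflow-gpu", tf_gpu, "differs from the 1.5.0 used by the author")
```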

monajalal commented 5 years ago

So I installed CUDA 9.0 and tensorflow-gpu 1.5, updated numpy, and reinstalled protobuf. I redid everything from scratch as a sanity check, but now I get this new error. Can you please advise how to fix it?

[jalal@goku snmn]$ python exp_clevr_snmn/train_net_vqa.py --cfg exp_clevr_snmn/cfgs/vqa_gt_layout.yaml
/scratch/sjn/anaconda/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
2018-10-01 19:21:19.013989: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-10-01 19:21:19.225787: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6705
pciBusID: 0000:05:00.0
totalMemory: 10.92GiB freeMemory: 10.38GiB
2018-10-01 19:21:19.225868: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:05:00.0, compute capability: 6.1)
Loading imdb from ./exp_clevr_snmn/data/imdb/imdb_train.npy
Done
imdb does not contain bounding boxes
2018-10-01 19:21:53.156519: E tensorflow/stream_executor/cuda/cuda_dnn.cc:378] Loaded runtime CuDNN library: 7104 (compatibility version 7100) but source was compiled with 7004 (compatibility version 7000).  If using a binary install, upgrade your CuDNN library to match.  If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
2018-10-01 19:21:53.157918: F tensorflow/core/kernels/conv_ops.cc:717] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms) 
Aborted
$ python -c "import tensorflow as tf; print(tf.__version__)"
/scratch/sjn/anaconda/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
1.5.0
$ cat /usr/local/cuda/version.txt
CUDA Version 9.0.176
monajalal commented 5 years ago

What version of CuDNN are you using?

[jalal@goku snmn]$ export PYTHONPATH=.:$PYTHONPATH
[jalal@goku snmn]$ python exp_clevr_snmn/train_net_vqa.py --cfg exp_clevr_snmn/cfgs/vqa_gt_layout.yaml
/scratch/sjn/anaconda/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
2018-10-02 14:47:18.080624: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-10-02 14:47:18.280280: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6705
pciBusID: 0000:05:00.0
totalMemory: 10.92GiB freeMemory: 10.05GiB
2018-10-02 14:47:18.280375: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:05:00.0, compute capability: 6.1)
Loading imdb from ./exp_clevr_snmn/data/imdb/imdb_train.npy
Done
imdb does not contain bounding boxes
2018-10-02 14:47:48.502526: E tensorflow/stream_executor/cuda/cuda_dnn.cc:378] Loaded runtime CuDNN library: 7104 (compatibility version 7100) but source was compiled with 7004 (compatibility version 7000).  If using a binary install, upgrade your CuDNN library to match.  If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
2018-10-02 14:47:48.503460: F tensorflow/core/kernels/conv_ops.cc:717] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms) 
Aborted
[jalal@goku snmn]$ ls -l /usr/local/cuda/include/cudnn*
-r--r--r--. 1 root root 100962 Oct  2 07:30 /usr/local/cuda/include/cudnn.h
[jalal@goku snmn]$  ls -l /usr/local/cuda/lib64/libcudnn*
lrwxrwxrwx. 1 root root        13 Oct  2 07:31 /usr/local/cuda/lib64/libcudnn.so -> libcudnn.so.7
lrwxrwxrwx. 1 root root        17 Oct  2 07:31 /usr/local/cuda/lib64/libcudnn.so.7 -> libcudnn.so.7.3.0
-rwxr-xr-x. 1 root root 298233616 Oct  2 07:30 /usr/local/cuda/lib64/libcudnn.so.7.3.0
-rw-r--r--. 1 root root 298484726 Oct  2 07:31 /usr/local/cuda/lib64/libcudnn_static.a
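The listing shows /usr/local/cuda/lib64/libcudnn.so.7 resolving to 7.3.0, while the error reports a runtime library of 7104, i.e. 7.1.4 (decoding with the CUDNN_VERSION formula quoted in the last comment). Assuming the listing reflects the files present when training ran, that mismatch hints that a different libcudnn copy may be loaded first. A rough sketch for listing candidate copies; `find_cudnn_copies` is hypothetical, and it only scans LD_LIBRARY_PATH plus /usr/local/cuda/lib64, whereas the real dynamic loader also consults RPATH/RUNPATH entries and the ldconfig cache.

```python
import glob
import os


def find_cudnn_copies(extra_dirs=("/usr/local/cuda/lib64",)):
    """List (path, resolved_target) for libcudnn.so* files in LD_LIBRARY_PATH and extra_dirs.

    Only these directories are scanned; RPATH/RUNPATH and the ldconfig cache are not.
    """
    dirs = [d for d in os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep) if d]
    dirs += [d for d in extra_dirs if d not in dirs]
    found = []
    for d in dirs:
        for path in sorted(glob.glob(os.path.join(d, "libcudnn.so*"))):
            # realpath follows symlink chains such as libcudnn.so -> libcudnn.so.7 -> ...
            found.append((path, os.path.realpath(path)))
    return found


if __name__ == "__main__":
    for path, target in find_cudnn_copies():
        print(path, "->", target)
```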
monajalal commented 5 years ago

cuDNN 7.0.5 with CUDA 9 and TF-GPU 1.5 worked for me.

$ cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 5
--
#define CUDNN_VERSION    (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

#include "driver_types.h"
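The `CUDNN_VERSION` macro above packs the version as `CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL`, which is how to read the numbers in the earlier error: the runtime library 7104 is cuDNN 7.1.4, while this TensorFlow build was compiled against 7004, i.e. 7.0.4 (compatibility versions 7100 and 7000 differ in the minor version). cuDNN 7.0.5 encodes as 7005, on the same 7.0 line as the build, which is consistent with it working. A small sketch that decodes such numbers and reads the version out of `cudnn.h`; the helper names are hypothetical, the parser only looks for the three `#define` lines shown above, and `same_compat_line` is a rough reading of the error text rather than TensorFlow's exact rule:

```python
import os
import re


def decode_cudnn_version(number):
    """Split a CUDNN_VERSION integer (major*1000 + minor*100 + patch) into a tuple."""
    major, rest = divmod(number, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def parse_cudnn_header(text):
    """Read CUDNN_MAJOR/MINOR/PATCHLEVEL from cudnn.h text; None if any is missing."""
    values = []
    for key in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"#define\s+%s\s+(\d+)" % key, text)
        if match is None:
            return None
        values.append(int(match.group(1)))
    return tuple(values)


def same_compat_line(a, b):
    """Compare (major, minor) only, like the 7100-vs-7000 'compatibility version'."""
    return a[:2] == b[:2]


if __name__ == "__main__":
    runtime, compiled = decode_cudnn_version(7104), decode_cudnn_version(7004)
    print(runtime, compiled, same_compat_line(runtime, compiled))
    # -> (7, 1, 4) (7, 0, 4) False
    # Newer cuDNN releases keep these macros in cudnn_version.h instead.
    header = "/usr/local/cuda/include/cudnn.h"
    if os.path.exists(header):
        with open(header) as f:
            print("installed cudnn.h:", parse_cudnn_header(f.read()))
```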