vcg-uvic / lf-net-release

Code Release for LF-Net: Learning Local Features from Images

Invalid argument error #14

Closed: UCRajkumar closed this issue 5 years ago

UCRajkumar commented 5 years ago

When I ran run_lfnet.py as described in the GitHub README, I get the following error:

Found 1179 images...
  2%|██▊                                                                                                                                                                   | 20/1179 [01:32<1:29:44,  4.65s/it]Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0,412,464] = 360 is not in [0, 360)
         [[{{node GatherV2}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run_lfnet.py", line 227, in <module>
    main(config)
  File "run_lfnet.py", line 151, in main
    outs = sess.run(fetch_dict, feed_dict=feed_dict)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0,412,464] = 360 is not in [0, 360)
         [[node GatherV2 (defined at /data/userdata/u.rajkumar/lf-net-release/det_tools.py:156) ]]

Caused by op 'GatherV2', defined at:
  File "run_lfnet.py", line 227, in <module>
    main(config)
  File "run_lfnet.py", line 82, in main
    ops = build_networks(config, photo_ph, is_training)
  File "run_lfnet.py", line 55, in build_networks
    degree_maps, _ = get_degree_maps(ori_maps) # degree (rgb psuedo color code)
  File "/data/userdata/u.rajkumar/lf-net-release/det_tools.py", line 156, in get_degree_maps
    degree_maps = tf.gather(angle2rgb, degree_maps[...,0])
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/util/dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 3273, in gather
    return gen_array_ops.gather_v2(params, indices, axis, name=name)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3748, in gather_v2
    "GatherV2", params=params, indices=indices, axis=axis, name=name)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): indices[0,412,464] = 360 is not in [0, 360)
         [[node GatherV2 (defined at /data/userdata/u.rajkumar/lf-net-release/det_tools.py:156) ]]

Everything seems to start correctly: the code finds all the images and gets 2% of the way through (20 of 1179 images). Then it abruptly stops with the error above. The dataset is the sacre_coeur dataset, downloaded as-is with no modifications.
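The error says the lookup in `get_degree_maps` (`tf.gather(angle2rgb, degree_maps[...,0])`) received index 360 for a table whose valid indices are `[0, 360)`. Below is a minimal pure-Python sketch of how an off-by-one like this can arise. It assumes, hypothetically, that the orientation is converted to degrees with `atan2`, shifted by 180 and rounded. Because `atan2` returns values in the closed range `[-pi, pi]`, the shifted angle can land on 360, one past the last table entry, and wrapping with `% 360` keeps the index valid. The names `angle_table`, `degree_index` and `degree_index_wrapped` are illustrative only, and the real `get_degree_maps` may compute the index differently.

```python
import math

NUM_BINS = 360  # valid indices are [0, 360), matching the error message

# Hypothetical stand-in for the angle2rgb lookup table: one entry per degree.
angle_table = list(range(NUM_BINS))


def degree_index(cos_v, sin_v):
    """Map a (cos, sin) orientation to an integer degree (hypothetical mapping).

    atan2 returns a value in [-pi, pi], so after shifting by 180 and
    rounding, the result lies in the CLOSED range [0, 360]: 360 is reachable.
    """
    deg = math.degrees(math.atan2(sin_v, cos_v)) + 180.0
    return round(deg)


def degree_index_wrapped(cos_v, sin_v):
    """Same mapping, but wrap 360 back to 0 so it is always a valid index."""
    return degree_index(cos_v, sin_v) % NUM_BINS


# An orientation just short of 180 degrees rounds up to index 360.
bad = degree_index(-1.0, 0.001)
try:
    angle_table[bad]
except IndexError:
    print("index %d is not in [0, %d)" % (bad, NUM_BINS))

good = degree_index_wrapped(-1.0, 0.001)
print("wrapped index:", good, "->", angle_table[good])
```

According to the TensorFlow documentation for `tf.gather`, an out-of-range index raises an error on CPU but yields zeros on GPU, so the same graph can fail in one environment and finish in another. Whether that is what happened here is not confirmed in this thread.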

jiangwei221 commented 5 years ago

Did you try the Docker image? It should provide the complete runtime environment; I just tried it on my PC and it works. This repo requires an old version of TensorFlow (tensorflow-gpu==1.4.0), and I suspect the version difference is the issue. Here is my log from running inside Docker:

(base) root@cbff9cb4a1d3:/home# python run_lfnet.py --in=./sacre_coeur/release/outdoor_examples/images/sacre_coeur/dense/images --out=./test
/opt/conda/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
  return f(*args, **kwds)
Act-Fn:  <function get_activation_fn.<locals>.<lambda> at 0x7f3e74a5e950>
Apply instance norm on input photos
Scales (0.707107~1.41 #5): [1.41421356 1.18920712 1.         0.84089642 0.70710678]
PAD=16, #conv=8, ksize=5 ori-ksize=5
Act-Fn:  <function relu at 0x7f3e89cc16a8>
===== SimpleDesc (reuse=False) =====
#1 conv-bn-act (?, 16, 16, 64)
#2 conv-bn-act (?, 8, 8, 128)
#3 conv-bn-act (?, 4, 4, 256)
FLAT (?, 4096)
Feat-Norm: L2-NORM
FEAT (?, 256)
2019-07-17 22:41:50.073841: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-07-17 22:41:50.192150: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-17 22:41:50.192576: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:01:00.0
totalMemory: 10.91GiB freeMemory: 7.44GiB
2019-07-17 22:41:50.283076: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-17 22:41:50.283506: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 1 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:02:00.0
totalMemory: 10.92GiB freeMemory: 10.76GiB
2019-07-17 22:41:50.284422: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Device peer to peer matrix
2019-07-17 22:41:50.284447: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1051] DMA: 0 1 
2019-07-17 22:41:50.284455: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 0:   Y Y 
2019-07-17 22:41:50.284460: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 1:   Y Y 
2019-07-17 22:41:50.284468: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-07-17 22:41:50.284475: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:1) -> (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)
Load trained models...
Checkpoint models-latest-42000
[Wed Jul 17 22:41:50 2019] Resuming...
Done.
Found 1179 images...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1179/1179 [03:07<00:00,  6.37it/s]
Done.
(base) root@cbff9cb4a1d3:/home# 
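The reply above pins the repository to `tensorflow-gpu==1.4.0` and reports that the Docker environment works. Outside Docker, a small guard can make a version mismatch obvious before starting a long run. This is a minimal sketch that assumes only the version string reported by `tf.__version__`; `parse_version` and `check_tf_version` are hypothetical helpers, not part of the repository, and comparing only major.minor is a judgment call.

```python
REQUIRED_TF = "1.4.0"  # version pinned in the comment above (tensorflow-gpu==1.4.0)


def parse_version(text):
    """Turn '1.13.1' or '1.4.0-rc1' into a tuple of ints like (1, 13, 1)."""
    core = text.split("-")[0].split("+")[0]
    parts = []
    for piece in core.split(".")[:3]:
        # Keep only the leading digits, so '0rc1' becomes 0.
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_tf_version(installed, required=REQUIRED_TF):
    """Return a warning if major.minor differs from the pinned version, else None."""
    if parse_version(installed)[:2] != parse_version(required)[:2]:
        return ("TensorFlow %s is installed, but this code pins %s; "
                "consider running it inside the Docker image instead."
                % (installed, required))
    return None


if __name__ == "__main__":
    print(check_tf_version("1.4.0"))   # prints None
    print(check_tf_version("1.13.1"))  # prints the warning string
```

In practice one would call `check_tf_version(tf.__version__)` after `import tensorflow as tf` and print the result if it is not `None`.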
UCRajkumar commented 5 years ago

It works now, thank you!