Open zuowang opened 7 years ago
I'm not sure, but there may be a bug in lib/fast_rcnn/train.py, lines 95 and 96, where tf.gather is used; see this answer. I think using tf.dynamic_partition instead will work, but you would need to modify anchor_target_layer to generate a mask to feed to tf.dynamic_partition; see the TensorFlow API documentation for more detail.
PS: when running the code, it gives a warning: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory. That usually happens when using tf.gather().
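A rough sketch of the tf.dynamic_partition idea, not the repository's actual code. It assumes labels use -1 for "ignore" and builds the mask from the labels inside the graph rather than in anchor_target_layer; the helper name keep_labeled and the toy values are hypothetical. The demo at the bottom assumes TensorFlow 2.x eager execution; under the 1.x API used in this thread you would evaluate the result with sess.run instead of .numpy().

```python
import tensorflow as tf


def keep_labeled(scores, labels):
    """Return the rows of `scores` whose label is not -1.

    scores: float tensor of shape [N, 2]
    labels: int tensor of shape [N], with -1 meaning "ignore"
    """
    # Partition index per row: 1 = keep, 0 = ignore.
    mask = tf.cast(tf.not_equal(labels, -1), tf.int32)
    # dynamic_partition returns one tensor per partition index.
    ignored, kept = tf.dynamic_partition(scores, mask, num_partitions=2)
    return kept


# Toy data (hypothetical): the middle row is marked "ignore".
scores = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
labels = tf.constant([1, -1, 0])
kept = keep_labeled(scores, labels).numpy().tolist()
print(kept)
```

Whether this removes the warning in train.py depends on how the rest of the loss is built, so treat it as a starting point only.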
I had the same issue and managed to fix it by setting a GPU option in demo.py.
Change:
sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True))
To:
config = tf.ConfigProto(allow_soft_placement=True)
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
When I train a model from scratch by running "./experiments/scripts/faster_rcnn_end2end.sh gpu 0 VGG16 pascal_voc", I run into a similar error. Does anyone know how to fix the train.py file?
I too could not get the training to work without running out of resources.
W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 392.00MiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:965] Internal: Dst tensor is not initialized.
E tensorflow/core/common_runtime/executor.cc:390] Executor failed to create kernel. Internal: Dst tensor is not initialized.
[[Node: zeros_24 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [25088,4096] values: [0 0 0]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Traceback (most recent call last):
File "./tools/train_net.py", line 96, in <module>
max_iters=args.max_iters)
File "/home/deepinsight/Faster-RCNN_TF/tools/../lib/fast_rcnn/train.py", line 222, in train_net
sw.train_model(sess, max_iters)
File "/home/deepinsight/Faster-RCNN_TF/tools/../lib/fast_rcnn/train.py", line 134, in train_model
sess.run(tf.initialize_all_variables())
File "/home/deepinsight/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 766, in run
run_metadata_ptr)
File "/home/deepinsight/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 964, in _run
feed_dict_string, options, run_metadata)
File "/home/deepinsight/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1014, in _do_run
target_list, options, run_metadata)
File "/home/deepinsight/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1034, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
[[Node: zeros_24 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [25088,4096] values: [0 0 0]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Caused by op u'zeros_24', defined at:
File "./tools/train_net.py", line 96, in <module>
max_iters=args.max_iters)
File "/home/deepinsight/Faster-RCNN_TF/tools/../lib/fast_rcnn/train.py", line 222, in train_net
sw.train_model(sess, max_iters)
File "/home/deepinsight/Faster-RCNN_TF/tools/../lib/fast_rcnn/train.py", line 131, in train_model
train_op = tf.train.MomentumOptimizer(lr, momentum).minimize(loss, global_step=global_step)
File "/home/deepinsight/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 279, in minimize
name=name)
File "/home/deepinsight/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 393, in apply_gradients
self._create_slots(var_list)
File "/home/deepinsight/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/momentum.py", line 51, in _create_slots
self._zeros_slot(v, "momentum", self._name)
File "/home/deepinsight/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 593, in _zeros_slot
named_slots[var] = slot_creator.create_zeros_slot(var, op_name)
File "/home/deepinsight/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 106, in create_zeros_slot
val = array_ops.zeros(primary.get_shape().as_list(), dtype=dtype)
File "/home/deepinsight/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 1362, in zeros
output = constant(zero, shape=shape, dtype=dtype, name=name)
File "/home/deepinsight/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 169, in constant
attrs={"value": tensor_value, "dtype": dtype_value}, name=name).outputs[0]
File "/home/deepinsight/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/deepinsight/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
self._traceback = _extract_stack()
InternalError (see above for traceback): Dst tensor is not initialized.
[[Node: zeros_24 = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [25088,4096] values: [0 0 0]...>, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
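For anyone reading the log above: the failing node zeros_24 is the momentum slot that MomentumOptimizer creates for a [25088, 4096] float variable, so the optimizer needs a second buffer as large as the weights themselves. A quick back-of-the-envelope check, assuming 4 bytes per DT_FLOAT element (the helper name tensor_mib is just for illustration), matches the 392.00MiB allocation that failed:

```python
def tensor_mib(shape, bytes_per_element=4):
    """Size in MiB of a dense tensor with the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element / float(1024 ** 2)


# Shape taken from the zeros_24 node in the traceback.
slot_mib = tensor_mib([25088, 4096])
print(slot_mib)  # 392.0
```

So the weights plus their momentum slot for this one layer already come to roughly 784 MiB, before activations and the other layers are counted.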