Open hinkeret opened 6 years ago
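Hello. The PaddingFIFOQueue error is usually caused by an incorrectly generated tfrecord. First, test separately whether you can read data from the tfrecord. If you cannot, check the tfrecord path. There is also one place in the convert data to tfrecord file to change (see the image; try both variants), after which you should regenerate the tfrecord. If that still does not solve it, we can discuss further; if it does, please tell me what the cause was. Thanks. PS: I cannot reply to the email you sent me; it has bounced twice.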
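One way to run the standalone read test suggested above, without building the training graph, is to walk the file's record framing directly. The sketch below assumes the standard TFRecord layout (an 8-byte little-endian length, a 4-byte CRC of the length, the payload, then a 4-byte CRC of the payload) and does not verify the CRCs. `count_tfrecord_records` is a hypothetical helper, not part of the repository.

```python
import struct


def count_tfrecord_records(path):
    """Count records by walking the TFRecord framing (CRCs are not checked)."""
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(12)  # 8-byte length + 4-byte CRC of the length
            if not header:
                break  # clean end of file
            if len(header) < 12:
                raise ValueError("truncated header after %d records" % count)
            (length,) = struct.unpack("<Q", header[:8])
            payload = f.read(length + 4)  # payload + 4-byte CRC of the payload
            if len(payload) < length + 4:
                raise ValueError("truncated payload after %d records" % count)
            count += 1
    return count
```

A count of 0, or a `ValueError`, would point at the conversion step or the path rather than at train.py.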
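Thank you. I have only now realized that the --dataset argument cannot be set arbitrarily: a wrong value makes the tfrecord path incorrect, which triggers this error. I should have read the author's code first.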
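Hello, I have tried everything you mentioned above, and the tfrecord path is fine. I have one question: when you convert to tfrecord, is the output file much larger than the total size of all the images? That is what I am seeing, so I suspect that some setting in the tfrecord conversion is wrong.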
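Yes, it does get larger. I have not studied tfrecord in much depth; there is probably a better way to write it, and improvements are welcome. For reference. @hinkeret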
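A rough size estimate may explain the growth, assuming the conversion script stores decoded pixel arrays as raw bytes (which the tf.decode_raw discussion further down this thread suggests) rather than the compressed JPEG. `raw_image_bytes` is a hypothetical helper for the arithmetic only, and the 1600x1200 size is taken from the annotation posted later in the thread.

```python
def raw_image_bytes(width, height, channels=3, bytes_per_value=1):
    """Size of an uncompressed image buffer in bytes."""
    return width * height * channels * bytes_per_value


# One 1600x1200 RGB image stored as uint8 (1 byte per value).
print(raw_image_bytes(1600, 1200))                     # 5760000
# The same image stored as float32 (4 bytes per value).
print(raw_image_bytes(1600, 1200, bytes_per_value=4))  # 23040000
```

A JPEG of that resolution is typically far smaller than either figure, so a tfrecord that is several times larger than the image folder is plausible under this assumption.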
Those are the per-channel means of the three ImageNet image channels; subtracting the mean helps speed up training. @hinkeret
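As a minimal illustration of mean subtraction, the sketch below removes a per-channel mean from a list of RGB pixels. The values in `IMAGENET_RGB_MEAN` are the commonly quoted ImageNet means and are an assumption here; the thread does not state the exact numbers or channel order the repository uses, and `subtract_mean` is a hypothetical helper.

```python
# Commonly quoted ImageNet per-channel means in RGB order (assumed, see above).
IMAGENET_RGB_MEAN = (123.68, 116.78, 103.94)


def subtract_mean(pixels, mean=IMAGENET_RGB_MEAN):
    """Return new (r, g, b) tuples with the per-channel mean removed."""
    return [tuple(value - m for value, m in zip(pixel, mean)) for pixel in pixels]


# A pixel equal to the mean maps to zero in every channel.
centered = subtract_mean([IMAGENET_RGB_MEAN, (255, 255, 255)])
```

After this step the inputs are roughly centered around zero, which is the property the comment above refers to.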
I used the following XML and images, three images in total, and I still cannot find where the problem is.
<folder>VOC2012</folder>
<filename>153.jpg</filename>
<path>C:\Users\test\Desktop\VOC2012\153.jpg</path>
<source>
<database>Unknown</database>
</source>
<size>
<width>1600</width>
<height>1200</height>
<depth>3</depth>
</size>
<segmented>0</segmented>
<object>
<name>spm40</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<difficult>0</difficult>
<bndbox>
<xmin>859</xmin>
<ymin>340</ymin>
<xmax>936</xmax>
<ymax>416</ymax>
</bndbox>
</object>
xml File
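If it is convenient, could you provide a very small training dataset so that I can check whether the problem is my environment or my training data?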
You can use the VOC2007 dataset.
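Hello, I found the cause of this error: the dtype passed to tf.decode_raw when parsing the image must match the dtype used when the image was written. After I changed tf.float32 to tf.uint8, the code ran normally.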
May I ask which file you modified? @hinkeret
The file is read_tfrecord.py. Find the line “img = tf.decode_raw(features['img'],tf.float32)” and change tf.float32 to tf.uint8. @frothmoon
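The effect of a dtype mismatch can be seen without TensorFlow by reinterpreting the same byte string with two element types. This is only an illustration using the standard-library `array` module; `reinterpret_bytes` is a hypothetical stand-in, not tf.decode_raw itself, and the 2x2x3 image is made up.

```python
from array import array


def reinterpret_bytes(raw, typecode):
    """View a byte string as a flat array of the given element type."""
    values = array(typecode)
    values.frombytes(raw)
    return values


height, width, channels = 2, 2, 3
raw = bytes(range(height * width * channels))  # 12 bytes written as uint8

as_uint8 = reinterpret_bytes(raw, "B")    # 1 byte per element
as_float32 = reinterpret_bytes(raw, "f")  # 4 bytes per element on common platforms
print(len(as_uint8), len(as_float32))     # 12 3
```

Read back as float32, the buffer yields a quarter of the expected elements, so a later reshape to height x width x channels cannot succeed.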
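Sorry, I did not look carefully just now. In the read_tfrecord.py I downloaded, the original code is already img = tf.decode_raw(features['img'],tf.uint8), yet my code still does not run and still reports the PaddingFIFOQueue error. Is there a solution? Thank you very much! @hinkeret
Have you checked everything the author mentioned above? Is your environment Windows or Linux? @frothmoon
Yes, I have checked, and it is Linux. Is the following what the output looks like when it runs normally?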
2018-08-31 16:30:08.592256: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 2651 get requests, put_count=2134 evicted_count=1000 eviction_rate=0.468604 and unsatisfied allocation rate=0.609959
2018-08-31 16:30:08.592313: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
2018-08-31 16:30:04: step1 image_name:b'Cow_600.jpg' |
rpn_loc_loss:1.1323292255401611 | rpn_cla_loss:1.5864677429199219 | rpn_total_loss:2.718796968460083 |
fast_rcnn_loc_loss:0.14092016220092773 | fast_rcnn_cla_loss:0.5938623547554016 | fast_rcnn_total_loss:0.7347825169563293 |
total_loss:4.092337608337402 | pre_cost_time:4.808664560317993s
@hinkeret
Yes. @frothmoon
That is strange. On my side the error appears only after several images have already been processed.
Traceback (most recent call last):
File "train.py", line 229, in <module>
train()
File "train.py", line 196, in train
fast_rcnn_total_loss, total_loss, train_op])
File "/home/dongpeijie/miniconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 789, in run
run_metadata_ptr)
File "/home/dongpeijie/miniconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 997, in _run
feed_dict_string, options, run_metadata)
File "/home/dongpeijie/miniconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1132, in _do_run
target_list, options, run_metadata)
File "/home/dongpeijie/miniconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1152, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: PaddingFIFOQueue '_2_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: get_batch/batch = QueueDequeueManyV2[component_types=[DT_STRING, DT_FLOAT, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](get_batch/batch/padding_fifo_queue, get_batch/batch/n)]]
Caused by op 'get_batch/batch', defined at:
File "train.py", line 229, in <module>
train()
File "train.py", line 34, in train
is_training=True)
File "../data/io/read_tfrecord.py", line 89, in next_batch
dynamic_pad=True)
File "/home/dongpeijie/miniconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/training/input.py", line 919, in batch
name=name)
File "/home/dongpeijie/miniconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/training/input.py", line 716, in _batch
dequeued = queue.dequeue_many(batch_size, name=name)
File "/home/dongpeijie/miniconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/ops/data_flow_ops.py", line 457, in dequeue_many
self._queue_ref, n=n, component_types=self._dtypes, name=name)
File "/home/dongpeijie/miniconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 946, in _queue_dequeue_many_v2
timeout_ms=timeout_ms, name=name)
File "/home/dongpeijie/miniconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/home/dongpeijie/miniconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/dongpeijie/miniconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
self._traceback = _extract_stack()
OutOfRangeError (see above for traceback): PaddingFIFOQueue '_2_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: get_batch/batch = QueueDequeueManyV2[component_types=[DT_STRING, DT_FLOAT, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](get_batch/batch/padding_fifo_queue, get_batch/batch/n)]]
@hinkeret
Then your data probably has a problem. You need to check your images. @frothmoon
Recommend improved code: https://github.com/DetectionTeamUCAS/FPN_Tensorflow. @hinkeret @zhangxiaopang88 @frothmoon
I have run into this problem too. How did you solve it in the end?
Some of the data is probably bad, which makes reading from the tfrecord fail. @linden1994
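The thread does not say what kind of bad data is meant. One plausible check, sketched below, is to confirm that every bounding box in a VOC-style annotation lies inside the image and has positive area. `find_bad_boxes` is a hypothetical helper, it assumes the file has a single root element wrapping the `size` and `object` tags, and the validity rule is an assumption.

```python
import xml.etree.ElementTree as ET


def find_bad_boxes(xml_text):
    """Return (name, box) pairs whose box is empty or outside the image."""
    root = ET.fromstring(xml_text)
    width = int(float(root.findtext("size/width")))
    height = int(float(root.findtext("size/height")))
    problems = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            int(float(box.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        inside = 0 <= xmin < xmax <= width and 0 <= ymin < ymax <= height
        if not inside:
            problems.append((obj.findtext("name"), (xmin, ymin, xmax, ymax)))
    return problems
```

Running this over every annotation before conversion gives a short list of files to inspect by hand.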
Do not use encode() on Linux: replace 'img_name': _bytes_feature(img_name.encode()) with 'img_name': _bytes_feature(img_name).
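A possible reason this advice helps on some setups and not others is the str/bytes distinction rather than the operating system: under Python 3 a bytes feature expects `bytes`, while under Python 2 a plain `str` is already bytes. That reading is an assumption, not something confirmed in this thread. A hypothetical helper that accepts either type could look like this:

```python
def to_bytes(name, encoding="utf-8"):
    """Return `name` as bytes, encoding it only if it is text."""
    if isinstance(name, bytes):
        return name  # already bytes: leave untouched
    return name.encode(encoding)
```

With it, the feature line would become 'img_name': _bytes_feature(to_bytes(img_name)), which behaves the same whether img_name arrives as text or as bytes.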
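This is probably the problem: when running convert_data_to_tfrecord.py, the name after --dataset= cannot be chosen arbitrarily. When training on your own data, check the following:
1. In read_tfrecord.py, line 68, the list after if dataset_name not in contains your dataset name, for example voc_self.
2. libs/label_name_dict/label_dict.py has an entry for voc_self that lists your own class names.
3. libs/configs/cfgs.py sets DATASET_NAME = 'voc_self'.
4. Run python convert_data_to_tfrecord.py --VOC_dir='../VOC_NWPU/VOCdevkit_train/' --save_name='train' --img_format='.jpg' --dataset='voc_self'.
After that, training no longer fails.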
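Because a mismatched dataset name surfaces only as an empty queue at training time, it can help to check up front that the expected tfrecord file exists. The sketch below assumes the files are named `<dataset>_<split>*` inside a tfrecord directory; that naming pattern and the helper `find_tfrecords` are assumptions, so adjust them to whatever read_tfrecord.py actually builds.

```python
import glob
import os


def find_tfrecords(tfrecord_dir, dataset_name, split="train"):
    """Return matching tfrecord paths, or raise if the pattern matches nothing."""
    pattern = os.path.join(tfrecord_dir, "%s_%s*" % (dataset_name, split))
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError("no tfrecord matches %r" % pattern)
    return matches
```

Calling this with the same name that DATASET_NAME holds, before starting training, turns a silent empty input queue into an immediate and readable error.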
@frothmoon Hello, how did you solve the PaddingFIFOQueue problem back then? Do you still remember?
Traceback (most recent call last):
File "/home/user1/R2CNN-Plus-Plus_Tensorflow/data/io/convert_data_to_tfrecord.py", line 123, in
restore model
Traceback (most recent call last):
File "/home/emg/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1361, in _do_call
return fn(*args)
File "/home/emg/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1340, in _run_fn
target_list, status, run_metadata)
File "/home/emg/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.OutOfRangeError: PaddingFIFOQueue '_1_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: get_batch/batch = QueueDequeueManyV2[component_types=[DT_STRING, DT_FLOAT, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](get_batch/batch/padding_fifo_queue, gradients/range/delta)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./train.py", line 229, in <module>
train()
File "./train.py", line 196, in train
fast_rcnn_total_loss, total_loss, train_op])
File "/home/emg/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 905, in run
run_metadata_ptr)
File "/home/emg/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1137, in _run
feed_dict_tensor, options, run_metadata)
File "/home/emg/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1355, in _do_run
options, run_metadata)
File "/home/emg/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1374, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: PaddingFIFOQueue '_1_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: get_batch/batch = QueueDequeueManyV2[component_types=[DT_STRING, DT_FLOAT, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](get_batch/batch/padding_fifo_queue, gradients/range/delta)]]
Caused by op 'get_batch/batch', defined at:
File "./train.py", line 229, in <module>
train()
File "./train.py", line 34, in train
is_training=True)
File "../data/io/read_tfrecord.py", line 89, in next_batch
dynamic_pad=True)
File "/home/emg/anaconda3/lib/python3.5/site-packages/tensorflow/python/training/input.py", line 989, in batch
name=name)
File "/home/emg/anaconda3/lib/python3.5/site-packages/tensorflow/python/training/input.py", line 763, in _batch
dequeued = queue.dequeue_many(batch_size, name=name)
File "/home/emg/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/data_flow_ops.py", line 483, in dequeue_many
self._queue_ref, n=n, component_types=self._dtypes, name=name)
File "/home/emg/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 2430, in _queue_dequeue_many_v2
component_types=component_types, timeout_ms=timeout_ms, name=name)
File "/home/emg/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/emg/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3271, in create_op
op_def=op_def)
File "/home/emg/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1650, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
OutOfRangeError (see above for traceback): PaddingFIFOQueue '_1_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: get_batch/batch = QueueDequeueManyV2[component_types=[DT_STRING, DT_FLOAT, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](get_batch/batch/padding_fifo_queue, gradients/range/delta)]]