yangxue0827 / FPN_Tensorflow

A Tensorflow implementation of FPN detection framework.
416 stars 150 forks source link

It occurs a problem when i train my dataset,thank you in advance! #35

Open hinkeret opened 6 years ago

hinkeret commented 6 years ago

restore model
Traceback (most recent call last):
  File "/home/emg/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1361, in _do_call
    return fn(*args)
  File "/home/emg/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1340, in _run_fn
    target_list, status, run_metadata)
  File "/home/emg/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.OutOfRangeError: PaddingFIFOQueue '_1_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
         [[Node: get_batch/batch = QueueDequeueManyV2[component_types=[DT_STRING, DT_FLOAT, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](get_batch/batch/padding_fifo_queue, gradients/range/delta)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./train.py", line 229, in <module>
    train()
  File "./train.py", line 196, in train
    fast_rcnn_total_loss, total_loss, train_op])
  File "/home/emg/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 905, in run
    run_metadata_ptr)
  File "/home/emg/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1137, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/emg/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1355, in _do_run
    options, run_metadata)
  File "/home/emg/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1374, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: PaddingFIFOQueue '_1_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
         [[Node: get_batch/batch = QueueDequeueManyV2[component_types=[DT_STRING, DT_FLOAT, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](get_batch/batch/padding_fifo_queue, gradients/range/delta)]]

Caused by op 'get_batch/batch', defined at:
  File "./train.py", line 229, in <module>
    train()
  File "./train.py", line 34, in train
    is_training=True)
  File "../data/io/read_tfrecord.py", line 89, in next_batch
    dynamic_pad=True)
  File "/home/emg/anaconda3/lib/python3.5/site-packages/tensorflow/python/training/input.py", line 989, in batch
    name=name)
  File "/home/emg/anaconda3/lib/python3.5/site-packages/tensorflow/python/training/input.py", line 763, in _batch
    dequeued = queue.dequeue_many(batch_size, name=name)
  File "/home/emg/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/data_flow_ops.py", line 483, in dequeue_many
    self._queue_ref, n=n, component_types=self._dtypes, name=name)
  File "/home/emg/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 2430, in _queue_dequeue_many_v2
    component_types=component_types, timeout_ms=timeout_ms, name=name)
  File "/home/emg/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/emg/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3271, in create_op
    op_def=op_def)
  File "/home/emg/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1650, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

OutOfRangeError (see above for traceback): PaddingFIFOQueue '_1_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
         [[Node: get_batch/batch = QueueDequeueManyV2[component_types=[DT_STRING, DT_FLOAT, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](get_batch/batch/padding_fifo_queue, gradients/range/delta)]]

yangxue0827 commented 6 years ago

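Hi, the PaddingFIFOQueue error is usually caused by an incorrectly generated tfrecord. First, test separately whether you can read data from the tfrecord at all. If you cannot, check the tfrecord path. There is also one place in the convert data to tfrecord file that you should change (see the image; try both variants), and then regenerate the tfrecord. If that still does not solve it, we can discuss further; if it does, please tell me what the cause was. Thanks. PS. I cannot reply to the email you sent me; it has bounced twice. [image]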
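One way to run the standalone read test suggested above, without building the training graph, is to walk the file's record framing directly. The following is only a sketch and is not code from this repository: it assumes the standard TFRecord layout (8-byte little-endian length, 4-byte length checksum, payload, 4-byte payload checksum), it does not verify the checksums, and the function name count_tfrecord_records is hypothetical.

```python
import struct


def count_tfrecord_records(path):
    """Count records in a TFRecord file by walking its framing.

    Checksums are read but not verified, so this only detects
    missing, empty, or truncated files.
    """
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(8)  # uint64 little-endian payload length
            if not header:
                break  # clean end of file
            if len(header) < 8:
                raise ValueError("truncated length header")
            (length,) = struct.unpack("<Q", header)
            if len(f.read(4)) < 4:  # checksum of the length field
                raise ValueError("truncated length checksum")
            if len(f.read(length)) < length:  # serialized example bytes
                raise ValueError("truncated record payload")
            if len(f.read(4)) < 4:  # checksum of the payload
                raise ValueError("truncated payload checksum")
            count += 1
    return count
```

If this raises or returns 0 for the file that training is supposed to read, that would be consistent with the queue reporting "current size 0", and regenerating the tfrecord is the likely next step.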

zhangxiaopang88 commented 6 years ago

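Thank you. I only just realized that the --dataset argument cannot be set to an arbitrary value: a wrong value leads to a wrong tfrecord path, which triggers this error. I should have read through the author's code first.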

hinkeret commented 6 years ago

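Hi, I have tried everything you mentioned above, and the tfrecord path is fine. I do have one question: when you convert to tfrecord, is the resulting file much larger than the total size of all the images combined? That is what I am seeing, so I suspect some setting is wrong in my tfrecord conversion.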

yangxue0827 commented 6 years ago

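Yes, it does get larger. I have not studied tfrecord in depth; there is probably a better way to write it, and improvements are welcome. See the reference. @hinkeret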
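The growth is what one would expect if the converter stores decoded pixel arrays rather than the compressed JPEG bytes, which the use of tf.decode_raw on the reading side suggests. A rough back-of-the-envelope sketch, assuming raw pixels are stored (the helper name raw_image_bytes is hypothetical, and the 1600 x 1200 x 3 size is the one from the sample annotation in this thread):

```python
def raw_image_bytes(width, height, channels, bytes_per_value=1):
    """Size in bytes of an uncompressed image with the given shape."""
    return width * height * channels * bytes_per_value


# A 1600 x 1200 RGB image stored as raw uint8 pixels.
raw_uint8 = raw_image_bytes(1600, 1200, 3)

# The same image stored as float32 would be four times larger.
raw_float32 = raw_image_bytes(1600, 1200, 3, bytes_per_value=4)

print(raw_uint8, raw_float32)
```

A JPEG of that resolution is typically far smaller than the raw uint8 figure, so a tfrecord much larger than the image folder does not by itself indicate a misconfiguration.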

yangxue0827 commented 6 years ago

Those are the three-channel mean values of the ImageNet images. Subtracting the mean helps speed up training. @hinkeret
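As an illustration of the mean subtraction described above, here is a minimal pure-Python sketch. The constants are the commonly quoted ImageNet per-channel means in RGB order; they are an assumption here, since the repository's own values and channel order are not shown in this thread, and subtract_mean is a hypothetical helper.

```python
# Commonly quoted ImageNet per-channel means in RGB order (assumption:
# the repository's own constants and channel order are not shown here).
IMAGENET_RGB_MEAN = (123.68, 116.78, 103.94)


def subtract_mean(image, mean=IMAGENET_RGB_MEAN):
    """Return a copy of an H x W x 3 nested-list image with the mean removed."""
    return [[[value - m for value, m in zip(pixel, mean)] for pixel in row]
            for row in image]


# A 1 x 2 image: one pixel equal to the mean, one white pixel.
image = [[(123.68, 116.78, 103.94), (255, 255, 255)]]
centered = subtract_mean(image)
```

After this step the pixel equal to the mean maps to zero in every channel, so inputs are roughly centered around zero before they reach the network.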

hinkeret commented 6 years ago

153

I used an XML file and an image like these, three images in total, and I still cannot find where the problem comes from.

hinkeret commented 6 years ago
<folder>VOC2012</folder>
<filename>153.jpg</filename>
<path>C:\Users\test\Desktop\VOC2012\153.jpg</path>
<source>
    <database>Unknown</database>
</source>
<size>
    <width>1600</width>
    <height>1200</height>
    <depth>3</depth>
</size>
<segmented>0</segmented>
    <object>
    <name>spm40</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
        <xmin>859</xmin>
        <ymin>340</ymin>
        <xmax>936</xmax>
        <ymax>416</ymax>
    </bndbox>
</object>

xml File

hinkeret commented 6 years ago

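If it is convenient, could you please provide a very small training dataset, so that I can verify whether the problem is my environment or my training data?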

yangxue0827 commented 6 years ago

You can use the VOC2007 dataset.

hinkeret commented 6 years ago

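Hi, I found the cause of this error: when parsing the image with tf.decode_raw, the dtype must match the one used when the image was written. After I changed tf.float32 to tf.uint8, the code ran normally.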

frothmoon commented 6 years ago

May I ask which file you modified? @hinkeret

hinkeret commented 6 years ago

The file is read_tfrecord.py. In the line "img = tf.decode_raw(features['img'], tf.float32)", change tf.float32 to tf.uint8. @frothmoon
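Why the dtype matters can be shown without TensorFlow. The sketch below uses the standard-library array module as a stand-in for tf.decode_raw (an analogy, not the library call itself): the same bytes yield a different number of values depending on the element type, so a later reshape to height x width x channels can only succeed with the dtype that was used when writing. The small image shape and the helper name decoded_length are made up for illustration.

```python
import array


def decoded_length(raw, typecode):
    """Number of values obtained when reinterpreting raw bytes as typecode."""
    values = array.array(typecode)
    values.frombytes(raw)
    return len(values)


height, width, channels = 4, 6, 3
expected = height * width * channels
raw = bytes(expected)  # stand-in for uint8 pixel bytes written by the converter

as_uint8 = decoded_length(raw, "B")    # one value per byte
as_float32 = decoded_length(raw, "f")  # one value per C float (4 bytes on typical platforms)

print(as_uint8 == expected, as_float32 == expected)
```

In a queue-based input pipeline, a failure of this kind in the reader may surface only indirectly, as the closed PaddingFIFOQueue seen in the tracebacks, rather than as a shape error.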

frothmoon commented 6 years ago

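Sorry, I did not read carefully just now. In the read_tfrecord.py that I downloaded, the original code is already img = tf.decode_raw(features['img'],tf.uint8), but my code still does not run properly and still reports the PaddingFIFOQueue error. Is there a solution? Thank you very much! @hinkeret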

hinkeret commented 6 years ago

Have you checked everything the author mentioned above? Is your environment Windows or Linux? @frothmoon

frothmoon commented 6 years ago

Yes, I have checked, and it is Linux. Does the output of a normal run look like the following?

2018-08-31 16:30:08.592256: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 2651 get requests, put_count=2134 evicted_count=1000 eviction_rate=0.468604 and unsatisfied allocation rate=0.609959
2018-08-31 16:30:08.592313: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
2018-08-31 16:30:04: step1 image_name:b'Cow_600.jpg' |
rpn_loc_loss:1.1323292255401611 | rpn_cla_loss:1.5864677429199219 | rpn_total_loss:2.718796968460083 |
fast_rcnn_loc_loss:0.14092016220092773 | fast_rcnn_cla_loss:0.5938623547554016 | fast_rcnn_total_loss:0.7347825169563293 |
total_loss:4.092337608337402 | pre_cost_time:4.808664560317993s

@hinkeret

hinkeret commented 6 years ago

Yes. @frothmoon

frothmoon commented 6 years ago

That is strange. In my case the error is only raised after several images have already been processed.

Traceback (most recent call last):
  File "train.py", line 229, in <module>
    train()
  File "train.py", line 196, in train
    fast_rcnn_total_loss, total_loss, train_op])
  File "/home/dongpeijie/miniconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 789, in run
    run_metadata_ptr)
  File "/home/dongpeijie/miniconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 997, in _run
    feed_dict_string, options, run_metadata)
  File "/home/dongpeijie/miniconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1132, in _do_run
    target_list, options, run_metadata)
  File "/home/dongpeijie/miniconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1152, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: PaddingFIFOQueue '_2_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
         [[Node: get_batch/batch = QueueDequeueManyV2[component_types=[DT_STRING, DT_FLOAT, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](get_batch/batch/padding_fifo_queue, get_batch/batch/n)]]

Caused by op 'get_batch/batch', defined at:
  File "train.py", line 229, in <module>
    train()
  File "train.py", line 34, in train
    is_training=True)
  File "../data/io/read_tfrecord.py", line 89, in next_batch
    dynamic_pad=True)
  File "/home/dongpeijie/miniconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/training/input.py", line 919, in batch
    name=name)
  File "/home/dongpeijie/miniconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/training/input.py", line 716, in _batch
    dequeued = queue.dequeue_many(batch_size, name=name)
  File "/home/dongpeijie/miniconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/ops/data_flow_ops.py", line 457, in dequeue_many
    self._queue_ref, n=n, component_types=self._dtypes, name=name)
  File "/home/dongpeijie/miniconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 946, in _queue_dequeue_many_v2
    timeout_ms=timeout_ms, name=name)
  File "/home/dongpeijie/miniconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/home/dongpeijie/miniconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/dongpeijie/miniconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
    self._traceback = _extract_stack()

OutOfRangeError (see above for traceback): PaddingFIFOQueue '_2_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
         [[Node: get_batch/batch = QueueDequeueManyV2[component_types=[DT_STRING, DT_FLOAT, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](get_batch/batch/padding_fifo_queue, get_batch/batch/n)]]

@hinkeret

hinkeret commented 6 years ago

Then there is probably a problem with your data. You need to check your images. @frothmoon

yangxue0827 commented 5 years ago

Recommend improved code: https://github.com/DetectionTeamUCAS/FPN_Tensorflow. @hinkeret @zhangxiaopang88 @frothmoon

linden1994 commented 5 years ago

I have run into this problem too. How did you solve it in the end?

yangxue0827 commented 5 years ago

Some of the data is probably bad, which causes an error when reading from the tfrecord. @linden1994
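The thread does not say what makes a sample "bad", so the following is only one plausible check: a sketch that scans a Pascal VOC style annotation for boxes that are empty or fall outside the image. It assumes an annotation root element and integer coordinates; find_bad_boxes and the inline SAMPLE are hypothetical, with values borrowed from the annotation posted earlier.

```python
import xml.etree.ElementTree as ET


def find_bad_boxes(xml_text):
    """Return (name, box) pairs whose box is empty or outside the image."""
    root = ET.fromstring(xml_text)
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    problems = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            int(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        inside = 0 <= xmin < xmax <= width and 0 <= ymin < ymax <= height
        if not inside:
            problems.append((obj.findtext("name"), (xmin, ymin, xmax, ymax)))
    return problems


SAMPLE = """<annotation>
  <size><width>1600</width><height>1200</height><depth>3</depth></size>
  <object><name>spm40</name>
    <bndbox><xmin>859</xmin><ymin>340</ymin><xmax>936</xmax><ymax>416</ymax></bndbox>
  </object>
</annotation>"""

print(find_bad_boxes(SAMPLE))
```

Running a check like this over every annotation before conversion narrows down which files to inspect by hand; images that fail to decode or whose real size differs from the annotation would need a separate check.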

leidaguo commented 5 years ago

Do not use encode() on Linux. Instead of 'img_name': _bytes_feature(img_name.encode()), use 'img_name': _bytes_feature(img_name).

hu5tao commented 5 years ago

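This is probably the problem: when running convert_data_to_tfrecord.py, the name after --dataset= cannot be chosen arbitrarily. When training on your own data, check the following:
1. In read_tfrecord.py, line 68, the list after "if dataset_name not in" must contain your dataset name, for example one you define yourself such as voc_self.
2. In libs/label_name_dict/label_dict.py, add an entry for your own voc_self and write your own class names into it.
3. In libs/configs/cfgs.py, set DATASET_NAME = 'voc_self'.
4. Run python convert_data_to_tfrecord.py --VOC_dir='../VOC_NWPU/VOCdevkit_train/' --save_name='train' --img_format='.jpg' --dataset='voc_self'. Training should then run without this error.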
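The checklist above amounts to keeping one dataset name consistent between conversion and training. A small pre-flight check along these lines can fail early with a readable message instead of a closed queue. This is a sketch only: the KNOWN_DATASETS tuple, the file-name pattern, and find_tfrecords are assumptions for illustration, not the repository's actual code.

```python
import glob
import os

# Hypothetical list of accepted names; the real one lives in read_tfrecord.py.
KNOWN_DATASETS = ("pascal", "coco", "voc_self")


def find_tfrecords(tfrecord_dir, dataset_name, split="train"):
    """Return tfrecord files for dataset_name, or raise with a clear message."""
    if dataset_name not in KNOWN_DATASETS:
        raise ValueError("unknown dataset name: %r" % dataset_name)
    # Assumed naming scheme: <dataset_name>_<split>*
    pattern = os.path.join(tfrecord_dir, "%s_%s*" % (dataset_name, split))
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError("no tfrecord matches %s" % pattern)
    return matches
```

Calling such a helper with the same name that is set in cfgs.py, before the input queue is built, makes a mismatch between the --dataset value and the training configuration visible immediately.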

fall-love commented 4 years ago

@frothmoon Hello, may I ask how you solved the PaddingFIFOQueue problem back then? Do you still remember?

SwimmingLiu commented 1 year ago

Do not use encode() on Linux. Instead of 'img_name': _bytes_feature(img_name.encode()), use 'img_name': _bytes_feature(img_name).

That change caused this problem:

Traceback (most recent call last):
  File "/home/user1/R2CNN-Plus-Plus_Tensorflow/data/io/convert_data_to_tfrecord.py", line 123, in <module>
    convert_pascal_to_tfrecord()
  File "/home/user1/R2CNN-Plus-Plus_Tensorflow/data/io/convert_data_to_tfrecord.py", line 102, in convert_pascal_to_tfrecord
    'img_name': _bytes_feature(img_name),
  File "/home/user1/R2CNN-Plus-Plus_Tensorflow/data/io/convert_data_to_tfrecord.py", line 28, in _bytes_feature
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
TypeError: '2091.jpg' has type str, but expected one of: bytes
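The TypeError above indicates that, under Python 3, the bytes feature rejects a plain str, so removing encode() cannot work there. Under Python 2, str is already a byte string, which may explain why the advice differs between setups. One way to make the converter indifferent to this is a small normalizing helper; to_bytes is a hypothetical name, not part of the repository or of TensorFlow.

```python
def to_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding it only when it is a text string."""
    if isinstance(value, bytes):
        return value  # already bytes (always the case for Python 2 str)
    if isinstance(value, str):
        return value.encode(encoding)
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


print(to_bytes("2091.jpg"), to_bytes(b"2091.jpg"))
```

With such a helper, the feature line could be written as 'img_name': _bytes_feature(to_bytes(img_name)). When the name is read back it arrives as bytes, which matches the b'Cow_600.jpg' shown in the training log earlier in the thread.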

SwimmingLiu commented 1 year ago

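Hi, the PaddingFIFOQueue error is usually caused by an incorrectly generated tfrecord. First, test separately whether you can read data from the tfrecord at all. If you cannot, check the tfrecord path. There is also one place in the convert data to tfrecord file that you should change (see the image; try both variants), and then regenerate the tfrecord. If that still does not solve it, we can discuss further; if it does, please tell me what the cause was. Thanks. PS. I cannot reply to the email you sent me; it has bounced twice. [image]

After making the change described there, the script no longer runs (environment: Linux, Anaconda):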

Traceback (most recent call last):
  File "/home/user1/R2CNN-Plus-Plus_Tensorflow/data/io/convert_data_to_tfrecord.py", line 123, in <module>
    convert_pascal_to_tfrecord()
  File "/home/user1/R2CNN-Plus-Plus_Tensorflow/data/io/convert_data_to_tfrecord.py", line 102, in convert_pascal_to_tfrecord
    'img_name': _bytes_feature(img_name),
  File "/home/user1/R2CNN-Plus-Plus_Tensorflow/data/io/convert_data_to_tfrecord.py", line 28, in _bytes_feature
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
TypeError: '2091.jpg' has type str, but expected one of: bytes