yangxue0827 / FPN_Tensorflow

A Tensorflow implementation of FPN detection framework.

PaddingFIFOQueue #36

Open yangxue0827 opened 6 years ago

yangxue0827 commented 6 years ago

The PaddingFIFOQueue error usually means the generated tfrecord is wrong: (1) First, test separately whether you can read data from the tfrecord (a minimal sketch follows below). (2) If you can't read it, check the tfrecord path in read_tfrecord.py. (3) In addition, there is one place (shown in the screenshot attached to this comment) that may need to be changed, depending on your environment.
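
For step (1), a minimal read test could look like the sketch below (TF 1.x queue-style reading, similar to what this repo uses; the tfrecord path is only a placeholder and should point at the same file read_tfrecord.py loads):

```python
import tensorflow as tf

# Placeholder path: point this at the same tfrecord file read_tfrecord.py loads.
TFRECORD_PATH = 'your_dataset_train.tfrecord'

# Build the same kind of queue-based reader the training pipeline uses.
filename_queue = tf.train.string_input_producer([TFRECORD_PATH], num_epochs=1)
reader = tf.TFRecordReader()
_, serialized_example = reader.read(filename_queue)

with tf.Session() as sess:
    sess.run([tf.global_variables_initializer(),
              tf.local_variables_initializer()])  # num_epochs uses a local variable
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    count = 0
    try:
        while not coord.should_stop():
            sess.run(serialized_example)
            count += 1
    except tf.errors.OutOfRangeError:
        print('finished: read %d records' % count)
    finally:
        coord.request_stop()
        coord.join(threads)
```

If this prints 0 records or crashes right away, the tfrecord itself (or its path) is the problem.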

zhaonann commented 6 years ago

Hello, I'm confused by the following problem. When I run the command python train.py:

 restore model
2018-09-02 13:56:19.347219: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 2601 get requests, put_count=2077 evicted_count=1000 eviction_rate=0.481464 and unsatisfied allocation rate=0.624375
2018-09-02 13:56:19.347282: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
 2018-09-02 13:56:14: step1    image_name:b'Cow_600.jpg' |
                                rpn_loc_loss:0.917148232460022 |         rpn_cla_loss:1.6249786615371704 |      rpn_total_loss:2.5421268939971924 |
                                fast_rcnn_loc_loss:0.03855091333389282 |       fast_rcnn_cla_loss:1.3284811973571777 |  fast_rcnn_total_loss:1.3670320510864258 |
                                total_loss:4.547943592071533 |   pre_cost_time:5.217535734176636s
2018-09-02 13:56:32.363722: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 1798 get requests, put_count=2434 evicted_count=2000 eviction_rate=0.821693 and unsatisfied allocation rate=0.768076
2018-09-02 13:56:32.363782: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 193 to 212
 2018-09-02 13:56:34: step11    image_name:b'Cow_169.jpg' |
                                rpn_loc_loss:1.169303297996521 |         rpn_cla_loss:1.1856460571289062 |      rpn_total_loss:2.354949474334717 |
                                fast_rcnn_loc_loss:0.1402481347322464 |  fast_rcnn_cla_loss:0.6426894664764404 |        fast_rcnn_total_loss:0.7829375863075256 |
                                total_loss:3.776689291000366 |   pre_cost_time:0.773120641708374s
2018-09-02 13:56:38.791215: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 1800 get requests, put_count=1673 evicted_count=1000 eviction_rate=0.597729 and unsatisfied allocation rate=0.646667
2018-09-02 13:56:38.791280: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 409 to 449
 2018-09-02 13:56:42: step21    image_name:b'Cow_471.jpg' |
                                rpn_loc_loss:0.3570420444011688 |        rpn_cla_loss:0.20661428570747375 |     rpn_total_loss:0.5636563301086426 |
                                fast_rcnn_loc_loss:0.0 |         fast_rcnn_cla_loss:0.00026819398044608533 |    fast_rcnn_total_loss:0.00026819398044608533 |
                                total_loss:1.2027227878570557 |  pre_cost_time:0.7935807704925537s
2018-09-02 13:56:49.491758: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 3583 get requests, put_count=3653 evicted_count=1000 eviction_rate=0.273748 and unsatisfied allocation rate=0.286073
2018-09-02 13:56:49.491816: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 1053 to 1158
 2018-09-02 13:56:50: step31    image_name:b'Cow_882.jpg' |
                                rpn_loc_loss:0.5051007270812988 |        rpn_cla_loss:0.255696177482605 |       rpn_total_loss:0.7607969045639038 |
                                fast_rcnn_loc_loss:0.04570673406124115 |       fast_rcnn_cla_loss:0.270821750164032 |   fast_rcnn_total_loss:0.3165284991264343 |
                                total_loss:1.7161149978637695 |  pre_cost_time:0.8400037288665771s
2018-09-02 13:56:52.131273: W tensorflow/core/framework/op_kernel.cc:1158] Out of range: PaddingFIFOQueue '_2_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
         [[Node: get_batch/batch = QueueDequeueManyV2[component_types=[DT_STRING, DT_FLOAT, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](get_batch/batch/padding_fifo_queue, get_batch/batch/n)]]

The warning above is then repeated again and again, and eventually training fails with:

OutOfRangeError (see above for traceback): PaddingFIFOQueue '_2_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
         [[Node: get_batch/batch = QueueDequeueManyV2[component_types=[DT_STRING, DT_FLOAT, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](get_batch/batch/padding_fifo_queue, get_batch/batch/n)]]

Why does it go wrong after step 31? I infer that it is not a path problem. Besides, I think the tfrecord was generated correctly, otherwise it could not have run up to step 31. How can I solve it? Thank you in advance.
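
Is a direct pass over the tfrecord, with no queues involved, enough to rule out a corrupted file? A rough sketch of what I mean (the path is just a placeholder for my own tfrecord):

```python
import tensorflow as tf

# Placeholder path for my own tfrecord file.
TFRECORD_PATH = 'cow_train.tfrecord'

count = 0
for record in tf.python_io.tf_record_iterator(TFRECORD_PATH):
    example = tf.train.Example()
    example.ParseFromString(record)  # fails here if a record is malformed
    count += 1
print('total records:', count)
```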

DODoubleJ commented 6 years ago

@zhaonann I have the same problem. Did you solve it?

leidaguo commented 5 years ago

Do not use encode() on Linux. Change 'img_name': _bytes_feature(img_name.encode()) to 'img_name': _bytes_feature(img_name).
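
A rough sketch of where that change sits on the writing side (the _bytes_feature helper and the surrounding writer code here are illustrative, not copied from this repo):

```python
import tensorflow as tf

def _bytes_feature(value):
    # tf.train.BytesList expects bytes, so value must already be a byte string here.
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

img_name = b'Cow_600.jpg'  # already bytes on this code path

feature_dict = {
    # If img_name is already bytes, calling .encode() on it is unnecessary and,
    # depending on the Python version, raises an error or mangles the value:
    # 'img_name': _bytes_feature(img_name.encode())   # <- do not use
    'img_name': _bytes_feature(img_name),              # <- use this
}
example = tf.train.Example(features=tf.train.Features(feature=feature_dict))
serialized = example.SerializeToString()  # this is what gets written to the tfrecord
```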

fall-love commented 4 years ago

@zhaonann did you solve it? Please help me.