Open yangxue0827 opened 6 years ago
Hello, I'm confused by the following problem. When I run the command python train.py, I get:
restore model
2018-09-02 13:56:19.347219: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 2601 get requests, put_count=2077 evicted_count=1000 eviction_rate=0.481464 and unsatisfied allocation rate=0.624375
2018-09-02 13:56:19.347282: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
2018-09-02 13:56:14: step1 image_name:b'Cow_600.jpg' |
rpn_loc_loss:0.917148232460022 | rpn_cla_loss:1.6249786615371704 | rpn_total_loss:2.5421268939971924 |
fast_rcnn_loc_loss:0.03855091333389282 | fast_rcnn_cla_loss:1.3284811973571777 | fast_rcnn_total_loss:1.3670320510864258 |
total_loss:4.547943592071533 | pre_cost_time:5.217535734176636s
2018-09-02 13:56:32.363722: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 1798 get requests, put_count=2434 evicted_count=2000 eviction_rate=0.821693 and unsatisfied allocation rate=0.768076
2018-09-02 13:56:32.363782: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 193 to 212
2018-09-02 13:56:34: step11 image_name:b'Cow_169.jpg' |
rpn_loc_loss:1.169303297996521 | rpn_cla_loss:1.1856460571289062 | rpn_total_loss:2.354949474334717 |
fast_rcnn_loc_loss:0.1402481347322464 | fast_rcnn_cla_loss:0.6426894664764404 | fast_rcnn_total_loss:0.7829375863075256 |
total_loss:3.776689291000366 | pre_cost_time:0.773120641708374s
2018-09-02 13:56:38.791215: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 1800 get requests, put_count=1673 evicted_count=1000 eviction_rate=0.597729 and unsatisfied allocation rate=0.646667
2018-09-02 13:56:38.791280: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 409 to 449
2018-09-02 13:56:42: step21 image_name:b'Cow_471.jpg' |
rpn_loc_loss:0.3570420444011688 | rpn_cla_loss:0.20661428570747375 | rpn_total_loss:0.5636563301086426 |
fast_rcnn_loc_loss:0.0 | fast_rcnn_cla_loss:0.00026819398044608533 | fast_rcnn_total_loss:0.00026819398044608533 |
total_loss:1.2027227878570557 | pre_cost_time:0.7935807704925537s
2018-09-02 13:56:49.491758: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 3583 get requests, put_count=3653 evicted_count=1000 eviction_rate=0.273748 and unsatisfied allocation rate=0.286073
2018-09-02 13:56:49.491816: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 1053 to 1158
2018-09-02 13:56:50: step31 image_name:b'Cow_882.jpg' |
rpn_loc_loss:0.5051007270812988 | rpn_cla_loss:0.255696177482605 | rpn_total_loss:0.7607969045639038 |
fast_rcnn_loc_loss:0.04570673406124115 | fast_rcnn_cla_loss:0.270821750164032 | fast_rcnn_total_loss:0.3165284991264343 |
total_loss:1.7161149978637695 | pre_cost_time:0.8400037288665771s
2018-09-02 13:56:52.131273: W tensorflow/core/framework/op_kernel.cc:1158] Out of range: PaddingFIFOQueue '_2_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: get_batch/batch = QueueDequeueManyV2[component_types=[DT_STRING, DT_FLOAT, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](get_batch/batch/padding_fifo_queue, get_batch/batch/n)]]
The warning (W) above repeats again and again, and finally this error appears:
OutOfRangeError (see above for traceback): PaddingFIFOQueue '_2_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: get_batch/batch = QueueDequeueManyV2[component_types=[DT_STRING, DT_FLOAT, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](get_batch/batch/padding_fifo_queue, get_batch/batch/n)]]
Why does it go wrong after step 31? I infer that it is not a path problem. Besides, I think the tfrecord was generated correctly, otherwise it could not have run up to step 31. How can I solve this? Thank you in advance.
Same problem. Did you solve it?
Do not use encode() on Linux: instead of 'img_name': _bytes_feature(img_name.encode()), use 'img_name': _bytes_feature(img_name).
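For reference, a minimal sketch of that change, assuming the usual tf.train.BytesList-style _bytes_feature helper; the example file name and the surrounding lines are illustrative, not code from this repo:

```python
import tensorflow as tf

def _bytes_feature(value):
    # BytesList expects raw bytes; do not re-encode a value that is already bytes
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

img_name = b'Cow_600.jpg'  # on Linux the name is often already a bytes object

feature = {
    # 'img_name': _bytes_feature(img_name.encode()),  # original line; .encode() breaks if img_name is already bytes
    'img_name': _bytes_feature(img_name),              # suggested change
}
example = tf.train.Example(features=tf.train.Features(feature=feature))
```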
@zhaonann did you solve it? Please help me.
The reason for the PaddingFIFOQueue error is that you generated a wrong tfrecord:
(1) First, test separately whether you can read data from the tfrecord (a quick standalone check is sketched below).
(2) If you can't read it, please check the path of the tfrecord (read_tfrecord.py).
(3) In addition, there is a place shown in the picture that may need to be changed, depending on your environment.
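For step (1), here is a minimal standalone read test, my own sketch assuming TensorFlow 1.x (which this repo uses); the tfrecord path is a placeholder you should replace with your own:

```python
import tensorflow as tf

TFRECORD_PATH = './tfrecord/train.tfrecord'  # hypothetical path, replace with yours

count = 0
for record in tf.python_io.tf_record_iterator(TFRECORD_PATH):
    example = tf.train.Example()
    example.ParseFromString(record)  # parse each serialized Example
    count += 1

print('records found:', count)
# If this prints 0 or raises an error, the tfrecord itself (or its path) is the
# problem; fix the conversion step before debugging train.py or the queue.
```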