yangxue0827 / FPN_Tensorflow

A Tensorflow implementation of FPN detection framework.

Errors occur when I execute train.py #30

Open chanyixialex opened 6 years ago

chanyixialex commented 6 years ago

2018-07-22 19:19:07.851388: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at matching_files_op.cc:49 : Not found: ../data/tfrecords; No such file or directory
2018-07-22 19:19:07.851869: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at matching_files_op.cc:49 : Not found: ../data/tfrecords; No such file or directory
Traceback (most recent call last):
  File "/vol/venvs/tf1.7/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1327, in _do_call
    return fn(*args)
  File "/vol/venvs/tf1.7/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1312, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/vol/venvs/tf1.7/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1420, in _call_tf_sessionrun
    status, run_metadata)
  File "/vol/venvs/tf1.7/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: ../data/tfrecords; No such file or directory
  [[Node: get_batch/matching_filenames/MatchingFiles = MatchingFiles_device="/job:localhost/replica:0/task:0/device:CPU:0"]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./tools/train.py", line 229, in <module>
    train()
  File "./tools/train.py", line 176, in train
    sess.run(init_op)
  File "/vol/venvs/tf1.7/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 905, in run
    run_metadata_ptr)
  File "/vol/venvs/tf1.7/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1140, in _run
    feed_dict_tensor, options, run_metadata)
  File "/vol/venvs/tf1.7/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1321, in _do_run
    run_metadata)
  File "/vol/venvs/tf1.7/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1340, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: ../data/tfrecords; No such file or directory
  [[Node: get_batch/matching_filenames/MatchingFiles = MatchingFiles_device="/job:localhost/replica:0/task:0/device:CPU:0"]]

Caused by op 'get_batch/matching_filenames/MatchingFiles', defined at:
  File "./tools/train.py", line 229, in <module>
    train()
  File "./tools/train.py", line 34, in train
    is_training=True)
  File "/vol/home/alex/FPN_Tensorflow/data/io/read_tfrecord.py", line 77, in next_batch
    filename_tensorlist = tf.train.match_filenames_once(pattern)
  File "/vol/venvs/tf1.7/lib/python3.6/site-packages/tensorflow/python/training/input.py", line 72, in match_filenames_once
    name=name, initial_value=io_ops.matching_files(pattern),
  File "/vol/venvs/tf1.7/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 397, in matching_files
    "MatchingFiles", pattern=pattern, name=name)
  File "/vol/venvs/tf1.7/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/vol/venvs/tf1.7/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3290, in create_op
    op_def=op_def)
  File "/vol/venvs/tf1.7/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1654, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

NotFoundError (see above for traceback): ../data/tfrecords; No such file or directory [[Node: get_batch/matching_filenames/MatchingFiles = MatchingFiles_device="/job:localhost/replica:0/task:0/device:CPU:0"]]

Tip: it says there is no ../data/tfrecords directory, but I do have this directory, and its location is correct.

yangxue0827 commented 6 years ago

cd $FPN_ROOT/tools
python train.py
@chanyixialex
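
For anyone hitting the same NotFoundError, a quick way to see how the pattern resolves from the current working directory is a short check like the one below. This is only a sketch: the exact pattern string is built in data/io/read_tfrecord.py, so the '../data/tfrecords/*' value here is an assumption taken from the error message.

import os
import tensorflow as tf

# Assumed pattern, copied from the error message above; the real one is
# built inside data/io/read_tfrecord.py and may append the dataset name.
pattern = '../data/tfrecords/*'

print('current working directory:', os.getcwd())
print('pattern resolves to:', os.path.abspath(pattern))

# tf.gfile.Glob uses the same file-matching machinery as
# tf.train.match_filenames_once, so if this call fails or returns
# nothing, train.py will fail in the same way.
print('matched files:', tf.gfile.Glob(pattern))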

chanyixialex commented 6 years ago

@yangxue0827 I get the same problem, but now the message says there is no data/tfrecords directory. I think it is a path problem.

chanyixialex commented 6 years ago

tensorflow.python.framework.errors_impl.NotFoundError: data/tfrecords; No such file or directory [[Node: get_batch/matching_filenames/MatchingFiles = MatchingFiles_device="/job:localhost/replica:0/task:0/device:CPU:0"]]

Caused by op 'get_batch/matching_filenames/MatchingFiles', defined at:
  File "train.py", line 229, in <module>
    train()
  File "train.py", line 34, in train
    is_training=True)
  File "../data/io/read_tfrecord.py", line 77, in next_batch
    filename_tensorlist = tf.train.match_filenames_once(pattern)
  File "/vol/venvs/tf1.7/lib/python3.6/site-packages/tensorflow/python/training/input.py", line 72, in match_filenames_once
    name=name, initial_value=io_ops.matching_files(pattern),
  File "/vol/venvs/tf1.7/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 397, in matching_files
    "MatchingFiles", pattern=pattern, name=name)
  File "/vol/venvs/tf1.7/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/vol/venvs/tf1.7/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3290, in create_op
    op_def=op_def)
  File "/vol/venvs/tf1.7/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1654, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

NotFoundError (see above for traceback): data/tfrecords; No such file or directory [[Node: get_batch/matching_filenames/MatchingFiles = MatchingFiles_device="/job:localhost/replica:0/task:0/device:CPU:0"]]

chanyixialex commented 6 years ago

It works now. It was a path problem.
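
For reference, one way to make the lookup independent of the working directory is to anchor the pattern to the source file instead of a relative '../' path. This is just a sketch, not the repository's actual code: it assumes it is placed inside data/io/read_tfrecord.py, where the tfrecords folder sits one level above that file's directory, and the '*' suffix stands in for whatever pattern the real code builds.

import os

# Resolve data/tfrecords relative to this source file rather than the
# current working directory (sketch; assumes this lives in data/io/).
this_dir = os.path.dirname(os.path.abspath(__file__))
tfrecord_dir = os.path.abspath(os.path.join(this_dir, '..', 'tfrecords'))
pattern = os.path.join(tfrecord_dir, '*')  # placeholder pattern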

chanyixialex commented 6 years ago

@yangxue0827 After fixing the above, the next problem occurs. It seems to be a tfrecord data problem, although the first item's data seems fine. Thank you for any help!

restore model
2018-07-23 20:56:06: step0 image_name:b'38bdd525-f626-4554-92ca-7ec4f95e5b2b.jpg' | rpn_loc_loss:0.2607109546661377 | rpn_cla_loss:1.163469672203064 | rpn_total_loss:1.4241806268692017 | fast_rcnn_loc_loss:0.2614363431930542 | fast_rcnn_cla_loss:0.8251640796661377 | fast_rcnn_total_loss:1.086600422859192 | total_loss:3.1513068675994873 | pre_cost_time:9.237893342971802s
Traceback (most recent call last):
  File "/vol/venvs/tf1.7/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1327, in _do_call
    return fn(*args)
  File "/vol/venvs/tf1.7/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1312, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/vol/venvs/tf1.7/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1420, in _call_tf_sessionrun
    status, run_metadata)
  File "/vol/venvs/tf1.7/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.OutOfRangeError: PaddingFIFOQueue '_1_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
  [[Node: get_batch/batch = QueueDequeueManyV2[component_types=[DT_STRING, DT_FLOAT, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](get_batch/batch/padding_fifo_queue, get_batch/batch/n)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 229, in <module>
    train()
  File "train.py", line 211, in train
    summary_str = sess.run(summary_op)
  File "/vol/venvs/tf1.7/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 905, in run
    run_metadata_ptr)
  File "/vol/venvs/tf1.7/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1140, in _run
    feed_dict_tensor, options, run_metadata)
  File "/vol/venvs/tf1.7/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1321, in _do_run
    run_metadata)
  File "/vol/venvs/tf1.7/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1340, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: PaddingFIFOQueue '_1_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
  [[Node: get_batch/batch = QueueDequeueManyV2[component_types=[DT_STRING, DT_FLOAT, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](get_batch/batch/padding_fifo_queue, get_batch/batch/n)]]

Caused by op 'get_batch/batch', defined at:
  File "train.py", line 229, in <module>
    train()
  File "train.py", line 34, in train
    is_training=True)
  File "../data/io/read_tfrecord.py", line 89, in next_batch
    dynamic_pad=True)
  File "/vol/venvs/tf1.7/lib/python3.6/site-packages/tensorflow/python/training/input.py", line 989, in batch
    name=name)
  File "/vol/venvs/tf1.7/lib/python3.6/site-packages/tensorflow/python/training/input.py", line 763, in _batch
    dequeued = queue.dequeue_many(batch_size, name=name)
  File "/vol/venvs/tf1.7/lib/python3.6/site-packages/tensorflow/python/ops/data_flow_ops.py", line 483, in dequeue_many
    self._queue_ref, n=n, component_types=self._dtypes, name=name)
  File "/vol/venvs/tf1.7/lib/python3.6/site-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 3476, in queue_dequeue_many_v2
    component_types=component_types, timeout_ms=timeout_ms, name=name)
  File "/vol/venvs/tf1.7/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/vol/venvs/tf1.7/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3290, in create_op
    op_def=op_def)
  File "/vol/venvs/tf1.7/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1654, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

OutOfRangeError (see above for traceback): PaddingFIFOQueue '_1_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 1, current size 0) [[Node: get_batch/batch = QueueDequeueManyV2[component_types=[DT_STRING, DT_FLOAT, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](get_batch/batch/padding_fifo_queue, get_batch/batch/n)]]

FannierPeng commented 6 years ago

@chanyixialex Have you solved this problem? I found the image that caused the error using a binary search, deleted it and its corresponding xml, and then the error disappeared.
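
A programmatic alternative to the manual binary search is to scan the tfrecord file and report any records that fail to parse. Below is a minimal TF 1.x sketch; the file name is a placeholder, so point it at whatever sits in data/tfrecords. If every record parses cleanly, the bad data is more likely to surface only at image-decode time, and the binary search over the source images remains the fallback.

import tensorflow as tf

# Hypothetical tfrecord path; adjust to your file under data/tfrecords.
tfrecord_path = '../data/tfrecords/train.tfrecord'

count = 0
bad = 0
for raw in tf.python_io.tf_record_iterator(tfrecord_path):
    count += 1
    try:
        # Parsing catches truncated or corrupted serialized Examples.
        tf.train.Example.FromString(raw)
    except Exception as err:
        bad += 1
        print('record %d failed to parse: %s' % (count - 1, err))
print('scanned %d records, %d unparsable' % (count, bad))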

chanyixialex commented 6 years ago

@FannierPeng It may be a data error; you can check the corresponding xml, or try again with another dataset.
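
If you suspect the raw data rather than the conversion step, a quick consistency pass over images and annotations (before regenerating the tfrecords) could look like the sketch below. The VOC-style JPEGImages/Annotations layout and paths are assumptions; adjust them to your dataset.

import os
import xml.etree.ElementTree as ET

# Hypothetical VOC-style layout; adjust to your dataset's folders.
img_dir = 'VOCdevkit/JPEGImages'
xml_dir = 'VOCdevkit/Annotations'

for name in sorted(os.listdir(img_dir)):
    stem = os.path.splitext(name)[0]
    xml_path = os.path.join(xml_dir, stem + '.xml')
    if not os.path.exists(xml_path):
        print('missing annotation for', name)
        continue
    try:
        root = ET.parse(xml_path).getroot()
    except ET.ParseError as err:
        print('unparsable xml:', xml_path, err)
        continue
    if not root.findall('object'):
        print('no objects annotated in', xml_path)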

yangxue0827 commented 6 years ago

https://github.com/yangxue0827/FPN_Tensorflow/issues/36 @FannierPeng

yangxue0827 commented 5 years ago

I recommend the improved code: https://github.com/DetectionTeamUCAS/FPN_Tensorflow. @chanyixialex @FannierPeng