hellochick / PSPNet-tensorflow

TensorFlow-based implementation of "Pyramid Scene Parsing Network".

Training crashes on Cityscapes #34

Open kshitijagrwl opened 6 years ago

kshitijagrwl commented 6 years ago

I'm training PSPNet using the train.py script provided - I've tried running it on a GTX 1080 and a Titan X. It always crashes after about 500 steps. Log below:

step 590         loss = 0.266, (0.723 sec/step)
Traceback (most recent call last):
  File "train.py", line 219, in <module>
    main()
  File "train.py", line 210, in main
    loss_value, _ = sess.run([reduced_loss, train_op], feed_dict=feed_dict)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: FIFOQueue '_1_create_inputs/batch/fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
         [[Node: create_inputs/batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](create_inputs/batch/fifo_queue, create_inputs/batch/n)]]

Caused by op u'create_inputs/batch', defined at:
  File "train.py", line 219, in <module>
    main()
  File "train.py", line 121, in main
    image_batch, label_batch = reader.dequeue(args.batch_size)
  File "/home/ml/codes/PSPNet-tensorflow/image_reader.py", line 116, in dequeue
    num_elements)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/input.py", line 927, in batch
    name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/input.py", line 722, in _batch
    dequeued = queue.dequeue_many(batch_size, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/data_flow_ops.py", line 464, in dequeue_many
    self._queue_ref, n=n, component_types=self._dtypes, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 2418, in _queue_dequeue_many_v2
    component_types=component_types, timeout_ms=timeout_ms, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

OutOfRangeError (see above for traceback): FIFOQueue '_1_create_inputs/batch/fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
         [[Node: create_inputs/batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](create_inputs/batch/fifo_queue, create_inputs/batch/n)]]
strand2013 commented 5 years ago

Maybe you need to check your feed data around step 500; I guess its size is not the same as before, which is why the crash occurs.
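
For example, a quick sanity check over the training list could look like the sketch below (not from this repo; the data root, the OpenCV dependency, and the "<image> <label>" per-line list format are assumptions):

    # Sketch: report any image/label pair in the training list that is
    # missing, unreadable, or has mismatched dimensions.
    import os
    import cv2

    DATA_DIR = '/path/to/cityscapes'               # assumed data root
    LIST_PATH = './list/cityscapes_train_list.txt'

    with open(LIST_PATH) as f:
        for i, line in enumerate(f):
            parts = line.strip().split()
            if len(parts) != 2:
                print('line %d is empty or malformed: %r' % (i, line))
                continue
            img = cv2.imread(os.path.join(DATA_DIR, parts[0]))
            lbl = cv2.imread(os.path.join(DATA_DIR, parts[1]), cv2.IMREAD_GRAYSCALE)
            if img is None or lbl is None:
                print('line %d: could not read %s or %s' % (i, parts[0], parts[1]))
            elif img.shape[:2] != lbl.shape[:2]:
                print('line %d: size mismatch %s vs %s' % (i, img.shape[:2], lbl.shape[:2]))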

narendoraiswamy commented 5 years ago

I was stuck with the same error too, but after a closer look at the error and some debugging, I found the solution. Check the cityscapes_train_list.txt file in the list folder and make sure that you do not have any extra/empty lines. The reader basically tries to take the empty line as an input but cannot find the required image, since none is listed there. Hence the error "has insufficient elements (requested 1, current size 0)". It is a simple logical error and a code limitation. @RaceSu @kshitijagrwl
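
If that turns out to be the problem, a one-off cleanup of the list file can be done along these lines (a minimal sketch, assuming one "<image> <label>" pair per line; not part of the repo):

    # Sketch: drop empty or malformed lines from the training list so the
    # input queue never dequeues a bad entry.
    LIST_PATH = './list/cityscapes_train_list.txt'

    with open(LIST_PATH) as f:
        lines = f.readlines()

    # keep only lines that contain exactly an image path and a label path
    clean = [l for l in lines if len(l.strip().split()) == 2]

    with open(LIST_PATH, 'w') as f:
        f.writelines(clean)

    print('removed %d empty/malformed lines' % (len(lines) - len(clean)))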