warmspringwinds / tf-image-segmentation

Image Segmentation framework based on Tensorflow and TF-Slim library
MIT License

OutOfRangeError: RandomShuffleQueue '_1_shuffle_batch/random_shuffle_queue' is closed and has insufficient elements (requested 1, current size 0) #10

Open FredHaa opened 7 years ago

FredHaa commented 7 years ago

Hello,

I am trying to use the framework to segment images of bacteria.

I am using the provided recipe for FCN_32s, but with a few adaptations for my custom data set (different lut, changed image size and number of classes)

The entire script looks like this:

import tensorflow as tf
import tensorflow.contrib.slim as slim
import numpy as np
import skimage.io as io
import os, sys
from matplotlib import pyplot as plt

root_dir = '/home/frederik/Documents/Uni/semester_8/AI4/DeepBacteriaSegmentation/'
sys.path.append(root_dir + 'models/slim/')
sys.path.append(root_dir + 'tf-image-segmentation/')

from tf_image_segmentation.models.fcn_32s import FCN_32s, extract_vgg_16_mapping_without_fc8
from tf_image_segmentation.utils.tf_records import read_tfrecord_and_decode_into_image_annotation_pair_tensors
from tf_image_segmentation.utils.training import get_valid_logits_and_labels
from tf_image_segmentation.utils.augmentation import flip_randomly_left_right_image_with_annotation, scale_randomly_image_with_annotation_with_fixed_size_output
from tf_image_segmentation.utils.look_up_tables import alive_and_dead_cell_lut

checkpoints_dir = root_dir + 'checkpoints/'
log_folder = root_dir + 'log_folder/'
vgg_checkpoint_path = checkpoints_dir + 'vgg_16.ckpt'

image_train_size = [704, 320]
number_of_classes = 3

tfrecord_filename = 'bacteria.tfrecords'

cell_lut = alive_and_dead_cell_lut()
class_labels = cell_lut.keys()

filename_queue = tf.train.string_input_producer(
    [tfrecord_filename], num_epochs=10)

image, annotation = read_tfrecord_and_decode_into_image_annotation_pair_tensors(filename_queue)

resized_image, resized_annotation = scale_randomly_image_with_annotation_with_fixed_size_output(image, annotation, image_train_size)

resized_annotation = tf.squeeze(resized_annotation)

image_batch, annotation_batch = tf.train.shuffle_batch( [resized_image, resized_annotation],
                                             batch_size=1,
                                             capacity=3000,
                                             num_threads=2,
                                             min_after_dequeue=1000)

upsampled_logits_batch, vgg_16_variables_mapping = FCN_32s(image_batch_tensor=image_batch,
                                                           number_of_classes=number_of_classes,
                                                           is_training=True)

valid_labels_batch_tensor, valid_logits_batch_tensor = get_valid_logits_and_labels(annotation_batch_tensor=annotation_batch,
                                                                                     logits_batch_tensor=upsampled_logits_batch,
                                                                                    class_labels=class_labels)

cross_entropies = tf.nn.softmax_cross_entropy_with_logits(logits=valid_logits_batch_tensor,
                                                          labels=valid_labels_batch_tensor)

# Normalize the cross entropy -- the number of elements
# is different during each step due to mask out regions
cross_entropy_sum = tf.reduce_mean(cross_entropies)

pred = tf.argmax(upsampled_logits_batch, dimension=3)

probabilities = tf.nn.softmax(upsampled_logits_batch)

with tf.variable_scope("adam_vars"):
    train_step = tf.train.AdamOptimizer(learning_rate=0.000001).minimize(cross_entropy_sum)

# Variable's initialization functions
vgg_16_without_fc8_variables_mapping = extract_vgg_16_mapping_without_fc8(vgg_16_variables_mapping)

init_fn = slim.assign_from_checkpoint_fn(model_path=vgg_checkpoint_path,
                                         var_list=vgg_16_without_fc8_variables_mapping)

global_vars_init_op = tf.global_variables_initializer()

tf.summary.scalar('cross_entropy_loss', cross_entropy_sum)

merged_summary_op = tf.summary.merge_all()

summary_string_writer = tf.summary.FileWriter(log_folder)

# Create the log folder if doesn't exist yet
if not os.path.exists(log_folder):
     os.makedirs(log_folder)

#The op for initializing the variables.
local_vars_init_op = tf.local_variables_initializer()

combined_op = tf.group(local_vars_init_op, global_vars_init_op)

# We need this to save only model variables and omit
# optimization-related and other variables.
model_variables = slim.get_model_variables()
saver = tf.train.Saver(model_variables)

with tf.Session()  as sess:

    sess.run(combined_op)
    init_fn(sess)

    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    # 10 epochs
    for i in xrange(11127 * 10):

        cross_entropy, summary_string, _ = sess.run([ cross_entropy_sum,
                                                      merged_summary_op,
                                                      train_step ])

        print("Current loss: " + str(cross_entropy))

        summary_string_writer.add_summary(summary_string, i)

        if i % 11127 == 0:
            save_path = saver.save(sess, checkpoints_dir + "model_fcn32s_epoch_" + str(i / 11127) + ".ckpt")
            print("Model saved in file: %s" % save_path)

    coord.request_stop()
    coord.join(threads)

    save_path = saver.save(sess, checkpoints_dir + "model_fcn32s_final.ckpt")
    print("Model saved in file: %s" % save_path)

summary_string_writer.close()

When I run the script, I get the following error:

Traceback (most recent call last):
  File "/home/frederik/Documents/Uni/semester_8/AI4/DeepBacteriaSegmentation/fcn_32s_train.py", line 111, in <module>
    train_step ])
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 766, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 964, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1014, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1034, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: RandomShuffleQueue '_2_shuffle_batch/random_shuffle_queue' is closed and has insufficient elements (requested 1, current size 0)
     [[Node: shuffle_batch = QueueDequeueMany[_class=["loc:@shuffle_batch/random_shuffle_queue"], component_types=[DT_UINT8, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](shuffle_batch/random_shuffle_queue, shuffle_batch/n)]]

Caused by op u'shuffle_batch', defined at:
  File "/home/frederik/Documents/Uni/semester_8/AI4/DeepBacteriaSegmentation/fcn_32s_train.py", line 43, in <module>
    min_after_dequeue=1000)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/input.py", line 917, in shuffle_batch
    dequeued = queue.dequeue_many(batch_size, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/data_flow_ops.py", line 458, in dequeue_many
    self._queue_ref, n=n, component_types=self._dtypes, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 1099, in _queue_dequeue_many
    timeout_ms=timeout_ms, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
    self._traceback = _extract_stack()

OutOfRangeError (see above for traceback): RandomShuffleQueue '_2_shuffle_batch/random_shuffle_queue' is closed and has insufficient elements (requested 1, current size 0)
     [[Node: shuffle_batch = QueueDequeueMany[_class=["loc:@shuffle_batch/random_shuffle_queue"], component_types=[DT_UINT8, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](shuffle_batch/random_shuffle_queue, shuffle_batch/n)]]

bacteria.tfrecords is a file of 11127 image/annotation pairs (copies of the same image), created using

from tf_image_segmentation.utils.tf_records import write_image_annotation_pairs_to_tfrecord

Do you have any idea of what might be wrong?

FredHaa commented 7 years ago

This happens at the first training step, so it appears that the queue is never filled.

FredHaa commented 7 years ago

I think I have narrowed the problem down to how the tfrecord is made. I create the file using the following script:

import tensorflow as tf
import tensorflow.contrib.slim as slim
import numpy as np
import skimage.io as io
import os, sys
from os import walk

root_dir = "/home/frederik/Documents/Uni/semester_8/AI4/DeepBacteriaSegmentation/"

# Add a path to a custom fork of TF-Slim
# Get it from here:
# https://github.com/warmspringwinds/models/tree/fully_conv_vgg
sys.path.append(root_dir + "models/slim/")

# Add path to the cloned library
sys.path.append(root_dir + "tf-image-segmentation/")

from tf_image_segmentation.utils.tf_records import write_image_annotation_pairs_to_tfrecord, read_image_annotation_pairs_from_tfrecord

img_path = []
annotation_path = []
for (dirpath, dirnames, filenames) in walk(root_dir + "annotated/"):
    for image in filenames:
        if image[-3:] == "jpg":
            img_path.append(dirpath + image)
        elif image[-3:] == "png":
            annotation_path.append(dirpath + image)
    break

file_pairs = []
if len(img_path) == len(annotation_path):
    for i in range(0, len(img_path)):
        file_pairs.append((img_path[i], annotation_path[i]))

write_image_annotation_pairs_to_tfrecord(file_pairs, "bacteria.tfrecords")

pairs = read_image_annotation_pairs_from_tfrecord("bacteria.tfrecords")

But there seem to be some inconsistencies:

read_image_annotation_pairs_from_tfrecord expects the annotation image to have only 1 channel: see the line annotation = annotation_1d.reshape((height, width)) in tf_records.py.

However, the FCN_32s model requires the annotations to have the same shape as the logits, which have 3 channels.

read_image_annotation_pairs_from_tfrecord can be fixed by changing that line to annotation = annotation_1d.reshape((height, width, 3)), assuming that 3-channel annotations are the correct behavior.

Regarding the original issue, I assume that I am using write_image_annotation_pairs_to_tfrecord correctly?
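
One thing that may be worth checking in the script above is the pairing step. os.walk returns file names in arbitrary order, and when the two lists differ in length, file_pairs silently stays empty, which would produce an empty .tfrecords file. The sketch below pairs files by base name instead of by list index. It assumes each image and its annotation share a base name (for example cell_01.jpg and cell_01.png); that naming scheme and the helper name pair_by_stem are assumptions, not something the library requires.

```python
import os


def pair_by_stem(filenames, image_ext=".jpg", annotation_ext=".png"):
    """Pair image and annotation file names that share a base name.

    Hypothetical helper: assumes 'name.jpg' belongs with 'name.png'.
    Raises ValueError instead of silently returning an empty list.
    """
    images = {}
    annotations = {}
    for name in filenames:
        stem, ext = os.path.splitext(name)
        ext = ext.lower()
        if ext == image_ext:
            images[stem] = name
        elif ext == annotation_ext:
            annotations[stem] = name

    # Any stem present on only one side has no partner.
    unmatched = sorted(set(images) ^ set(annotations))
    if unmatched:
        raise ValueError("files without a partner: %s" % ", ".join(unmatched))

    # Sort so the output order does not depend on directory listing order.
    return [(images[stem], annotations[stem]) for stem in sorted(images)]
```

The returned names are relative, so each one would still need to be joined with dirpath (for example with os.path.join) before being passed to write_image_annotation_pairs_to_tfrecord.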

vj-1988 commented 7 years ago

I am also facing the same issue while training the VOC dataset. Any updates on this error?

jhjang commented 7 years ago

I got this error, too. Did you get any solution?

vaklyuenkov commented 7 years ago

So, I have the same error on FCN_8s. Any ideas?

ahundt commented 7 years ago

@FrederikHaa Can you create a pull request with these fixes?

ghost commented 7 years ago

This training code assumes that the number of training samples is 11127. If your dataset has a different number of samples, you need to change this value accordingly. I faced the same issue because my custom dataset contains fewer training samples. After this fix, the code works fine.

jhjang commented 6 years ago

@nirmaljith Can you explain how to change the number of training samples? I can't find the variable that holds this number.

ghost commented 6 years ago

@jhjang It's not assigned to any variable in the code, so you may find it difficult to figure out. In tf-image-segmentation/tf_image_segmentation/recipes/pascal_voc/FCNs/fcn_32s_train.ipynb you have to change the value in xrange. The original code assumes the number of training samples to be 11127:

for i in xrange(11127 * 10):
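
To avoid hard-coding 11127, the loop bound can be derived from the dataset size. The input queue closes after every record has been served num_epochs times, and shuffle_batch hands out only full batches by default, so asking for more steps than that raises OutOfRangeError. A minimal sketch follows; the function name and the 250-sample figure are illustrative assumptions, not part of the library.

```python
def steps_available(num_samples, num_epochs, batch_size=1):
    """Number of full batches the input queue can deliver before it closes."""
    if num_samples <= 0 or num_epochs <= 0 or batch_size <= 0:
        raise ValueError("num_samples, num_epochs and batch_size must be positive")
    return (num_samples * num_epochs) // batch_size


# Illustrative values: a dataset of 250 pairs, with num_epochs=10 as passed
# to tf.train.string_input_producer and batch_size=1 as in the script above.
num_samples = 250
num_epochs = 10
batch_size = 1

steps = steps_available(num_samples, num_epochs, batch_size)
steps_per_epoch = num_samples // batch_size

print(steps)  # 2500
```

The training loop would then iterate over range(steps) and save a checkpoint whenever i % steps_per_epoch == 0, in place of the two hard-coded 11127 values.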

jhjang commented 6 years ago

@nirmaljith I got it. Thanks for your help :)

vinayakarannil commented 6 years ago

I am still facing this issue. Any solutions? I am running the training script on my own dataset. The script runs successfully on Pascal VOC with the same number of training samples, but not on my dataset, so the number of training samples is not the issue in my case.

vinayakarannil commented 6 years ago

I followed the comment by @ahundt in "tfrecords should also include depth and format #13", and now my problem is solved.

deepk91 commented 6 years ago

@vinayakkailas I am also running the same script on my own dataset, which has very few images (around 250). How did you solve this error? What values of depth and format did you use? It would be great if you could share this.

kheffah commented 6 years ago

I'm facing the same error, and there doesn't seem to be an available answer. Can anyone help? Thanks!

kheffah commented 6 years ago

Got it! The dataset was corrupted as numpy decided to expand the dimensions of the image (M,N) -> (M,N,1) when I passed a slice of the image to another method rather than defining a separate np array. Hope this helps others facing the same issue.

bohelion commented 6 years ago

@kheffah Can you explain in more detail how to do this? I am a beginner, and I would be grateful if you could share it.

kheffah commented 6 years ago

@bohelion Sure. Actually the error was not from numpy, but from scipy.misc. In my case, I was reading the label mask with scipy.misc, but forgot to specify the mode='I' parameter, which resulted in my label having a 3rd dim (height, width, extra). So, when it was saved to the .tfrecords file, it had the wrong dimensions and did not fit the pre-specified dimensions of my TF graph. Hope this helps.

# Read image
im = scipy.misc.imread(impath, mode='RGB')
# Read label
lbl = scipy.misc.imread(lblpath, mode='L')
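
A shape check before writing each pair can catch this kind of mismatch early. The sketch below works on a shape tuple such as lbl.shape and assumes the label mask should be 2-D (height, width), which is what the reshape((height, width)) line in tf_records.py quoted earlier in this thread suggests. The helper name annotation_hw is hypothetical.

```python
def annotation_hw(shape):
    """Return (height, width) for a label-mask shape, or raise ValueError.

    Accepts (H, W) as-is and (H, W, 1) by dropping the singleton axis.
    Anything else, such as an (H, W, 3) colour image, is rejected.
    """
    shape = tuple(shape)
    if len(shape) == 2:
        return shape
    if len(shape) == 3 and shape[2] == 1:
        return shape[:2]
    raise ValueError("expected a (H, W) or (H, W, 1) label mask, got %r" % (shape,))
```

With a numpy array this could be used as lbl = lbl.reshape(annotation_hw(lbl.shape)) just before the pair is written.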

DiyuanLu commented 5 years ago

In your code, changing num_epochs to a larger number should solve the problem. I had the same problem and this worked fine for me.

filename_queue = tf.train.string_input_producer(
    [tfrecord_filename], num_epochs=10)

dhKwang commented 5 years ago

Maybe you should check your file name. Try changing it to an absolute path.
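
A relative name such as 'bacteria.tfrecords' is resolved against the directory the script is launched from, so a missing file can go unnoticed until the queue turns out to be empty. As far as I understand, a file the reader cannot open can surface as this same OutOfRangeError. A small sketch of a check to run before building the input queue follows; the helper name resolve_tfrecord_path is hypothetical.

```python
import os


def resolve_tfrecord_path(path):
    """Return the absolute path of an existing, non-empty file, or raise IOError."""
    abs_path = os.path.abspath(path)
    if not os.path.isfile(abs_path):
        raise IOError("TFRecord file not found: %s" % abs_path)
    if os.path.getsize(abs_path) == 0:
        raise IOError("TFRecord file is empty: %s" % abs_path)
    return abs_path
```

In the training script this would be called as tfrecord_filename = resolve_tfrecord_path('bacteria.tfrecords') before the name is passed to tf.train.string_input_producer.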

cdcky commented 5 years ago

@kheffah Thank you!!!!!! Respect from China, you saved my life. Thank you~~