Error with using a custom dataset and image_util.ImageDataProvider()

JackBurdick commented 7 years ago

Hey!

Thank you for your help and for the updates. Unfortunately, we're still having trouble using a custom dataset and we're hoping you can help us.

The main error produced is this:

....
tensorflow.python.framework.errors.InvalidArgumentError: logits and labels must be same size: logits_size=[709520,2] labels_size=[710500,2]
     [[Node: SoftmaxCrossEntropyWithLogits = SoftmaxCrossEntropyWithLogits[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](Reshape_29, Reshape_30)]]
....

We're working with RGB images of size (767x1022) and using the data provider like this (everything here seems to work as expected):

data_provider = image_util.ImageDataProvider('reduced_segmentation_dataset/*', data_suffix=".jpg", mask_suffix='_mask.png')

The other thing we noticed is that when we call

path = trainer.train(data_provider, "./skin_trained", training_iters=10, epochs=4, display_step=2)

it defaults to calling test_x, test_y = data_provider(4) on line 367 of master/tf_unet/unet.py. If we call data_provider(1) instead, we seem to get the results we expect and bypass the error at that point, but the above error still prevents a full run.

Do you have any ideas why we're having this mismatch? I'd be happy to provide more information as needed.

jakeret commented 7 years ago

Based on the information I have to guess a bit. Are you sure that the data and the mask image have the same dimension?

What is the error you are getting for "the other thing"? For testing purposes, can you create an instance of the ImageDataProvider as posted above and call test_x, test_y = data_provider(4)? What are the shapes of test_x and test_y?
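
One way to answer the dimension question is to compare the size of every image with the size of its mask before training. The following is a minimal sketch and not part of tf_unet: `image_size` and `mismatched_pairs` are hypothetical helper names, and the rule that a mask path is the data path with `data_suffix` swapped for `mask_suffix` is an assumption based on the arguments shown above.

```python
def image_size(path):
    """Return (width, height) of an image file using PIL."""
    from PIL import Image  # imported lazily so the check below can be tested without PIL
    with Image.open(path) as img:
        return img.size


def mismatched_pairs(data_paths, data_suffix=".jpg", mask_suffix="_mask.png",
                     size_of=image_size):
    """List (data_path, data_size, mask_size) for every pair whose sizes differ.

    Assumption: the mask for 'foo.jpg' is stored as 'foo_mask.png'.
    """
    bad = []
    for data_path in data_paths:
        mask_path = data_path[:-len(data_suffix)] + mask_suffix
        data_size, mask_size = size_of(data_path), size_of(mask_path)
        if data_size != mask_size:
            bad.append((data_path, data_size, mask_size))
    return bad
```

It could be called as `mismatched_pairs(glob.glob('reduced_segmentation_dataset/*.jpg'))`; an empty list would mean every image matches its mask.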

JackBurdick commented 7 years ago

We double-checked, and all images have the same dimensions.

If we run test_x, test_y = data_provider(4), we get the following:

>>> print(test_x.shape)
(4, 767, 1022, 3)
>>> print(test_y.shape)
(4, 767, 1022, 2)

JackBurdick commented 7 years ago

We assumed the call to data_provider(4) on line 367 of master/tf_unet/unet.py was left over from the toy demo. We can revert and see what the error is, if needed.

jakeret commented 7 years ago

No, this line is actually important. That's where I load the dataset used for verification. I just checked in an update that replaces the magic number with a constant to make this more explicit.

JackBurdick commented 7 years ago

OK. We've reverted and updated, thank you. If we follow this example, where should data come from? I think our logic must be wrong.

jakeret commented 7 years ago

That's the data you want to run the prediction/test on.

JackBurdick commented 7 years ago

Right, OK. So if we try this (we're not sure we're using it correctly):

from tf_unet import image_util, unet

data_provider = image_util.ImageDataProvider('reduced_segmentation_dataset/*', data_suffix=".jpg", mask_suffix='_Segmentation.png')

net = unet.Unet(channels=3, n_class=2, layers=3, features_root=64)
trainer = unet.Trainer(net, optimizer="momentum", opt_kwargs=dict(momentum=0.2))

path = trainer.train(data_provider, output_path, training_iters=16, epochs=4)

prediction = net.predict(path, data_provider)

We get this output:

Number of files used: 14
Layers 3, features 64, filter size 3x3, pool size: 2x2
Removing '/Users/MrBurdick/Documents/test_unet/prediction'
Removing '/Users/MrBurdick/Documents/test_unet/jack_trained'
Allocating '/Users/MrBurdick/Documents/test_unet/prediction'
Allocating '/Users/MrBurdick/Documents/test_unet/jack_trained'
Traceback (most recent call last):
  File "/Users/MrBurdick/anaconda/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 965, in _do_call
    return fn(*args)
  File "/Users/MrBurdick/anaconda/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 947, in _run_fn
    status, run_metadata)
  File "/Users/MrBurdick/anaconda/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/Users/MrBurdick/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/errors.py", line 450, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors.InvalidArgumentError: logits and labels must be same size: logits_size=[2838080,2] labels_size=[2842000,2]
     [[Node: SoftmaxCrossEntropyWithLogits = SoftmaxCrossEntropyWithLogits[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](Reshape_29, Reshape_30)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "skin2.py", line 13, in <module>
    path = trainer.train(data_provider, output_path, training_iters=16, epochs=4)
  File "/Users/MrBurdick/.local/lib/python3.5/site-packages/tf_unet-0.1.0-py3.5.egg/tf_unet/unet.py", line 368, in train
    pred_shape = self.store_prediction(sess, test_x, test_y, "_init")
  File "/Users/MrBurdick/.local/lib/python3.5/site-packages/tf_unet-0.1.0-py3.5.egg/tf_unet/unet.py", line 415, in store_prediction
    self.net.keep_prob: 1.})
  File "/Users/MrBurdick/anaconda/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 710, in run
    run_metadata_ptr)
  File "/Users/MrBurdick/anaconda/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 908, in _run
    feed_dict_string, options, run_metadata)
  File "/Users/MrBurdick/anaconda/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 958, in _do_run
    target_list, options, run_metadata)
  File "/Users/MrBurdick/anaconda/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 978, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.InvalidArgumentError: logits and labels must be same size: logits_size=[2838080,2] labels_size=[2842000,2]
     [[Node: SoftmaxCrossEntropyWithLogits = SoftmaxCrossEntropyWithLogits[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](Reshape_29, Reshape_30)]]
Caused by op 'SoftmaxCrossEntropyWithLogits', defined at:
  File "skin2.py", line 10, in <module>
    net = unet.Unet(channels=3, n_class=2, layers=3, features_root=64)
  File "/Users/MrBurdick/.local/lib/python3.5/site-packages/tf_unet-0.1.0-py3.5.egg/tf_unet/unet.py", line 198, in __init__
    tf.reshape(self.y, [-1, n_class])))
  File "/Users/MrBurdick/anaconda/lib/python3.5/site-packages/tensorflow/python/ops/nn_ops.py", line 491, in softmax_cross_entropy_with_logits
    precise_logits, labels, name=name)
  File "/Users/MrBurdick/anaconda/lib/python3.5/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 1427, in _softmax_cross_entropy_with_logits
    features=features, labels=labels, name=name)
  File "/Users/MrBurdick/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 703, in apply_op
    op_def=op_def)
  File "/Users/MrBurdick/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2317, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/Users/MrBurdick/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1239, in __init__
    self._traceback = _extract_stack()

JackBurdick commented 7 years ago

We also tried:

path = trainer.train(data_provider, output_path, training_iters=16, epochs=4)

data, _ = data_provider(1)

prediction = net.predict(path, data)

but were met with what looks like the same error.

jakeret commented 7 years ago

Hard to tell from this. Can you provide the code and the data so that it can be reproduced?

Apart from that, the line prediction = net.predict(path, data_provider) is not going to work: predict expects the image data itself, not the data provider.

jakeret commented 7 years ago

It seems there is a bug in the util.crop_to_shape function when an image dimension is an odd number.
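
The sizes in the error message are consistent with an off-by-one crop: 709520 = 724 x 980 and 710500 = 725 x 980, so the labels appear to be one row taller than the logits. The sketch below reproduces that with NumPy. It assumes the crop removes the same offset from both ends of each axis; `symmetric_crop` and `exact_crop` are illustrative names, not the actual tf_unet code.

```python
import numpy as np


def symmetric_crop(data, shape):
    """Centre-crop by slicing offset:-offset on both spatial axes (assumed behaviour)."""
    offset0 = (data.shape[1] - shape[1]) // 2
    offset1 = (data.shape[2] - shape[2]) // 2
    return data[:, offset0:-offset0, offset1:-offset1]


def exact_crop(data, shape):
    """Centre-crop that always returns exactly the requested height and width."""
    offset0 = (data.shape[1] - shape[1]) // 2
    offset1 = (data.shape[2] - shape[2]) // 2
    return data[:, offset0:offset0 + shape[1], offset1:offset1 + shape[2]]


labels = np.zeros((1, 767, 1022, 2), dtype=np.float32)
target = (1, 724, 980, 2)  # spatial size implied by logits_size=[709520,2]

# 767 - 724 = 43 is odd, so removing 21 rows from each end leaves 725 rows.
print(symmetric_crop(labels, target).shape)  # (1, 725, 980, 2)
print(exact_crop(labels, target).shape)      # (1, 724, 980, 2)
```

The width is unaffected because 1022 - 980 = 42 is even; only the odd margin along the 767-pixel axis leaves an extra row.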

One way around this in the meantime is to resize your input:

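from PIL import Image
import numpy as np
from tf_unet import image_util

class SkinImageDataProvider(image_util.ImageDataProvider):
    """ImageDataProvider that resizes every file it loads to an even-sized shape."""

    def _load_file(self, path, dtype=np.float32):
        # PIL's resize takes (width, height), so the array has 376 rows and 500 columns.
        img = Image.open(path)
        return np.array(img.resize((500, 376)), dtype)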

You will of course lose information, but the learning process is going to be much faster.
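
To judge whether a given input size leaves an odd margin, the network's output size can be estimated in advance. This sketch assumes the architecture printed in the log above (3x3 filters, 2x2 pooling) with two unpadded convolutions per level, floor-rounded pooling, and 2x upsampling; `unet_output_size` is a hypothetical helper, not a tf_unet function. Under those assumptions it gives the 724 x 980 output that matches logits_size=[709520,2] in the original report.

```python
def unet_output_size(n, layers=3, filter_size=3, pool_size=2):
    """Estimate the output length along one axis for an input of length n.

    Assumptions: two 'valid' convolutions per level, floor-rounded max pooling
    on the way down, and upsampling by pool_size on the way up.
    """
    shrink = 2 * (filter_size - 1)  # pixels lost to two unpadded convolutions
    for layer in range(layers):
        n -= shrink
        if layer < layers - 1:
            n //= pool_size
    for _ in range(layers - 1):
        n = n * pool_size - shrink
    return n


for height, width in [(767, 1022), (376, 500)]:
    out_h, out_w = unet_output_size(height), unet_output_size(width)
    print((height, width), "->", (out_h, out_w),
          "margins:", (height - out_h, width - out_w))
# (767, 1022) -> (724, 980) margins: (43, 42)
# (376, 500) -> (336, 460) margins: (40, 40)
```

With the original images one margin is odd (43), while the resized 376 x 500 input leaves even margins on both axes.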

gingerly commented 7 years ago

Hi, thanks for pointing out the bug. I got a different error after using your workaround:

/DLRecon/tf_unet-master/tf_unet/util.py in combine_img_prediction(data, gt, pred)
    101     ch = data.shape[3]
    102     img = np.concatenate((to_rgb(crop_to_shape(data, pred.shape).reshape(-1, ny, ch)),
--> 103                           to_rgb(crop_to_shape(gt[..., 1], pred.shape).reshape(-1, ny, 1)),
    104                           to_rgb(pred[..., 1].reshape(-1, ny, 1))), axis=1)
    105     return img

IndexError: index 1 is out of bounds for axis 3 with size 1

jakeret commented 7 years ago

Do you have a stacktrace? When does the error happen?

What is the shape of your ground truth?
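
The IndexError above says the ground truth has a last axis of size 1, while the code indexes gt[..., 1] and the shapes posted earlier in this thread have a last axis of size 2. If the masks are loaded as a single channel, converting them to two channels is one option. This is a minimal sketch under the assumption that channel 0 should hold background and channel 1 foreground; `to_two_class` is a hypothetical helper, not part of tf_unet.

```python
import numpy as np


def to_two_class(mask):
    """Turn a binary mask of shape (N, H, W) or (N, H, W, 1) into (N, H, W, 2)."""
    if mask.ndim == 4:
        mask = np.squeeze(mask, axis=-1)  # raises if the last axis is not size 1
    foreground = mask > 0
    out = np.zeros(foreground.shape + (2,), dtype=np.float32)
    out[..., 0] = ~foreground  # assumed: channel 0 = background
    out[..., 1] = foreground   # assumed: channel 1 = foreground
    return out


gt = np.array([[[[0], [1]], [[1], [0]]]], dtype=np.float32)  # shape (1, 2, 2, 1)
two = to_two_class(gt)
print(two.shape)  # (1, 2, 2, 2)
```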

moudsarewju commented 7 years ago

Hello, I get the same bug when I use my own dataset:

generator = image_util.ImageDataProvider("train/*.png")
net = unet.Unet(channels=generator.channels, n_class=generator.n_class, layers=1, features_root=16)
trainer = unet.Trainer(net, optimizer="momentum", opt_kwargs=dict(momentum=0.2))
path = trainer.train(generator, "./unet_trained", training_iters=20, epochs=100)
prediction = net.predict("./unet_trained/model.cpkt", x_test)

2017-08-10 01:12:23,718 Verification error= 96.5%, loss= 0.7201
2017-08-10 01:12:24,015 Start optimization
2017-08-10 01:12:24,406 Iter 0, Minibatch Loss= 0.4491, Training Accuracy= 1.0000, Minibatch error= 0.0%
2017-08-10 01:12:24,646 Iter 1, Minibatch Loss= 0.2825, Training Accuracy= 0.9981, Minibatch error= 0.2%
2017-08-10 01:12:24,875 Iter 2, Minibatch Loss= 0.1428, Training Accuracy= 0.9958, Minibatch error= 0.4%
2017-08-10 01:12:25,086 Iter 3, Minibatch Loss= 0.0630, Training Accuracy= 0.9999, Minibatch error= 0.0%
2017-08-10 01:12:25,276 Iter 4, Minibatch Loss= 0.0599, Training Accuracy= 0.9886, Minibatch error= 1.1%
2017-08-10 01:12:25,468 Iter 5, Minibatch Loss= 0.0198, Training Accuracy= 0.9999, Minibatch error= 0.0%
2017-08-10 01:12:25,668 Iter 6, Minibatch Loss= 0.0147, Training Accuracy= 1.0000, Minibatch error= 0.0%
Traceback (most recent call last):
  File "demo_code.py", line 64, in <module>
    path = trainer.train(generator, "./unet_trained", training_iters=20, epochs=100)
  File "/media/hd01/unet/ss/tf_unet/unet.py", line 435, in train
    self.net.keep_prob: dropout})
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 789, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 997, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1132, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1152, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: logits and labels must be same size: logits_size=[64516,2] labels_size=[0,2]
         [[Node: SoftmaxCrossEntropyWithLogits = SoftmaxCrossEntropyWithLogits[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](Reshape_7, Reshape_8)]]

Caused by op u'SoftmaxCrossEntropyWithLogits', defined at:
  File "demo_code.py", line 60, in <module>
    net = unet.Unet(channels=generator.channels, n_class=generator.n_class, layers=1, features_root=16)
  File "/media/hd01/unet/ss/tf_unet/unet.py", line 193, in __init__
    self.cost = self._get_cost(logits, cost, cost_kwargs)
  File "/media/hd01/unet/ss/tf_unet/unet.py", line 243, in _get_cost
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=flat_logits, labels=flat_labels))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 1594, in softmax_cross_entropy_with_logits
    precise_logits, labels, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 2380, in _softmax_cross_entropy_with_logits
    features=features, labels=labels, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): logits and labels must be same size: logits_size=[64516,2] labels_size=[0,2]
         [[Node: SoftmaxCrossEntropyWithLogits = SoftmaxCrossEntropyWithLogits[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](Reshape_7, Reshape_8)]]
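
A labels_size of [0,2] means the label batch was empty by the time it reached the loss, while logits_size=[64516,2] corresponds to a 254 x 254 output. One possible cause, offered as a guess rather than a diagnosis: if the labels are cropped with a slice of the form offset:-offset and the offset is 0, Python treats -0 as 0 and the slice is empty. The sketch below shows only that slicing behaviour; the crop expression is an assumption, not the actual tf_unet code.

```python
import numpy as np

labels = np.zeros((1, 254, 254, 2), dtype=np.float32)
target_size = 254

# No cropping should be needed here, so the offset is 0.
offset = (labels.shape[1] - target_size) // 2

# The slice 0:-0 is the same as 0:0, which selects nothing.
cropped = labels[:, offset:-offset, offset:-offset]
print(cropped.shape)                 # (1, 0, 0, 2)
print(cropped.reshape(-1, 2).shape)  # (0, 2)
```

A margin of 1 gives the same result, because 1 // 2 is also 0.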

jakeret / tf_unet

Error with using a custom dataset and image_util.ImageDataProvider() #6