keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

How to crop "dynamically" #4966

Closed LukeMathWalker closed 7 years ago

LukeMathWalker commented 7 years ago

I'm trying to implement the following architecture with Keras (Theano backend).

I have a first Sequential network (say S1) which takes an image as input and has 4 linear outputs, which correspond to the upper-left and bottom-right coordinates of a rectangular "window" in the input image that is supposed to contain the object I want to identify.

Once I have those four outputs... I'd like to actually crop the image!

So I thought I would take my image as input again in a new Sequential network (say S2) and merge the two networks using a Merge layer with a custom function as its mode.

It turned out to be a bit more complicated than I expected, and I'm stuck with some Theano errors I can't get rid of.

Here's the relevant part of the code:

from keras.models import Sequential
from keras.layers import Convolution2D, Flatten, Dense, Reshape, Merge

model1 = Sequential()
model1.add(Convolution2D(64, 3, 3, border_mode='same', input_shape=(1280, 720, 3), activation='relu'))
# now model.output_shape == (None, 1280, 720, 64)

# add a 3x3 convolution on top, with 32 output filters:
model1.add(Convolution2D(32, 3, 3, border_mode='same',activation='relu'))
# now model.output_shape == (None, 1280, 720, 32)
model1.add(Flatten())
model1.add(Dense(4, activation='linear'))

model2 = Sequential()
# How can I create a simple "input" net?
model2.add(Reshape((1280, 720, 3), input_shape=(1280, 720, 3)))

def merger(l):
    image = l[1]
    indexes = l[0]
    index_0 = indexes[:,0]
    index_1 = indexes[:,1]
    index_2 = indexes[:,2]
    index_3 = indexes[:,3]
    cropped_image = image[:,index_0:index_1+1, index_2:index_3+1,:]
    return cropped_image

merged_model = Sequential()
merged_model.add(Merge([model1, model2], mode=merger))

And here is the error:

 <ipython-input-25-3b844142556c> in merger(l)
      11     index_2 = indexes[:,2]
      12     index_3 = indexes[:,3]
 ---> 13     cropped_image = image[:,index_0:index_1+1, index_2:index_3+1,:]

 [...]

ValueError: ('TensorType could not be cast to have 0 dimensions', TensorType(float32, vector))

What's going on?

patyork commented 7 years ago

Just a thought, but without any kind of checks (e.g. a negativity check, left/top must be less than right/bottom) you can end up with a slice like image[:, 5:2, 6:-1, :], which would have a shape of (None, 0, 714, 3).

I don't think that's the cause of your error, as the above shape with a 0 dimension is valid. But you should check out some examples of the Merge layer, especially ones with a custom mode: at the very least you will also need to supply a function for output_shape, since you are using a custom mode.

I'd recommend getting it working without the cropping first (e.g. just return the image in your merger), since you'll need to write the output_shape function as well, and then add in the actual cropping; see the sketch below.
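
A minimal sketch of that pass-through version, assuming the Keras 1.x Merge layer with a callable mode and the model1/model2 definitions from the question (the helper names here are illustrative):

from keras.models import Sequential
from keras.layers import Merge

def merge_fn(inputs):
    # inputs == [model1 output (coordinates), model2 output (image)]
    coords, image = inputs
    return image  # no cropping yet; just pass the image through

def merge_output_shape(input_shapes):
    # input_shapes == [(None, 4), (None, 1280, 720, 3)]
    return input_shapes[1]

merged_model = Sequential()
merged_model.add(Merge([model1, model2], mode=merge_fn,
                       output_shape=merge_output_shape))

Once this compiles and trains, the actual cropping logic can be added inside merge_fn.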

patyork commented 7 years ago

Actually, now that I'm thinking about it, you're going to run into issues whenever you have batch_size > 1. Tensors are essentially multidimensional matrices, so you can't have subtensors of mismatched dimensions. In other words, you can't automatically combine tensors of shapes (851, 254, 3) and (498, 387, 3); you would have to pad up to (851, 387, 3) and probably keep a reference list of the valid areas. So, depending on what you are trying to do, it will probably be better to keep the full images plus the diagonal corner info and not do any cropping. But again, that depends on what you are trying to accomplish.
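
For the padding route, a minimal NumPy sketch (outside the Theano graph, with illustrative names):

import numpy as np

def pad_crops(crops):
    # Pad variable-size crops up to a common shape and remember the valid
    # (unpadded) extent of each one.
    max_h = max(c.shape[0] for c in crops)
    max_w = max(c.shape[1] for c in crops)
    channels = crops[0].shape[2]
    batch = np.zeros((len(crops), max_h, max_w, channels), dtype=crops[0].dtype)
    valid_areas = []
    for i, c in enumerate(crops):
        h, w = c.shape[:2]
        batch[i, :h, :w, :] = c
        valid_areas.append((h, w))
    return batch, valid_areas

# e.g. pad_crops([np.ones((851, 254, 3)), np.ones((498, 387, 3))])
# -> batch of shape (2, 851, 387, 3) plus valid areas [(851, 254), (498, 387)]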

LukeMathWalker commented 7 years ago

Of course I'll have to implement checks to get a valid window (left corner to the left of the right corner, etc.), but that's not the problem here: the error I posted was raised before any attempt at training, which is why I have not supplied an output_shape function in my issue.

But actually I had not foreseen the batch_size problem... Getting around it could be complicated.

patyork commented 7 years ago

Fair enough, but I still recommend starting smaller.

That said, your error is ValueError: ('TensorType could not be cast to have 0 dimensions', TensorType(float32, vector)). This says that a float32 vector could not be cast to a 0 dimensional item. Looking at the code, and simplifying to a minimal slice attempt:

def merger(l):
    image = l[1]
    indexes = l[0]
    index_0 = indexes[:,0]
    return image[:, index_0:, :, :]

...still throws this error. We can now say that slicing by index_0 is the issue; index_0 is a slice of shape (None, 1), which is a vector. Trying to slice by a vector is nonsensical in Theano, as it expects the slice index (index_0) to be an integer, which is a 0-dimensional item. So, the error in English is saying: "Expected an integer, got a vector of floats; cannot cast a vector of floats to an integer."
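
A standalone reproduction of the difference, assuming plain Theano tensors with illustrative names:

import theano.tensor as TT

image = TT.tensor4('image')      # (batch, rows, cols, channels)
indexes = TT.matrix('indexes')   # (batch, 4), float32

# Scalar slice bound (0-dimensional): this builds fine.
ok = image[:, TT.iround(indexes[0, 0]):, :, :]

# Vector slice bound: raises at graph-construction time with
# ValueError: ('TensorType could not be cast to have 0 dimensions', ...)
bad = image[:, indexes[:, 0]:, :, :]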

In other words, your issues are: the slice bounds coming out of model1 are vectors (one value per sample), not the plain integers Theano expects; and even with integer bounds, each sample would produce a crop of a different size, which can't be stacked into a single batch tensor.

So, your options are: skip the cropping inside the graph and keep the full image together with the four corner coordinates, or pad the crops up to a common size and keep track of the valid areas.

LukeMathWalker commented 7 years ago

I understood the issue and I'm going to follow your advice. Thanks for the patience, I learned something in the process! ;D

LukeMathWalker commented 7 years ago

Anyway, just for the sake of completeness, I conceived the following workaround:

import theano as T

def merger(l):
    image = l[1]
    indexes = T.tensor.iround(l[0])
    index_0 = indexes[:, 0]
    index_1 = indexes[:, 1]
    index_2 = indexes[:, 2]
    index_3 = indexes[:, 3]
    nb_of_samples = T.tensor.shape(index_0)[0]
    cropped_image = T.tensor.zeros_like(image)
    for i in range(nb_of_samples):
        row_slice = slice(min(index_0[i], index_1[i]), max(index_0[i], index_1[i]) + 1)
        col_slice = slice(min(index_2[i], index_3[i]), max(index_2[i], index_3[i]) + 1)
        cropped_image = T.tensor.set_subtensor(
            cropped_image[i, row_slice, col_slice, :],
            image[i, row_slice, col_slice, :])
    return cropped_image

which almost does what I expect it to do. In fact, it fails with

'TensorVariable' object cannot be interpreted as an integer

even though index_j[i] is a TensorVariable with dtype=int64 and zero dimensions (a scalar).

patyork commented 7 years ago

You've switched from symbolic computing (Theano) to Python computing via the use of a for loop; Python works with integers/floats/etc., and Theano works with tensors. You'll need to either convert everything to integers/NumPy arrays, do your loop on the CPU, and convert back to Theano tensors (which will in all likelihood break the graph and autodifferentiation), or use the Theano looping functionality via scan.
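
A minimal scan-based sketch of the per-sample windowing (standalone Theano, illustrative names; it zeroes out everything outside the window so every sample keeps the same shape):

import theano
import theano.tensor as TT

images = TT.tensor4('images')    # (batch, rows, cols, channels)
corners = TT.imatrix('corners')  # (batch, 4) integer crop coordinates

def window_one(corner, image):
    r0 = TT.minimum(corner[0], corner[1])
    r1 = TT.maximum(corner[0], corner[1])
    c0 = TT.minimum(corner[2], corner[3])
    c1 = TT.maximum(corner[2], corner[3])
    mask = TT.zeros_like(image)
    mask = TT.set_subtensor(mask[r0:r1 + 1, c0:c1 + 1, :], 1.0)
    return image * mask

# scan iterates over the leading (batch) axis of both sequences
windowed, _ = theano.scan(fn=window_one, sequences=[corners, images])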

nouiz commented 7 years ago

If the number of iterations of the loop is small and fixed, then you can use the for loop to build a bigger Theano graph and avoid scan.
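
For example, a sketch assuming a fixed, known batch size (the constant below is an assumption and must match the batch size used at training time):

import theano.tensor as TT

FIXED_BATCH_SIZE = 32  # assumed fixed batch size

def merger_fixed_batch(l):
    indexes = TT.iround(l[0])
    image = l[1]
    cropped_image = TT.zeros_like(image)
    for i in range(FIXED_BATCH_SIZE):  # Python int, so range() is fine
        r0 = TT.minimum(indexes[i, 0], indexes[i, 1])
        r1 = TT.maximum(indexes[i, 0], indexes[i, 1])
        c0 = TT.minimum(indexes[i, 2], indexes[i, 3])
        c1 = TT.maximum(indexes[i, 2], indexes[i, 3])
        # each iteration just adds symbolic set_subtensor ops to the graph
        cropped_image = TT.set_subtensor(
            cropped_image[i, r0:r1 + 1, c0:c1 + 1, :],
            image[i, r0:r1 + 1, c0:c1 + 1, :])
    return cropped_image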


bstriner commented 7 years ago

@LukeMathWalker the trickier issue you are going to have is when you train. If your network outputs indexes, those are discrete steps, so backprop won't work.

If you want to be able to backprop so your network actually learns how to crop, you need to make everything differentiable. That means the crops are real numbers and you interpolate to get fractional cropping.

Depending on what you're trying to do, you could rescale all of the crops to the same size, and then you could do multiple crops in a batch.
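
A minimal single-image sketch of such a differentiable crop (fractional corners, bilinear interpolation, fixed output size) is below; the names, shapes, and output size are assumptions, and batching could be added with scan as above. Gradients with respect to the corners flow through the interpolation weights, not through the discrete floor indices:

import theano.tensor as TT

def crop_and_resize(image, corners, out_h=64, out_w=64):
    # image: (H, W, C) float tensor; corners: real-valued [r0, r1, c0, c1]
    r0, r1, c0, c1 = corners[0], corners[1], corners[2], corners[3]
    # fractional sampling coordinates inside the crop window
    ys = r0 + (r1 - r0) * TT.arange(out_h, dtype='float32') / (out_h - 1)
    xs = c0 + (c1 - c0) * TT.arange(out_w, dtype='float32') / (out_w - 1)
    # interpolation weights (the differentiable part)
    wy = (ys - TT.floor(ys)).dimshuffle(0, 'x', 'x')
    wx = (xs - TT.floor(xs)).dimshuffle('x', 0, 'x')
    # integer neighbours, clipped to the image (the non-differentiable part)
    y0 = TT.clip(TT.cast(TT.floor(ys), 'int64'), 0, image.shape[0] - 1)
    x0 = TT.clip(TT.cast(TT.floor(xs), 'int64'), 0, image.shape[1] - 1)
    y1 = TT.clip(y0 + 1, 0, image.shape[0] - 1)
    x1 = TT.clip(x0 + 1, 0, image.shape[1] - 1)
    # gather the four neighbours and blend bilinearly
    top = (1 - wx) * image[y0][:, x0] + wx * image[y0][:, x1]
    bottom = (1 - wx) * image[y1][:, x0] + wx * image[y1][:, x1]
    return (1 - wy) * top + wy * bottom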