keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

Variable-size image to convolutional layer #1920

Closed shodutta92 closed 8 years ago

shodutta92 commented 8 years ago

Hi, I'm trying to run variable-sized images through convolutional layers, but I'm getting an error about the row sizes not matching. From what I understand from this post, I should be able to use a snippet like

from keras.models import Sequential
from keras.layers import Convolution2D
import numpy as np

m = Sequential()
m.add(Convolution2D(8, 3, 3, input_shape=(1, 10, 10)))
m.compile(loss="mae", optimizer="sgd")
c = m.predict(np.random.rand(1, 1, 10, 10))
c = m.predict(np.random.rand(1, 1, 20, 20))

However, the second predict statement throws an error:

ValueError: The hardcoded shape for the number of rows in the image (10) isn't the run time shape (20).

Is there some way to use this in an input-shape independent fashion?

fchollet commented 8 years ago

You should use:

input_shape=(1, None, None)

None in a shape denotes a variable dimension. Note that not all layers will work with such variable dimensions, since some layers require shape information (such as Flatten).

slegall56 commented 7 years ago

OK, but what if I have a Flatten layer in my network, such as:

input = Input(shape=(None, None,3))
x = Convolution2D(10, 3, 3, border_mode='valid', init = 'he_normal')(input)
x = PReLU()(x)
x = MaxPooling2D(pool_size=(2,2))(x)
x = Convolution2D(16, 3, 3, border_mode='valid', init = 'he_normal')(x)
x = PReLU()(x)
x = Convolution2D(32, 3, 3, border_mode='valid', init = 'he_normal')(x)
x = PReLU()(x)
x = Flatten()(x)
x = Dense(2)(x)
x = Activation('softmax', name='activation')(x)

model = Model(input = input, output= x)

How can I deal with it?

ghost commented 7 years ago

@slegall56 You should probably use GlobalAveragePooling. It is essentially an average pooling with adaptive size, and it converts different-sized convolutional feature maps into a constant size 1D vector.
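
To make the "constant size 1D vector" point concrete, here is a minimal sketch in plain Python (no Keras needed) of what global average pooling computes. The function name `global_average_pool` and the toy feature maps are assumptions made for illustration; this is not Keras's implementation.

```python
def global_average_pool(feature_maps):
    """Average each 2D feature map down to a single number.

    feature_maps: list with one 2D map (a list of rows) per channel.
    Returns one float per channel, whatever the height and width are.
    """
    pooled = []
    for fmap in feature_maps:
        values = [v for row in fmap for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


# Two channels at 2x2 and two channels at 3x5: different spatial sizes.
small = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 8.0]]]
large = [[[1.0] * 5 for _ in range(3)], [[2.0] * 5 for _ in range(3)]]

print(global_average_pool(small))  # [2.5, 2.0]
print(global_average_pool(large))  # [1.0, 2.0]
```

Both results have length 2 (the number of channels), which is why a Dense layer placed after the pooling can have a fixed number of weights.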

ybdesire commented 7 years ago

How does Keras internally handle variable-size input? Does it just add zero padding?

KeithWM commented 7 years ago

@ybdesire The size of the weights in a convolutional layer does not depend on the size of the input image, only on the number of channels in, channels out and the size of the kernel. So there's no need to use zero padding for the weights.

For the memory used to store a batch of input images I would presume no padding is used, but images are packed in memory. @moi90 is right: all images in one batch need to be of the same size.
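
The point about weights can be checked with a little arithmetic. The sketch below is plain Python with hypothetical helper names (`conv2d_param_count`, `conv2d_valid_output_size`); it assumes a stride-1 convolution without padding, as in the 3x3, 8-filter layer from the first post.

```python
def conv2d_param_count(kernel_h, kernel_w, channels_in, channels_out, use_bias=True):
    """Number of trainable parameters; note that image size does not appear."""
    weights = kernel_h * kernel_w * channels_in * channels_out
    return weights + (channels_out if use_bias else 0)


def conv2d_valid_output_size(height, width, kernel_h, kernel_w, stride=1):
    """Spatial output size of a convolution without padding ('valid')."""
    out_h = (height - kernel_h) // stride + 1
    out_w = (width - kernel_w) // stride + 1
    return out_h, out_w


params = conv2d_param_count(3, 3, channels_in=1, channels_out=8)
for size in (10, 20):
    print(size, conv2d_valid_output_size(size, size, 3, 3), params)
# 10 (8, 8) 80
# 20 (18, 18) 80
```

Only the output size changes with the input; the 80 parameters are the same for both images.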

moi90 commented 7 years ago

Model.train_on_batch receives a NumPy array as input, so at least within every batch the images have to have the same size. It's up to you how you achieve this. ImageDataGenerator.flow_from_directory has a parameter called target_size to resize all images of the whole dataset to a common size. This contradicts what @KeithWM said.

aidamian commented 7 years ago

I have a similar problem with Keras (tf backend) and I can't seem to find a solution. Wonder if I am missing something. Basically my network architecture is defined by:

  in_layer = Input(shape=(None, None,nr_channels))
  x = Conv2D(16,(4,4), activation = 'elu')(in_layer)  # single stride 4x4 filter for 16 maps
  x = Conv2D(32,(4,4), activation = 'elu')(x)         # single stride 4x4 filter for 32 maps
  x = Dropout(0.5)(x)
  x = Conv2D(64,(4,4), activation = 'elu')(x)         # single stride 4x4 filter for 64 maps
  x = Dropout(0.5)(x)
  x = Conv2D(128, (1,1))(x)                           # finally 128 maps for global average-pool
  x = GlobalAveragePooling2D()(x)                     # pseudo-dense 128 layer
  output_layer = Dense(10, activation = "softmax")(x) # softmax output
  model = Model(inputs = in_layer, outputs=output_layer)
  model.compile(optimizer = "adam", loss = "categorical_crossentropy",
                metrics=["accuracy"])

I train it on the 60k MNIST training images without any modifications. Then I evaluate it on the 10k MNIST test images (also 28x28) and get 93% accuracy after a couple of epochs. However, when I pass a larger image, say 100x75, into which I have copied one MNIST test image at a random position, and call predict(test_image_100x75), I get something very strange:

  Label: 3
  Prediction: [[ 0.  0.998  0.  0.  0.  0.  0.  0.  0.  0.]]
  y_test:     [ 0.  0.  0.  1.  0.  0.  0.  0.  0.  0.]

  Label: 7
  Prediction: [[ 0.  0.995  0.  0.  0.  0.  0.  0.  0.  0.]]
  y_test:     [ 0.  0.  0.  0.  0.  0.  0.  1.  0.  0.]

So basically my model predicts label "1" for any variable-size image I give it. No operations are applied to the variable-size test set other than the single random-position copy (all variable-size images were checked with matshow, as in the linked example of a variable-size test image).

aidamian commented 7 years ago

Well, I am ashamed to admit it, but in my previous post I used global average pooling, which basically destroyed the small-patch image information (a small patch of image information in a big zeroed image). Obviously enough, once I switched to GlobalMaxPooling2D everything worked perfectly. So the above code becomes:

  in_layer = Input(shape=(None, None,nr_channels))
  x = Conv2D(16,(4,4), activation = 'elu')(in_layer)  # single stride 4x4 filter for 16 maps
  x = Conv2D(32,(4,4), activation = 'elu')(x)         # single stride 4x4 filter for 32 maps
  x = Dropout(0.5)(x)
  x = Conv2D(64,(4,4), activation = 'elu')(x)         # single stride 4x4 filter for 64 maps
  x = Dropout(0.5)(x)
  x = Conv2D(128, (1,1))(x)                           # finally 128 maps for global max-pool
  x = GlobalMaxPooling2D()(x)                         # pseudo-dense 128 layer
  output_layer = Dense(10, activation = "softmax")(x) # softmax output
  model = Model(inputs = in_layer, outputs=output_layer)
  model.compile(optimizer = "adam", loss = "categorical_crossentropy",
                metrics=["accuracy"])

Above code works perfectly, the issue should be closed.
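
The dilution effect described above can be reproduced on a toy feature map. This is a plain-Python sketch with hypothetical helpers (`make_canvas`, `global_avg`, `global_max`); it treats a raw zero-filled canvas as a stand-in for one convolutional feature map, which is a simplification, since real activations over the background are not guaranteed to be exactly zero.

```python
def make_canvas(height, width, patch, top, left):
    """Return a zero-filled map with `patch` copied in at (top, left)."""
    canvas = [[0.0] * width for _ in range(height)]
    for r, row in enumerate(patch):
        for c, value in enumerate(row):
            canvas[top + r][left + c] = value
    return canvas


def global_avg(fmap):
    values = [v for row in fmap for v in row]
    return sum(values) / len(values)


def global_max(fmap):
    return max(v for row in fmap for v in row)


patch = [[0.9, 0.8], [0.7, 1.0]]
small = make_canvas(2, 2, patch, 0, 0)       # the patch fills the whole map
large = make_canvas(75, 100, patch, 30, 40)  # the same patch in a big empty map

print(global_avg(small), global_avg(large))  # the average shrinks towards 0 on the large map
print(global_max(small), global_max(large))  # the max is 1.0 in both cases
```

The average over the large map is roughly 1/1875 of the average over the small one, while the max is unchanged, which matches the behaviour reported in the two posts.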

shay86 commented 7 years ago

Hi, I read your post and tried to implement your model on my data. As you suggest, I made all inputs within each batch the same size. However, I noticed that your labels have a fixed size (10), as your Dense layer shows, whereas in my case the output size equals the input size, and both vary from batch to batch. I tried omitting the Dense layer, but even then the GlobalMaxPooling layer gives me the same error {ValueError: Error when checking target: expected dense_1 to have shape (1, 20) but got array with shape (1, 21)}
when the second batch starts, because of the different output size. Any suggestions for this problem? Thanks.

aidamian commented 7 years ago

So, the whole post is about using variable-size images as input to a conv net by placing GlobalMaxPooling just before your dense block (or just before the final softmax classifier) and "matching" the filters in the last conv layer with the dense units in the first dense layer. Matching does not mean an equal number: it can be any number of filters, and certainly the bigger the better, because the global pooling turns each filter into one actual input unit for the first dense layer in the classification block. This is not about variable-size labels: labels are ... labels, and if you use softmax then the labels must be one-hot encoded by number of classes. Maybe I understood your question incorrectly, and/or maybe you should post your code and I will gladly help.

shay86 commented 7 years ago

Hi, thanks for your response. This is my code:

# my generator function
def generate(X_path, y_path, batch):
    while 1:
        with open(X_path, "rb") as csv1, open(y_path, "rb") as csv2:
            X = [map(int, x.split()) for x in csv1 if x.strip()]
            Y = [map(int, x.split()) for x in csv2 if x.strip()]
        for i in range(0, len(X), batch):
            array1 = np.array(X[i:i + batch], dtype=int)
            Xshape = np.reshape(array1, (-1, array1.shape[1], 1))
            array2 = np.array(Y[i:i + batch], dtype=int)
            Yshape = np.reshape(array2, (-1, array2.shape[1]))
            yield (Xshape, Yshape)

if __name__ == "__main__":
    batch = 100
    epoch = 50
    np.random.seed(700)
    model = Sequential()
    model.add(Convolution1D(128, 6, padding='valid', batch_size=batch, input_shape=(None, 1), activation='relu'))
    model.add(MaxPooling1D(pool_size=2, strides=2, padding='valid'))
    model.add(Convolution1D(256, 6, padding='valid', activation='relu'))
    model.add(MaxPooling1D(pool_size=2, strides=2, padding='valid'))
    model.add(GlobalMaxPooling1D())
    model.add(Dropout(0.5))
    model.add(Dense(20, activation='sigmoid'))
    # model.add(Dense(449, activation='relu'))
    print(model.summary())
    sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
    mylossfunc = 'binary_crossentropy'
    model.compile(loss=mylossfunc, optimizer=sgd, metrics=['accuracy'])
    model.fit_generator(generate('trainData.csv', 'trainLable.csv', batch),
                        steps_per_epoch=batch, nb_epoch=epoch, verbose=2, callbacks=[],
                        validation_data=validation_generate('testData.csv', 'testLable.csv', batch),
                        class_weight=None, nb_worker=1, validation_steps=5)

This is my simple model for now. My training and test data and labels are equal in length, meaning that in every batch the training data has the same length as the training labels. For example, the sequence length is 20 in the first batch, 21 in the next batch, and so on. Your method for the input shape works perfectly, and GlobalMaxPooling is a great solution. However, when the first batch finishes training, this error occurs:
ValueError: Error when checking target: expected dense_1 to have shape (1, 20) but got array with shape (1, 21)
So if the units in the Dense layer do not determine the output shape, it should not give me this error. I even built a Lambda layer to try to fix the output shape before it is passed to the Dense layer, but no luck, because the output shape changes in every batch. I know that an LSTM or another recurrent NN may be better for this situation, but I prefer using conv layers if that is possible.

aidamian commented 7 years ago

Hi again shay86, there are a couple of things I don't understand in your code:

  1. Why do you have a 2x2 max-pool before the global max-pool? It is totally redundant. Basically the global max-pool will "see" the same maxes as the max-pool and will take the max-of-the-maxes anyway...
  2. You use binary cross-entropy, meaning your net expects the labels to be 0/1 (or -1/1, etc.) and the readout layer to have only 1 unit (sigmoid, tanh, etc.). Nevertheless you have 20 units in the readout layer... Maybe you want an FC layer with 20 units before the actual 1-unit readout?

Now if I mind-compile your code, I can easily see that you pass a 21-value vector to the 20-unit output layer. Almost certainly (as you describe as well), you need/want to pass a batch of 21 (or more) observations to your model, where each of the 21 observations has 1 label that is fed to the readout layer, and the error is computed as J = -(label*log(prediction) + (1-label)*log(1-prediction)). Please review your network architecture and consider adding a true readout layer (such as Dense(1, activation='sigmoid')), delete the redundant MaxPool before the GlobalMaxPool, and re-run your experiment, ensuring that you feed the same number of Y labels as X sequences (each sequence for your 1D convolutional network has one label for the binary cross-entropy loss calculation).

shay86 commented 7 years ago

Hi andreidi, sorry to bother you with my stupid questions, but I am still learning. About the model: I thought GlobalMaxPool was just a replacement for the Flatten layer, which is why I did not remove the MaxPool. However, I did not understand what you meant here:

Now if I mind-compile your code, I can easily see that you pass a 21-value vector to the 20-unit output layer. Almost certainly (as you describe as well), you need/want to pass a batch of 21 (or more) observations to your model, where each of the 21 observations has 1 label that is fed to the readout layer, and the error is computed as J = -(label*log(prediction) + (1-label)*log(1-prediction)).

Let's say the input shape is (batch, sequence, channel), with batch = 100, sequence = None (because the length differs: 20 in the first batch, 21 in the second, 122 in the third, and so on) and channel = 1, so the input shape would be (100, None, 1). Here my confusion begins: what should the output (target or label) shape be, (batch, sequence) or (sequence, channel), given that it should be 2D?
The input and target sequence lengths are equal in each batch. My new model is:

batch = 100
epoch = 5
model = Sequential()
model.add(Convolution1D(64, 6, padding='valid', batch_size=batch, input_shape=(None, 1), activation='relu'))
model.add(Convolution1D(128, 6, padding='valid'))
model.add(Dropout(0.5))
model.add(GlobalMaxPooling1D())
model.add(Dense(1, activation="sigmoid"))
print(model.summary())
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit_generator(generate('trainData.csv', 'trainLable.csv', batch),
                    steps_per_epoch=batch, nb_epoch=epoch, verbose=2, callbacks=[],
                    validation_data, class_weight=None, nb_worker=1)

and this is a screenshot of my output:

[screenshot from 2017-08-26 14-38-53]

Thanks for your time trying to help me understand, because I am really lost.

aidamian commented 7 years ago

Hi again shay86,

Keras is a great framework, but it's kind of "complicated" to jump right into "action" without understanding the basics and without going at least a little bit deep into both the math models and the pipeline logic. So, first of all, GlobalMaxPool is not a flattening operation; it is a max-by-full-filter operation. If you have 128 filters (say each with a certain dimension) going into GMP, then GMP takes the max of each filter and basically "recreates" the 128 filters with only 1 value each. So it is similar to flattening, but it allows your input to have variable sequence length/dimensions: the number of filters is fixed by the CNN architecture, and your net basically ignores the input size when passing from the conv layers to the FC layers, as the FC layer "sees" the fixed 1-unit filters generated by the GMP. Hope this helps...

Now, looking at your error, I guess you tried to pass a 100x20 label matrix as y_train instead of a 100x1 vector. Please do not forget that you have a Dense(1) sigmoid readout layer, meaning that even if you have a batch of 100 observations with, say, sequences of 20 items of depth 1, you still have ONLY 1 label per batch observation. A second batch with 21-item sequences will still have only 1 label for each 21-item sequence... You can change your sequence size however you like/need, but keep in mind:

  1. All observations in a batch must have the same sequence size/depth
  2. With a Dense(1) logistic readout, your training labels must be ... 1 label per observation, so batch_size labels per batch...

Finally, the code you pasted is different from the network layout presented by model.summary() in your screenshot. Your code is Input->C1D(64filters)->C1D(128filters)->Drop->GMP->Readout(1), while your summary is Input->C1D(128)->C1D(64)->Drop->GMP->Readout(1). Also, I don't understand why you use a categorical loss with a Dense(1) sigmoid readout; maybe switch to binary_crossentropy?

shay86 commented 7 years ago

It worked... Hi andreidi, thanks for your advice; it made my model work. After a lot of reading I found that I don't want GlobalMaxPooling; plain MaxPooling is fine, so I keep the 2D shape of the input up to the Dense layer and reshape my output to be (sequence length, class or channel). My model now looks like this:

model = Sequential()
model.add(Convolution1D(16, 4, padding='same', batch_size=batch, input_shape=(None,1), activation='relu'))
model.add(Convolution1D(32, 4, padding='same', activation='relu'))
model.add(Dropout(0.5)) 
model.add(Convolution1D(64, 4, padding='same', activation='relu'))
model.add(Dropout(0.5))  
model.add(Convolution1D(128, 4, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=10, strides=None, padding='same'))
model.add(Dense(1, activation='sigmoid'))  

But I have a question: do CNNs or Keras have any limitation on how much the sequence length can vary? Let me explain. If I pass dummy data to the model like this:

input=([1,2,3],[1,2,3,4,5],[1],[1,2,3,4,5,6],[1,2,3,4,5,6,7,8])
target=([1,2,0],[1,2,0,1,0],[1],[0,1,1,0,2,0],[1,1,2,1,0,1,0,1])

it works fine until the last sequence, which has 8 elements, where it gives me this kind of error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [1,14,1] vs. [1,2,1] [[Node: metrics/acc/Equal = Equal[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_dense_1_target_0_2, metrics/acc/Round)]]

I googled this error and the advice is to change the batch size, but in my case the batch size is 1. I tried changing the model parameters, but no luck... So if this error comes from my model, what is your advice to fix it?

aidamian commented 7 years ago

Hi Shay86, your architecture still puzzles me. Basically you input a variable-size sequence and use 16 filters of 4 bytes to generate 16 maps from 1 variable-size sequence, then you use the 16 maps to generate another 32 maps with a 4-byte convolution, add some dropout, then another 64-filter convolution, and finally a 128-filter convolution that generates the input of your max-pool. Here is the tricky part: as long as your sequences are 10 bytes or less, your last convolutional layer will generate 128 maps of at most 10 bytes, which your MaxPooling1D will transform to 128x1. Basically you have a pseudo GlobalMaxPooling (although you dropped it): for any input with 10 steps or fewer, the MaxPooling will generate 1 downsampled step. Then your Dense layer will receive the 128x1 vector and compute the final 1-byte sigmoid activation. HOWEVER, if, let's say, you input a sequence of 15, then after multiple conv propagations without any downsampling the MaxPooling will convert it to a 2-byte sequence (128x2) and feed it to the final Dense output, thus generating an error (expecting a different input)... This is quite strange... I don't even understand how your model compiles, meaning how it determines the required weights between the MaxPooling and the Dense layer... Strangely, your target data is also multiple bytes per observation, although your net expects a single binary target.... I hope I was clear enough for you to understand the intuition/analysis I made of your model, and maybe it will help you further. Nevertheless, I don't know what you are trying to accomplish here, and in my opinion (looking at your data) you should:

  1. Zero-pad your sequences to make them fixed-length
  2. Either use an embedding layer that will transform your sequences into efficient vector representations (from a 1-byte sequence to, let's say, a 32-float embedding) or one-hot encode each sequence. Certainly in both cases your input depth will not be 1 anymore (it will be either embedding_size or onehot_size)...
  3. Feed the transformed sequences to your conv net.
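
The length arithmetic in the analysis above can be sketched in plain Python. The helper `pool1d_output_length` is a hypothetical name; it uses the usual output-length formulas for 'same' and 'valid' padding and assumes the stride defaults to the pool size when `strides` is None, as in the model under discussion.

```python
import math


def pool1d_output_length(length, pool_size, strides=None, padding="valid"):
    """Number of time steps left after a 1D pooling layer."""
    stride = pool_size if strides is None else strides
    if padding == "same":
        return math.ceil(length / stride)
    if padding == "valid":
        return max((length - pool_size) // stride + 1, 0)
    raise ValueError("padding must be 'same' or 'valid'")


# The convolutions use padding='same' with stride 1, so they keep the length;
# only MaxPooling1D(pool_size=10, padding='same') changes it.
for length in (3, 8, 10, 15):
    print(length, pool1d_output_length(length, 10, padding="same"))
# 3 1
# 8 1
# 10 1
# 15 2
```

Any per-step target therefore has to have the pooled length, not the input length, for the shapes to agree.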

shay86 commented 7 years ago

Hi again, I have studied a lot of CNN projects and papers, and following your advice I managed to run my project with different lengths, so thank you so much. However, I have some questions and would be grateful if you could point me to the answers. First: my input data has 23 classes, which I want to classify into labels of 4 and 8 classes. For example:

input: {1,1,2,2,22,5,8,9,20,3,7,8,11,15,18,18,18,20,20,5}
first label: {1,1,1,1,1,1,2,2,2,2,2,3,3,3,3,1,1,0,0,0} // my first training run uses this kind of label, with 4 classes
second label: {1,1,2,2,3,3,8,8,5,5,1,1,1,1,4,4,4,4,4,4} // my second training run uses this kind of label, with 8 classes

My model:

def generate(X_path, y_path, batch):
    while 1:
        with open(X_path, "rb") as csv1, open(y_path, "rb") as csv2:
            X = [map(int, x.split()) for x in csv1 if x.strip()]
            Y = [map(int, x.split()) for x in csv2 if x.strip()]

        for i in range(0, len(X), batch):
            array1 = np.array(X[i:i + batch], dtype=int)
            encoded1 = to_categorical(array1, num_classes=24)
            Xshape = np.reshape(encoded1, (-1, array1.shape[1], 24))
            array2 = np.array(Y[i:i + batch], dtype=int)
            encoded = to_categorical(array2, num_classes=8)  # or 4 for the other kind of label
            Yshape = np.reshape(encoded, (-1, array2.shape[1], 8))
            #Ylable.append(Yshape)
            yield (Xshape, Yshape)

model = Sequential()
model.add(Convolution1D(128, 6, padding='same', batch_size=batch, input_shape=(None, 24), activation='relu'))
model.add(Convolution1D(64, 6, padding='same', activation='relu'))

model.add(MaxPooling1D(pool_size=6, strides=None, padding='valid'))

model.add(Dropout(0.5))
model.add(Dense(8, activation='sigmoid'))
print(model.summary())

My first question is: why does this model work fine without the max-pooling layer, but when I add the max-pooling layer it gives me the error tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [10,20,9] vs. [10,10,9], even though I am sure the input data and labels have the same size?

My second question is: when I have more classes in the CNN I get better results. Shouldn't it be the opposite, with fewer classes giving higher accuracy? Thanks for your patience.

yanxiangyi commented 6 years ago

Hi! I'm also training with images of different sizes, and I built the model using GlobalMaxPooling as described above. My training images are in a NumPy array called x_train. The problem is that when I fit the data to the data generator, I get the error ValueError: setting an array element with a sequence.

My code is:

train_datagen = ImageDataGenerator()
train_datagen.fit(x_train)

history = model.fit_generator(
    train_datagen.flow(x_train, y_train, batch_size=batch_size),
    steps_per_epoch=x_train.shape[0] // batch_size,
    epochs=epochs,
    validation_data=(x_val, y_val),
    callbacks=[ModelCheckpoint('ResNet50-transferlearning.model', monitor='val_acc', save_best_only=True)]
)

amarion35 commented 6 years ago

Hi, how do you provide a NumPy array with a dimension of undefined size?

Let's say you want to make a 1D CNN with the inputs [[1,2],[3,4,5]]:

>>> inputs = np.array([[1,2],[3,4,5]])
>>> inputs.shape
(2,)
>>> inputs.ndim
1
>>> model.fit(inputs, labels)
ValueError: Error when checking input: expected conv1d_1_input to have 3 dimensions, but got array with shape (2,)

prakhar19 commented 6 years ago

Hello,

I am having the same problem as @amarion35.

I would be very grateful if someone could help. This has been bothering me for 3 days.

Thank You.

amarion35 commented 6 years ago

The only solution is to use fit_generator and set the dimension to None in input_shape, as in the following example. Some layers (like Flatten, replaced in this example with GlobalAveragePooling) may not work with this method because they can't compute the shape of their output.

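import random
import numpy as np
from keras.models import Sequential
from keras.layers import Conv1D, GlobalAveragePooling1D, Dense

model = Sequential()
model.add(Conv1D(2, kernel_size=(5,), input_shape=(None,1)))
model.add(GlobalAveragePooling1D())
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')

# 100 sequences of random length 10-15; dtype=object keeps the ragged array valid on recent NumPy
inputs = np.array([np.random.randint(0,10,random.randint(10,15)) for i in range(100)], dtype=object)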
labels = np.array([[np.mean(i)] for i in inputs])

train_inputs = inputs[:int(0.8*len(inputs))]
train_labels = labels[:int(0.8*len(inputs))]
valid_inputs = inputs[int(0.8*len(inputs)):]
valid_labels = labels[int(0.8*len(inputs)):]

def generator(inputs, labels):
    i = 0
    while True:
        inputs_batch = np.expand_dims([inputs[i%len(inputs)]], axis=2)
        labels_batch = np.array([labels[i%len(inputs)]])
        yield inputs_batch, labels_batch
        i+=1

model.fit_generator(
    generator(train_inputs, train_labels),
    validation_data=generator(valid_inputs, valid_labels),
    steps_per_epoch=len(train_inputs),
    validation_steps=len(valid_inputs),
    epochs=10
)
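
The generator above yields one sequence per batch. If larger batches are wanted, one option (a sketch under assumptions, not something proposed in this thread) is to group sequences of equal length so that every batch is rectangular. The helper `bucket_by_length` and the toy data are hypothetical, and plain lists are used instead of NumPy arrays to keep the example self-contained.

```python
from collections import defaultdict


def bucket_by_length(sequences, labels, batch_size):
    """Group (sequence, label) pairs so each batch holds one sequence length."""
    buckets = defaultdict(list)
    for seq, label in zip(sequences, labels):
        buckets[len(seq)].append((seq, label))

    batches = []
    for length in sorted(buckets):
        items = buckets[length]
        for start in range(0, len(items), batch_size):
            chunk = items[start:start + batch_size]
            # Nested lists shaped (batch, length, 1), matching input_shape=(None, 1)
            batch_x = [[[float(v)] for v in seq] for seq, _ in chunk]
            batch_y = [label for _, label in chunk]
            batches.append((batch_x, batch_y))
    return batches


sequences = [[1, 2], [3, 4, 5], [6, 7], [8, 9, 10], [11, 12]]
labels = [0, 1, 0, 1, 1]
batches = bucket_by_length(sequences, labels, batch_size=2)
for batch_x, batch_y in batches:
    print(len(batch_x), len(batch_x[0]), batch_y)
# 2 2 [0, 0]
# 1 2 [1]
# 2 3 [1, 1]
```

Each batch could then be converted with np.array(batch_x) and yielded from a generator like the one above.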

prakhar19 commented 6 years ago

@amarion35 Thanks!

chenguiyuan commented 5 years ago

I used the above method, but I encountered the following problem: TypeError: 'NoneType' object is not callable. How did you solve it?

prakhar19 commented 5 years ago

@chenguiyuan You would need to show some of your code. Also, mention the line where the error occurs.

Jorisvanlienen commented 5 years ago

If you already know the different input sizes, it's also possible to define them separately before training with an and-operator. For example:

IMG_HEIGHT = 128 and 150
IMG_WIDTH = 128 and 150
IMG_CHANNELS = 3 and 1
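
One caveat is worth checking before relying on this: in plain Python, `a and b` evaluates to `b` whenever `a` is truthy, so each assignment above stores a single value rather than both sizes. The quick check below shows this; the explicit list `supported_shapes` is a hypothetical alternative, not part of any Keras API.

```python
IMG_HEIGHT = 128 and 150
IMG_WIDTH = 128 and 150
IMG_CHANNELS = 3 and 1

print(IMG_HEIGHT, IMG_WIDTH, IMG_CHANNELS)  # 150 150 1

# Hypothetical alternative: keep every known (height, width, channels) explicitly.
supported_shapes = [(128, 128, 3), (150, 150, 1)]
for height, width, channels in supported_shapes:
    print(height, width, channels)
```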

RajezMariner commented 4 years ago

I am really trying to understand how a convolutional layer can work fine with different image sizes in a neural network. Any reference links, please? I am not able to work out what the data size (height and width) actually looks like when it is fed into the fully connected layers after spatial or global average pooling.