keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

Convolution with masked input? #411

Closed: iskandr closed this issue 7 years ago

iskandr commented 9 years ago

I want to perform convolution on a sequence input before feeding it to an RNN but it looks like Convolution1D ignores its input mask (and thus wouldn't work with sequences of differing length). Is there any way to turn a convolution into a masked layer?

fchollet commented 9 years ago

This will be resolved by the upcoming TimeDistributed layer; closing.

iskandr commented 9 years ago

@fchollet Maybe I'm misunderstanding TimeDistributed but I think that accomplishes something different: uniformly applying a convolution to series or images of the same size at each timestep. What I need is to perform a convolution across time, with sequences of varying lengths (thus the need for masking).

My data has a shape like (n_samples, max_timesteps, n_features). Each individual sample sequence may have fewer than max_timesteps timesteps (and I would want a "full" convolution not to generate outputs in the masked-out regions of the sequence).
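
A small illustration of that layout (my own sketch, with made-up sizes): pad each sequence into a common tensor and keep a boolean mask marking the real timesteps.

import numpy as np

seqs = [np.random.random((t, 4)) for t in (5, 8, 3)]  # n_features = 4
max_timesteps = max(len(s) for s in seqs)

X = np.zeros((len(seqs), max_timesteps, 4))            # (n_samples, max_timesteps, n_features)
mask = np.zeros((len(seqs), max_timesteps), dtype=bool)
for k, s in enumerate(seqs):
    X[k, :len(s)] = s
    mask[k, :len(s)] = True                            # True = real data, False = padding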

jpuigcerver commented 8 years ago

Hi,

I have been working on implementing support for masked inputs for the convolutional layers.

First, let me introduce my scenario: I am trying to apply convolutions to images of very different sizes. The output of the convolutional towers will be fed into an RNN, so I need to keep track of the masks through the convolutional and pooling layers. To do so, the idea is to pack all images in a minibatch into a tensor as big as the largest image, pad the rest of the images with zeros, and use masks to get the proper behavior in the RNN.

However, there are some decisions that have to be made regarding the definition of convolution/upsampling/downsampling with masks.

Ideally, I would like the output of each sample to be independent of the maximum image size in the mini-batch, so that the inference results are deterministic.

The upsampling operations are trivial, since one just has to upsample the masks as well.

For the convolution operations (both 1D and 2D), border_mode = 'same' is trivial, since the output mask is just the input mask. The 'valid' case presents some corner cases that one has to think about. Since in most practical cases masks are binary, the proper behavior can be achieved with max-pool operations over the input mask (basically, an OR).

The tricky part comes with the downsampling operations, since masked elements should be left out of the max or average operations. In this case, I decided to apply the same pooling over the masks.
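
In modern tf.keras terms, a minimal sketch of these rules might look as follows (my own reading of the proposal; the mask is a float 0/1 tensor of shape (batch, height, width, 1)):

import tensorflow as tf

def conv_valid_mask(mask, kernel_size):
    # 'valid' convolution: keep an output position if any input position
    # under the kernel window is kept -- the max-pool acts as the OR
    return tf.nn.max_pool2d(mask, ksize=kernel_size, strides=1, padding="VALID")

def pool_mask(mask, pool_size):
    # max/average pooling: apply the same pooling to the mask
    return tf.nn.max_pool2d(mask, ksize=pool_size, strides=pool_size, padding="VALID")

def upsample_mask(mask, size):
    # upsampling: just upsample the mask as well
    return tf.keras.layers.UpSampling2D(size)(mask)

# 'same' convolution: the output mask is simply the input mask, unchanged.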

What do you think about these implementation decisions?

markostam commented 8 years ago

I believe I am seeing a similar error.

I am trying to use a modified version of the tutorial VGG-like convnet to do binary classification of a set of images of various sizes. Like the previous poster, my solution is to pad the images up to the size of the largest image and then use a masking layer to ignore the padding I added. However, when I try to build the CNN I get the error

Exception: Layer convolution2d_1 does not support masking, but was passed an input_mask: Any{3}.0

Am I overlooking a better way to implement this idea in Keras? Maybe this is just a feature that has not been implemented yet, or does my implementation overlook something?

BTW my convnet implementation looks like this:

from keras.models import Sequential
from keras.layers import Activation, Dense, Dropout, Flatten, Masking
from keras.layers import Convolution2D, MaxPooling2D
from keras.optimizers import SGD

def buildCNN(depth, width, height, outputShape):
    CNN = Sequential()

    # masking layer to ignore padding (mask_value must match the padding value)
    CNN.add(Masking(mask_value=9999, input_shape=(depth, width, height)))

    # input: (depth, width, height) tensors, e.g. 3-channel images.
    # this applies 32 convolution filters of size 3x3 each.
    CNN.add(Convolution2D(32, 3, 3, border_mode='valid'))
    CNN.add(Activation('relu'))
    CNN.add(Convolution2D(32, 3, 3))
    CNN.add(Activation('relu'))
    CNN.add(MaxPooling2D(pool_size=(2, 2)))
    CNN.add(Dropout(0.25))

    CNN.add(Convolution2D(64, 3, 3, border_mode='valid'))
    CNN.add(Activation('relu'))
    CNN.add(Convolution2D(64, 3, 3))
    CNN.add(Activation('relu'))
    CNN.add(MaxPooling2D(pool_size=(2, 2)))
    CNN.add(Dropout(0.25))

    CNN.add(Flatten())
    # note: Keras does automatic shape inference.
    CNN.add(Dense(256))
    CNN.add(Activation('relu'))
    CNN.add(Dropout(0.5))

    CNN.add(Dense(outputShape))
    CNN.add(Activation('softmax'))

    sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
    CNN.compile(loss='categorical_crossentropy', optimizer=sgd)

    return CNN

RaffEdwardBAH commented 8 years ago

I have a similar need for this functionality.

iskandr commented 8 years ago

Hey @fchollet, do you think this is now possible?

9thDimension commented 8 years ago

I need this functionality badly.

shashankg7 commented 8 years ago

Hi @fchollet, does Convolution1D now support variable-length sequences?

yaumeg commented 7 years ago

Hi, same question here. I posted some code on the Keras users group as a proof of concept:

https://groups.google.com/forum/#!topic/keras-users/KfoTsCHldM4

The idea is to consider masking only when addressing the final dense layer, with a numpy array defining a mask for each sample (each mask has to be calculated manually, taking into account the successive conv & maxpool layers; zeros mean masking, ones mean no mask). An auxiliary input is used to provide this "masking" numpy array, which is merged with the output of the final convolution layer. At this point, the merge layer uses multiplication as its merge mode, acting as a mask. In the code sample I posted, if I put zeros everywhere in my "masking" numpy array, I systematically obtain an exact 0.5 output, meaning that the sigmoid received only zeros; so, it seems to work. What do you think of this approach?
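
In current tf.keras terms the idea looks roughly like this (a sketch with made-up sizes; the original code used the old Merge layer with a multiply mode):

from tensorflow.keras import layers, Model

seq = layers.Input(shape=(100, 8), name="sequence")
# one precomputed 0/1 mask per sample, sized for the conv output
# (100 -> 96 after a 'valid' convolution with kernel size 5)
mask = layers.Input(shape=(96, 1), name="mask")

h = layers.Conv1D(32, 5, activation="relu")(seq)
h = h * mask  # zeros knock out the masked positions
out = layers.Dense(1, activation="sigmoid")(layers.Flatten()(h))
model = Model([seq, mask], out)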

monod91 commented 7 years ago

Hi @iskandr @9thDimension @shashankg7, may I ask how you fixed this in the end?

I have an architecture containing some Convolution1D and some Merge layers, and they're still not working with mask_zero=True

iskandr commented 7 years ago

@monod91 I ended up giving up on Keras's masking because it only works on very few layers. Instead I allowed the padding character in sequences (represented by index 0) to just have an explicit embedding and do global pooling after some number of conv/downsample layers. This seems to work surprisingly well.
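
Roughly, the workaround looks like this (a sketch with illustrative sizes, not my exact code):

from tensorflow.keras import layers, Model

i = layers.Input(shape=(30,), dtype="int32")
# no mask_zero: the padding index 0 simply gets its own learned embedding
e = layers.Embedding(input_dim=21, output_dim=32)(i)
h = layers.Conv1D(32, 3, padding="same", activation="relu")(e)
h = layers.MaxPooling1D(2)(h)
h = layers.Conv1D(64, 3, padding="same", activation="relu")(h)
# global pooling collapses the time axis, so sequences with different
# amounts of padding all produce a fixed-size representation
h = layers.GlobalMaxPooling1D()(h)
out = layers.Dense(1, activation="sigmoid")(h)
model = Model(i, out)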

iskandr commented 7 years ago

@fchollet @obilaniu Just to give a minimal example of what doesn't currently work (but would be awesome if it did):

from keras.layers import Input, Embedding, Conv1D

i = Input(name="input", shape=(30,), dtype="int32")
e = Embedding(input_dim=21, output_dim=32, mask_zero=True)
c = Conv1D(padding="same", kernel_size=3, filters=32)
x = c(e(i))  # fails: Conv1D does not accept the mask produced by the Embedding

The slightly more elaborate version of this that I would like to do is combining multiple kernel sizes at each layer, then doing maxpooling in a way that properly propagates the mask.
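
One such multi-kernel block might look like this (illustrative only; as noted, the mask would not survive these layers today):

from tensorflow.keras import layers

def multi_kernel_block(x, filters=32, kernel_sizes=(3, 5, 7)):
    # one branch per kernel size, concatenated along the channel axis
    branches = [layers.Conv1D(filters, k, padding="same")(x) for k in kernel_sizes]
    x = layers.Concatenate()(branches)
    # the maxpooling step that would need to propagate the mask
    return layers.MaxPooling1D(2)(x)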

jinfengr commented 7 years ago

@yaumeg I tried your solution, but it doesn't work well for my problem. I found it easy to get some large intermediate outputs, but I have no idea why this happens (in principle or in practice).

stale[bot] commented 7 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

Engineero commented 6 years ago

I have not seen a resolution on this issue or the related issues referenced in the thread. Is this resolved? If so, what resolved it and how? If not, is there a plan to do so?

vonjackustc commented 5 years ago

How to solve this issue? I want this: Masking => TimeDistributed(Conv1D) => TimeDistributed(MaxPooling1D) => TimeDistributed(Flatten) => LSTM. Can you help me? Thank you!
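
For reference, the requested pipeline would be built roughly like this (illustrative shapes; per the replies below, TimeDistributed(Conv1D) rejects the mask, so this is the goal rather than working code):

from tensorflow.keras import layers, Sequential

model = Sequential([
    # (frames, steps_per_frame, channels) -- sizes are illustrative
    layers.Masking(mask_value=0.0, input_shape=(20, 50, 3)),
    layers.TimeDistributed(layers.Conv1D(16, 3, activation="relu")),
    layers.TimeDistributed(layers.MaxPooling1D(2)),
    layers.TimeDistributed(layers.Flatten()),
    layers.LSTM(32),
])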

noob-procrastinator commented 5 years ago

I am also having a similar problem. I have images that range in size from 100x100 to 14000x5000 and would like to bring them to an aspect ratio of 2 and a size of 500x1000. From my reading I found out it wouldn't be a good idea to pad and resize, but rather to use a mask; however, there is no solution I could find for a Keras CNN.

hajarsaihi commented 5 years ago

any updates re this?

liumilan commented 5 years ago

any updates re this?

yungmsh commented 5 years ago

@monod91 I ended up giving up on Keras's masking because it only works on very few layers. Instead I allowed the padding character in sequences (represented by index 0) to just have an explicit embedding and do global pooling after some number of conv/downsample layers. This seems to work surprisingly well.

Hi @monod91, would you be able to share some code examples of how you did global pooling after some conv layers? I'm trying to use this method as a workaround for the (nonexistent) Mask->Conv approach for some time series data and would love to see how you implemented your workaround solution.

pritamqu commented 5 years ago

@monod91 I ended up giving up on Keras's masking because it only works on very few layers. Instead I allowed the padding character in sequences (represented by index 0) to just have an explicit embedding and do global pooling after some number of conv/downsample layers. This seems to work surprisingly well.

Hi @monod91, would you be able to share some code examples of how you did global pooling after some conv layers? I'm trying to use this method as a workaround for the (nonexistent) Mask->Conv approach for some time series data and would love to see how you implemented your workaround solution.

Can you please share a code example of how you solved this?

E1k3 commented 4 years ago

Are there any plans to support masking in convolutional layers in the future?

Kosisochi commented 4 years ago

Any updates on this problem? @monod91, can you share your code sample to help us understand what you did? Thanks

shrimpceviche commented 4 years ago

@iskandr would you be able to share some code examples of how you did global pooling after some conv layers? much appreciated!

UdiBhaskar commented 4 years ago

any update on this?

olaiya commented 4 years ago

Yes any updates? It would be fantastic if this was supported!

a-r-j commented 4 years ago

Is this feature in the works? Happy to contribute.

chopwoodwater commented 4 years ago

Is this issue, opened 5 years ago, fixed now?

junyongyou commented 4 years ago

How to solve this issue? I want this: Masking => TimeDistributed(Conv1D) => TimeDistributed(MaxPooling1D) => TimeDistributed(Flatten) => LSTM. Can you help me? Thank you!

Hi, did you solve the issue? I want to build a similar model as yours, but TimeDistributed(Conv1D) seems still to not support masking. Thank you.

ekurtgl commented 4 years ago

How to solve this issue? I want this: Masking => TimeDistributed(Conv1D) => TimeDistributed(MaxPooling1D) => TimeDistributed(Flatten) => LSTM. Can you help me? Thank you!

Hi, did you solve the issue? I want to build a similar model as yours, but TimeDistributed(Conv1D) seems still to not support masking. Thank you.

Same here...

Kosisochi commented 4 years ago

Hello. I didn't solve it. I switched to PyTorch instead.

On Oct 11, 2020, 6:05 PM, ekurtgl notifications@github.com wrote:

How to solve this issue? I want this: Masking => TimeDistributed(Conv1D) => TimeDistributed(MaxPooling1D) => TimeDistributed(Flatten) => LSTM. Can you help me? Thank you!

Hi, did you solve the issue? I want to build a similar model as yours, but TimeDistributed(Conv1D) seems still to not support masking. Thank you.

Same here...


Kosisochi commented 4 years ago

Hello, I didn't solve it. I switched to PyTorch.

On Oct 4, 2020, 8:47 AM, Junyong You notifications@github.com wrote:

How to solve this issue? I want this: Masking => TimeDistributed(Conv1D) => TimeDistributed(MaxPooling1D) => TimeDistributed(Flatten) => LSTM. Can you help me? Thank you!

Hi, did you solve the issue? I want to build a similar model as yours, but TimeDistributed(Conv1D) seems still to not support masking. Thank you.


ArunPrasath20 commented 3 years ago

Hi, does anyone know whether this issue is fixed? I am also dealing with a similar scenario, but when I add a masking layer to a 1D CNN, I am not getting any error.

Has Keras been upgraded to support the masking layer for CNNs?

frsnjung commented 3 years ago

@ArunPrasath20 tf.keras supports it. I don't know if the latest standalone version of keras has implemented this

olaiya commented 3 years ago

Which version of tensorflow supports it? I've tried v2.4.3 and 2.6.0 and masking with a 1d convolutional layer does not seem to work! Maybe I'm doing something wrong?

For example:

model = tf.keras.models.Sequential([
    tf.keras.layers.Masking(mask_value=-99999.99, input_shape=[None, 1]),
    tf.keras.layers.Conv1D(filters=20, kernel_size=5, strides=1, padding='same'),
    tf.keras.layers.GRU(20, return_sequences=True),
    tf.keras.layers.GRU(20),
    tf.keras.layers.Dense(1)
])

Does not work! By that I mean the loss is static with respect to the number of epochs.

However swapping tf.keras.layers.Masking for tf.keras.layers.InputLayer as follows:

model = tf.keras.models.Sequential([
    tf.keras.layers.InputLayer(input_shape=[None, 1]),
    tf.keras.layers.Conv1D(filters=20, kernel_size=5, strides=1, padding='same'),
    tf.keras.layers.GRU(20, return_sequences=True),
    tf.keras.layers.GRU(20),
    tf.keras.layers.Dense(1)
])

does train on the data, training against the -99999.99 values rather than ignoring them. The loss does improve, so I don't think my architecture is the problem. What I really need is to mask events. Again, it would be fantastic if tf.keras.layers.Conv1D supported masking.
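
One way to probe where the mask gets lost (a debugging sketch; _keras_mask is an internal Keras attribute, not a supported API):

import numpy as np
import tensorflow as tf

x = np.random.random((2, 10, 1)).astype("float32")
x[:, 5:, :] = -99999.99  # padded tail
h = tf.keras.layers.Masking(mask_value=-99999.99)(x)
h = tf.keras.layers.Conv1D(20, 5, padding='same')(h)
# None here means Conv1D dropped the mask before it ever reached the GRUs
print(getattr(h, "_keras_mask", None))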

lurenyi233 commented 3 years ago

Hi, does anyone know of any other packages that support this feature?

profPlum commented 2 years ago

@ArunPrasath20 tf.keras supports it. I don't know if the latest standalone version of keras has implemented this

Is this confirmed?

profPlum commented 2 years ago

Which version of tensorflow supports it? I've tried v2.4.3 and 2.6.0 and masking with a 1d convolutional layer does not seem to work! Maybe I'm doing something wrong?

For example:

model = tf.keras.models.Sequential([
    tf.keras.layers.Masking(mask_value=-99999.99, input_shape=[None, 1]),
    tf.keras.layers.Conv1D(filters=20, kernel_size=5, strides=1, padding='same'),
    tf.keras.layers.GRU(20, return_sequences=True),
    tf.keras.layers.GRU(20),
    tf.keras.layers.Dense(1)
])

Does not work! By that I mean the loss is static with respect to the number of epochs.

However swapping tf.keras.layers.Masking for tf.keras.layers.InputLayer as follows:

model = tf.keras.models.Sequential([
    tf.keras.layers.InputLayer(input_shape=[None, 1]),
    tf.keras.layers.Conv1D(filters=20, kernel_size=5, strides=1, padding='same'),
    tf.keras.layers.GRU(20, return_sequences=True),
    tf.keras.layers.GRU(20),
    tf.keras.layers.Dense(1)
])

does train on the data, training against the -99999.99 values rather than ignoring them. The loss does improve, so I don't think my architecture is the problem. What I really need is to mask events. Again, it would be fantastic if tf.keras.layers.Conv1D supported masking.

Well, wouldn't a static loss be expected unless you passed non-masked values? Did you in fact pass any non-masked values? If not, I feel like you may have observed the masking functionality working in your report, so I'm confused...

frsnjung commented 2 years ago

@ArunPrasath20 tf.keras supports it. I don't know if the latest standalone version of keras has implemented this

Is this confirmed?

We used it successfully in one of our projects with tf.keras (tf version 2.4) as follows:

Assuming your dataframe is filled with the masking value -1 up to the timestep max_sequence_length:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Masking, Conv1D, GlobalMaxPooling1D

# self.cnn_config holds the project's hyperparameters (input_dim, filters, window_size)
model = Sequential()
max_sequence_length = 188
model.add(Masking(mask_value=-1, input_shape=(max_sequence_length, self.cnn_config.input_dim)))
model.add(Conv1D(filters=self.cnn_config.filters, kernel_size=self.cnn_config.window_size, activation="relu"))
model.add(GlobalMaxPooling1D())  # or model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))

profPlum commented 2 years ago

@frsnjung: Actually, I think you're wrong; I just got an error about Conv1D masking not being supported in TF version 2.7.0.

Strangely, it only happened when I used a TimeDistributed layer with a Conv1D layer inside, but the fact that it ever throws an error explicitly saying masking is not supported implies it is not supported at all, and it is likely a bug that no error is thrown the rest of the time.

profPlum commented 2 years ago

@fchollet Are there any plans to ever add this feature? I think it would be quite helpful. It does not even seem very straightforward to force the embedding of 0 to be 0 (correct me if I'm mistaken), which might be a useful workaround.
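
For what it's worth, one way to force index 0 to embed to zeros might be to multiply the embedding output by an indicator built from the input indices (a sketch, not a tested recipe):

import tensorflow as tf
from tensorflow.keras import layers

i = layers.Input(shape=(30,), dtype="int32")
e = layers.Embedding(input_dim=21, output_dim=32)(i)
# 0/1 indicator of real tokens, shape (batch, 30, 1)
keep = tf.expand_dims(tf.cast(tf.not_equal(i, 0), tf.float32), -1)
e = e * keep  # the padding index 0 now maps to an all-zero vector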

Nafees-060 commented 2 years ago

In my case, it runs (meaning no error is generated), but I'm not sure it works exactly as we expect, because I have variable-length inputs in each batch. To equalize the shorter inputs within a batch, I used 0-padding. I believed that when I applied the masking layer, it would ignore all of the 0s and perform convolution only on the original data, but I feel it is not working as I want. Since I use padding with stride=1 to reduce the output feature map's dimension, and some of my inputs within each batch are extremely small, there should be no remaining input features for the convolution after 3 to 4 CNN layers if masking worked and the 0s were ignored. In that scenario an error should be generated, yet the CNN executes correctly, which suggests that it does not ignore the 0 values. The code is as follows:

input_layer = tf.keras.Input(shape=input_shape, name="time_series_activity")
input_mask = tf.keras.layers.Masking(mask_value=0.)(input_layer)
con_l1 = tf.keras.layers.Conv2D(64, (5, 1), activation="relu")(input_mask)

Any comment or advice would be greatly appreciated.

nox4y commented 2 years ago

Are there any updates @fchollet?

yingjieyao commented 2 years ago

I have just tested Conv1D with a masking layer, and it worked. If the masking layer is removed, the training process ends with a huge loss.

import numpy as np
from tensorflow.keras.layers import Input, Masking, Conv1D, Flatten, Dense
from tensorflow.keras.models import Model

a = Input(shape=(30, 10))
b = Masking(-999999999)(a)
b = Conv1D(3, 3, padding='same')(b)
b = Flatten()(b)
b = Dense(1)(b)
model = Model(a, b)

x = np.random.random((100, 30, 10))
y = np.random.random((100, 1))
x[30, 15, :] = -999999999  # set one timestep of sample 30 to the mask value

model.compile('adam', 'mse')
model.fit(x, y)

stevetracvc commented 2 years ago

I can confirm what @yingjieyao said, using TF v2.8.1

If I run that code as-is, I get a loss of less than 0.5. If I change the masking value and DO NOT change the values in x, then the loss is several billion (using the default number of epochs). I would presume that this does, indeed, mean that masking works with Conv1D layers (I also tested Conv2D).

However, there's nothing in the code for the Conv layer to support masking. After masking, the masked values become zeros (edit: I got NaN earlier, but now I'm not seeing that). I'm not sure how the convolution layer handles these values.
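
A quick eager-mode check of what the downstream layer actually receives (my own sketch; again, _keras_mask is internal):

import numpy as np
import tensorflow as tf

x = np.random.random((1, 5, 2)).astype("float32")
x[0, 2, :] = -999999999.0
out = tf.keras.layers.Masking(mask_value=-999999999.0)(x)
print(out.numpy()[0, 2])        # the masked timestep comes out as zeros
print(out._keras_mask.numpy())  # the boolean mask carried alongside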