jamesmf / mnistCRNN

Simple TimeDistributed() wrapper Demo in Keras; sums images of MNIST digits

How can I modify your code to support Masking? #5

Open o0windseed0o opened 8 years ago

o0windseed0o commented 8 years ago

Hi James, I found your code https://github.com/jamesmf/mnistCRNN/blob/master/scripts/addMNISTrnn.py helpful for sequence tagging, since it lets us add more complicated layers at each timestep. I am wondering how to add a Masking layer when I have batches of variable length, say, each batch has no more than maxToAdd pics. A direct way is to pad the shorter batches with zero matrices so that the input shape to the CNN is fixed. However, I find that Masking only makes sense before the RNN layer, not before the CNN layer.

Do you have any ideas on how to mask the input layer? Without masking there would be a lot of extra computational cost, and the padding might also have side effects on the optimization, right?

jamesmf commented 8 years ago

I haven't tried to add masking yet, but there have been a number of questions asked about it in the keras issues section.

As this issue remains open, I'm not sure if it's supported yet. I have only ever used the zero-padding technique.
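For readers following along, the zero-padding technique mentioned above can be sketched in plain NumPy. This is only an illustration, not code from the repository: the helper name `pad_sequences_of_images`, the 28x28 image size, and the random stand-in data are all assumptions. It also returns a boolean mask marking which timesteps hold real images, which is the information a masking-aware layer would need.

```python
import numpy as np


def pad_sequences_of_images(sequences, max_to_add):
    """Zero-pad a list of image sequences to a common length (hypothetical helper).

    sequences: list of arrays, each of shape (length_i, height, width),
               with length_i <= max_to_add.
    Returns (padded, mask):
      padded has shape (n, max_to_add, height, width), zeros where padded;
      mask is boolean with shape (n, max_to_add), True for real images.
    """
    height, width = sequences[0].shape[1:]
    padded = np.zeros((len(sequences), max_to_add, height, width), dtype=np.float32)
    mask = np.zeros((len(sequences), max_to_add), dtype=bool)
    for i, seq in enumerate(sequences):
        if len(seq) > max_to_add:
            raise ValueError("sequence longer than max_to_add")
        padded[i, :len(seq)] = seq   # real images go first
        mask[i, :len(seq)] = True    # remaining steps stay zero / False
    return padded, mask


# Stand-in data (assumption): three sequences of 28x28 "digits" with lengths 1, 3, 2.
rng = np.random.default_rng(0)
sequences = [rng.random((n, 28, 28)) for n in (1, 3, 2)]
padded, mask = pad_sequences_of_images(sequences, max_to_add=4)
```

The padded array has a fixed shape, so it can be fed to a CNN applied per timestep, while the mask records which steps are padding.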

o0windseed0o commented 8 years ago

Thanks for your quick reply. I added the Masking layer before the RNN layer and it compiles. It seems that masking is well suited to RNN layers but not to convolutional layers, since convolutional layers rely on input images of a fixed shape. I will keep following the issue.
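To illustrate why masking fits the recurrent layer, here is a toy NumPy sketch of a masked recurrence. It is not the Keras implementation and not the code used in this thread; the function `masked_rnn_final_state`, the tanh cell, and the weight shapes are assumptions chosen for brevity. The idea shown is that a masked timestep is skipped and the previous state is carried forward, so a padded sequence ends in the same state as its unpadded version.

```python
import numpy as np


def masked_rnn_final_state(inputs, mask, w_in, w_rec):
    """Run a simple tanh RNN over one sequence, skipping masked-out steps.

    inputs: (timesteps, features); mask: (timesteps,) booleans, True = real step.
    """
    state = np.zeros(w_rec.shape[0])
    for x_t, keep in zip(inputs, mask):
        if keep:
            state = np.tanh(x_t @ w_in + state @ w_rec)
        # When keep is False the state is carried over unchanged.
    return state


rng = np.random.default_rng(0)
w_in = rng.normal(size=(8, 4))    # input-to-hidden weights (toy sizes)
w_rec = rng.normal(size=(4, 4))   # hidden-to-hidden weights

real_steps = rng.normal(size=(3, 8))                      # three real feature vectors
padded = np.concatenate([real_steps, np.zeros((2, 8))])   # plus two padded steps
mask = np.array([True, True, True, False, False])

with_mask = masked_rnn_final_state(padded, mask, w_in, w_rec)
unpadded = masked_rnn_final_state(real_steps, np.ones(3, dtype=bool), w_in, w_rec)
no_mask = masked_rnn_final_state(padded, np.ones(5, dtype=bool), w_in, w_rec)
```

Here `with_mask` matches `unpadded`, while `no_mask` differs because the recurrence keeps updating the state on the zero-padded steps.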

oakkas commented 8 years ago

Hi @o0laika0o, would you mind describing how you achieved masking? I would like to use TimeDistributed layers with variable-length sequences of video frames and want to figure out how to accomplish masking/padding.

Thanks

Mark0908 commented 6 years ago

Hi James: What does the parameter "maxToAdd" mean? And how should I decide its value?

Thanks

jamesmf commented 6 years ago

maxToAdd is the number of MNIST digits to add together in this example.

If you're repurposing this code, it would represent the time or sequence dimension. So if you're processing the frames of a video, it would be the frame count.
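As a rough illustration of the video case, the sketch below cuts a stack of frames into fixed-length clips so that the second axis plays the role of maxToAdd. The helper `frames_to_clips`, the channels-last layout, and the 64x64 frame size are assumptions for the example, not something taken from the repository.

```python
import numpy as np


def frames_to_clips(frames, max_to_add):
    """Split (num_frames, H, W, C) into (num_clips, max_to_add, H, W, C).

    Hypothetical helper: leftover frames that do not fill a whole clip are dropped.
    """
    num_clips = len(frames) // max_to_add
    usable = frames[:num_clips * max_to_add]
    return usable.reshape((num_clips, max_to_add) + frames.shape[1:])


# Stand-in video (assumption): 50 RGB frames of 64x64 pixels.
video = np.zeros((50, 64, 64, 3), dtype=np.float32)
clips = frames_to_clips(video, max_to_add=8)
print(clips.shape)  # (6, 8, 64, 64, 3)
```

Each clip then has the same layout as one training example in the digit-summing setup: a sequence of maxToAdd images handled one timestep at a time.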