rstudio / keras3

R Interface to Keras
https://keras3.posit.co/

Stateful RNNs - inconsistent documentation #902

Closed: faltinl closed this issue 4 years ago

faltinl commented 4 years ago

Documentation of package keras, v2.1.6, concerning all RNN sections (LSTM, GRU, RNN), contains a paragraph headed 'Statefulness in RNNs'. To enable statefulness it advises essentially:

...

... While the first and last items present no difficulty, the central one is problematic: if, for example, an LSTM is used to process signals coming from a layer_conv..., the dimensions after convolution will necessarily change, both through application of the convolution kernel and in proportion to the number of filters chosen for the layer_conv.... Moreover, pooling, flattening, or reshape layers may be required in between as well. In any case, the tensor fed to the LSTM will have completely different dimensions from the input tensor fed to the input layer(s) as expected above.

The question that arises is: how do I specify all this, and in which layer constructor?

Just as an example, in my case (a functional model) I have input tensors of shape (Bs, 6, 7, 6) which, after convolution with a k = (3, 3) kernel in a layer_separable_conv_2d with 20 filters, are transformed into shape (Bs, 4, 5, 20). Although the batch size Bs remains the same throughout, if I specify batch_shape = c(Bs, 6, 7, 6) in my layer_input, the LSTM will never see a tensor of this shape. If, on the other hand, I specify batch_shape = c(Bs, 4, 5, 20), an error is raised already at the input layer, because it does not know what to do with this shape.
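[Editor's note: the shape change described above can be sketched as follows. This is a hypothetical reconstruction of the reported model, assuming "valid" padding and a batch size of 32; it is not the original code.]

```r
library(keras)

bs <- 32  # example batch size (Bs in the text above)

# With "valid" padding, a (3, 3) kernel shrinks each spatial dimension by 2:
# height: 6 - 3 + 1 = 4, width: 7 - 3 + 1 = 5; channels become the 20 filters.
inputs <- layer_input(batch_shape = c(bs, 6, 7, 6))
conv <- inputs %>%
  layer_separable_conv_2d(filters = 20, kernel_size = c(3, 3))
# conv now has shape (32, 4, 5, 20) -- not the (32, 6, 7, 6) declared at the input
```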

Does that mean that statefulness is not possible in these cases? I cannot believe that.

Any help in that context would be warmly welcome.

Note: I have submitted a help request #13262, Stateful LSTMs - inconsistent documentation / error message w.r.t. batch_input_shape, on August 29, 2019, concerning an early encounter with this problem. This request didn't find much response. Now, at least, I understand where the problem actually comes from...

dfalbel commented 4 years ago

@faltinl Not sure if I understood completely, but you can partially specify input shapes in Keras. For example, if you only need to fix the batch size dimension you could use:

library(keras)
layer_input(batch_shape = shape(32, NULL, NULL, NULL))
#> Tensor("input_1:0", shape=(32, None, None, None), dtype=float32)

So, in your case, if dimensions 2 and 3 can vary, you could write, for example:

library(keras)
layer_input(batch_shape = shape(32, NULL, NULL, 20))
#> Tensor("input_1:0", shape=(32, None, None, 20), dtype=float32)
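[Editor's note: tying this back to the model described in the question, a hedged sketch of the whole pipeline might look like the following. The layer sizes are hypothetical and "valid" padding is assumed; the key point is that only the batch dimension must be fixed for statefulness, and Keras infers the intermediate shapes itself.]

```r
library(keras)

bs <- 32
inputs <- layer_input(batch_shape = shape(bs, 6, 7, 6))
outputs <- inputs %>%
  layer_separable_conv_2d(filters = 20, kernel_size = c(3, 3)) %>%  # -> (32, 4, 5, 20)
  layer_reshape(target_shape = c(4, 5 * 20)) %>%                    # -> (32, 4, 100): timesteps x features
  layer_lstm(units = 16, stateful = TRUE)                           # stateful only needs a fixed batch size

model <- keras_model(inputs, outputs)
```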
faltinl commented 4 years ago

Great! Thank you for this hint - that's a simple solution to this problem. And obviously it is accepted by Keras.

May I suggest to append that option to the current documentation?

dfalbel commented 4 years ago

Of course, do you want to create a Pull Request?

faltinl commented 4 years ago

Hm - I am not averse, but I read the Guide and it all looks very complicated...

brianrice2 commented 4 years ago

I know this is pretty old - so if I understand correctly, we just want to add a note to the docs (roxlate-recurrent-layer.R) saying that for dimensions which can vary in size, NULL should be used? If we're on the same page, I can submit a PR.

faltinl commented 4 years ago

Yes, that would certainly clarify the options for specifying the parameter "stateful" and put people on the right track!

To be frank, I didn't follow up on the problem any further, since I still don't understand the peculiarities of the attribute "stateful" and of statefulness in general. For some time I tried to find enlightening literature on the subject, but without success. There are a few blog posts on actually using stateful recurrent layers, LSTMs in particular, but I found those examples too tightly tailored to their particular cases, so I have simply given up experimenting with stateful LSTMs in my applications.

If you or somebody else know of a suitably general explanatory reference on this topic, I would suggest placing a pointer to it in the documentation as well.
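[Editor's note: as a brief illustration of what statefulness means in practice, here is a hedged toy sketch with made-up shapes. With a stateful layer, the final hidden state after batch i is reused as the initial state for batch i + 1, so sample j of each batch must continue the same sequence; the state is only cleared when reset explicitly.]

```r
library(keras)

# batch_input_shape = c(batch_size, timesteps, features); the batch size
# must be fixed because state is kept per sample position in the batch.
model <- keras_model_sequential() %>%
  layer_lstm(units = 8, stateful = TRUE, batch_input_shape = c(4, 10, 3)) %>%
  layer_dense(units = 1)

# x_part1 and x_part2 would be consecutive chunks of the same 4 sequences,
# each of shape (4, 10, 3); hidden state carries over between the two calls:
# model %>% predict(x_part1)
# model %>% predict(x_part2)

# Clear the carried state before starting a new set of sequences:
model %>% reset_states()
```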