keras-team / keras


Failure to maintain recurrent activation in LSTM layer #7358

Closed wehnsdaefflae closed 6 years ago

wehnsdaefflae commented 7 years ago

I use neural networks for online sequence prediction. In this case, however, the performance of the LSTM in Keras (using the TensorFlow backend) is not nearly as good as it should be. Maybe someone can help me understand where the problem lies.

The peculiarity of online learning is that the underlying sample distribution is allowed to be non-stationary. To adapt rapidly to changes in this distribution, the model is updated at each time step, with the last observation as the input and the current observation as the target.

An example application is an agent that moves randomly in a grid world like the one below where the agent is currently at position (6, 2).

  0 1 2 3 4 5 6 7
0 x x x x x x x x
1 x . . . . x . x
2 x . x x . x a x
3 x . x . . . . x
4 x x x . x . . x
5 x . . . x x x x
6 x . x . . . . x
7 x x x x x x x x

The agent perceives the four cells that surround it (encoded in north, east, south, west order) as well as its own movement to the north, east, south, or west. An example input to the model, therefore, consists of the agent's observation of the surrounding walls (i.e., [0, 1, 0, 1] for walls to the east and the west) as well as the action performed in this situation, let's say a movement to the north (i.e., [1, 0, 0, 0]). (This is a partially observable Markov decision process.)

The resulting input vector to the model is the concatenation of the sensor and the motor encoding (i.e., [0, 1, 0, 1, 1, 0, 0, 0]). If the agent performs a movement to the north at position (6, 2), the next observation is walls everywhere except to the south (i.e., [1, 1, 0, 1]). One training example from the grid world above, therefore, is (input: [0, 1, 0, 1, 1, 0, 0, 0], target: [1, 1, 0, 1]).
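
To make the encoding concrete, here is a minimal sketch of how the observation and the input vector can be computed. The GRID constant, the observation_at helper, and the ACTIONS mapping are illustrations I made up for this post, not part of my actual setup.

# Assumed grid encoding: 'x' is a wall, '.' is free space.
GRID = [
    "xxxxxxxx",
    "x....x.x",
    "x.xx.xax",
    "x.x....x",
    "xxx.x..x",
    "x...xxxx",
    "x.x....x",
    "xxxxxxxx",
]

# Cell offsets in north, east, south, west order.
ACTIONS = {"north": (0, -1), "east": (1, 0), "south": (0, 1), "west": (-1, 0)}

def observation_at(x: int, y: int) -> list:
    # 1 if the neighbouring cell is a wall, in N, E, S, W order.
    return [int(GRID[y + dy][x + dx] == "x") for dx, dy in ACTIONS.values()]

def sensorimotor(observation: list, action: str) -> list:
    # Concatenate the 4-bit wall observation with a one-hot action encoding.
    return list(observation) + [float(a == action) for a in ACTIONS]

print(observation_at(6, 2))                         # [0, 1, 0, 1]
print(sensorimotor(observation_at(6, 2), "north"))  # [0, 1, 0, 1, 1.0, 0.0, 0.0, 0.0]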

In each time step t:
  receive new observation o_t
  train the model with input o_{t-1} + a_{t-1} and target o_t
  randomly select new action a_t
  use o_t + a_t as input to predict the next observation o_{t+1} 
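
Spelled out in code, the loop might look like the following sketch. The world object with its observe() and move() methods is a hypothetical placeholder for the grid-world environment, and LSTMPredictor is the class listed at the end of this post.

import random

ACTION_NAMES = ("north", "east", "south", "west")

def run_online(predictor, world, total_steps: int = 50000):
    prev_input = None                               # o_{t-1} + a_{t-1}
    prediction = None
    for t in range(total_steps):
        o_t = world.observe()                       # receive new observation o_t
        if prev_input is not None:
            predictor.perceive(prev_input, o_t)     # train on (o_{t-1} + a_{t-1}, o_t)
        a_t = random.choice(ACTION_NAMES)           # randomly select new action a_t
        prev_input = list(o_t) + [float(a == a_t) for a in ACTION_NAMES]
        prediction = predictor.predict(prev_input)  # predicted o_{t+1}
        world.move(a_t)                             # hypothetical environment step
    return prediction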

The ambiguity of moving north from position (6, 2), on the one hand, and moving north from a position that appears identical to the agent (e.g. (3, 4)), on the other, should be partly resolvable by networks with a recurrent layer that maintains information from prior inputs.

At the very least, the prediction performance of a recurrent neural network should be better than that of a simple feedforward network (which has no regard for the prior sequence of observation-action pairs).

In fact, however, a simple feedforward network with 20 neurons in one hidden layer performs better than an LSTM with the same structure.

[Plot: Predictive performance]

The brown plot is the feedforward network, the black one is the LSTM, and the green one is a naive frequentist Markov predictor. The values are averaged over ten runs; each point shows the number of successful predictions over the preceding 1000 steps, and a total of 50000 steps were performed.
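
A prediction can be counted as successful, for example, when every output unit, thresholded at 0.5, matches the actual next observation; the exact criterion below is one illustrative choice, not necessarily the one used for the plot.

def is_successful(prediction, observation) -> bool:
    # Every predicted wall bit, thresholded at 0.5, must match
    # the corresponding bit of the actual observation.
    return all(int(p >= 0.5) == int(o) for p, o in zip(prediction, observation))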

Can someone assist me in understanding why this is the case?

PS: I took care to maintain the activation of the recurrent neurons throughout the process. In case anyone's interested, here is my code for the LSTM in Keras.

import numpy
from typing import Sequence

from keras.layers import Dense, LSTM
from keras.models import Sequential
from keras.optimizers import Adam

class LSTMPredictor:
  def __init__(self, observation_pattern_size: int, action_pattern_size: int, alpha: float = .01):
    self.input_size = observation_pattern_size + action_pattern_size
    self.output_size = observation_pattern_size
    self.alpha = alpha
    self.network = Sequential()
    # stateful=True keeps the recurrent activation between successive calls
    # to fit() and predict(); batch size and sequence length are both 1.
    self.network.add(LSTM(20, batch_input_shape=(1, 1, self.input_size), activation="sigmoid", stateful=True, return_sequences=True))
    self.network.add(Dense(self.output_size, activation="sigmoid"))
    self.network.compile(loss='mse', optimizer=Adam(lr=self.alpha))

  def perceive(self, sensorimotor_pattern: Sequence[float], sensor_pattern: Sequence[float]):
    # One online update: last sensorimotor input as input, current observation as target.
    _input = numpy.reshape(sensorimotor_pattern, (1, 1, self.input_size))
    _target = numpy.reshape(sensor_pattern, (1, 1, self.output_size))
    self.network.fit(_input, _target, batch_size=1, epochs=1, verbose=0)

  def predict(self, sensorimotor_pattern: Sequence[float]) -> Sequence[float]:
    # Predict the next observation from the current observation + action;
    # the recurrent state carries over because the layer is stateful.
    _input = numpy.reshape(sensorimotor_pattern, (1, 1, self.input_size))
    _output = self.network.predict(_input)
    return list(numpy.reshape(_output, self.output_size))
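
For completeness, a minimal usage sketch of the class above, using the (6, 2) example from earlier:

predictor = LSTMPredictor(observation_pattern_size=4, action_pattern_size=4)

# o_{t-1} + a_{t-1}: walls to the east and west at (6, 2), then a one-hot "north".
previous_input = [0., 1., 0., 1., 1., 0., 0., 0.]
current_observation = [1., 1., 0., 1.]                   # o_t after moving to (6, 1)

predictor.perceive(previous_input, current_observation)  # one online update
next_input = current_observation + [1., 0., 0., 0.]      # move north again
print(predictor.predict(next_input))                     # predicted o_{t+1}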
stale[bot] commented 6 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.