Error in results of Next-Frame prediction with LSTM

vkalekis commented 2 years ago

Hello.

The output of predicting the next frames in Moving MNIST is different from what is shown on the Keras examples page. In fact, the network doesn't seem to be able to give any good estimate of the next frame.

Ran on an RTX 3080, cuDNN version 8204, TensorFlow version 2.7.0.

My output on a random sequence after training for 20 epochs: [image: test]

The output given in the notebook from Keras: [image: test1]

mulvey-g commented 2 years ago

I am having a similar problem with the ConvLSTM. The code is on the Keras website (https://keras.io/examples/vision/conv_lstm/) and on GitHub (https://github.com/keras-team/keras-io/blob/master/examples/vision/conv_lstm.py).

I am using an NVIDIA GeForce RTX 2080 with Max-Q design, TensorFlow 2.5.0 and cuDNN 8.2.1. After 20 epochs, these are some of the results I got.

[two attached images of the results]

mulvey-g commented 2 years ago

Hello,

I was just wondering if you got any further with this? Did you manage to sort any improvement with code, results etc.?

vkalekis commented 2 years ago

Hello.

Yes, I have found the issue in the example code. The main problem is in the way the network makes a prediction. The network takes as input a sequence of n images (shape (None, n, 64, 64, 1)) and gives an output of the same shape (None, n, 64, 64, 1). The example feeds it the first 10 frames of a given GIF, gets back 10 next-frame predictions (one for each input frame), and extracts the final one (the next-frame prediction for the 10th frame, i.e. the 11th frame). Afterwards, it concatenates the predicted 11th frame to the input frames and repeats the same procedure to get a prediction for the 12th frame.

The issue is that the first prediction is fuzzy around the edges, and when the network takes it as input, it "learns" that the prediction is fuzzy. The consequence is that the prediction for the 12th frame is fuzzier still, and so on, with the noise growing at each frame.
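To make the compounding concrete, here is a minimal sketch of that feedback loop. It is not the code from the Keras example: `rollout_autoregressive` and `toy_predict_next` are hypothetical names, each "frame" is reduced to a single number, and the toy predictor stands in for the trained network with an assumed constant error of 0.1 per step.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def rollout_autoregressive(
    predict_next: Callable[[List[Frame]], Frame],
    frames: Sequence[Frame],
    n_context: int = 10,
    n_future: int = 10,
) -> List[Frame]:
    """Predict n_future frames, feeding every prediction back in as input."""
    window = list(frames[:n_context])
    predictions = []
    for _ in range(n_future):
        # predict_next stands in for "run the model, keep the last output frame".
        pred = predict_next(window)
        predictions.append(pred)
        # The model's own (imperfect) output becomes part of the next input.
        window.append(pred)
    return predictions


def toy_predict_next(window: List[float]) -> float:
    """Hypothetical predictor: the true next value is last + 1, but it is off by 0.1."""
    return window[-1] + 1.0 + 0.1


# Toy "video": frame t is just the number t, for t = 1..20.
ground_truth = [float(t) for t in range(1, 21)]
preds = rollout_autoregressive(toy_predict_next, ground_truth)
errors = [p - gt for p, gt in zip(preds, ground_truth[10:])]
print([round(e, 2) for e in errors])
```

In this toy setting the error grows at every step, because each prediction inherits the error of the one before it. A real ConvLSTM will not degrade this regularly, but the mechanism is the same.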

A solution I found is to concatenate only the next ground-truth (GT) frame as the input for the next iteration, such that:

1st iter: input GT1-GT10 -> Pred11
2nd iter: input GT2-GT11 -> Pred12

This way the network doesn't take its own predictions as input, which leads to better results. I haven't figured out a way to feed the predictions as input without getting noise.
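The sliding ground-truth window could be sketched roughly as follows. This is an illustration under assumptions, not the exact change made to the example: `rollout_ground_truth` and `toy_predict_next` are hypothetical names, each "frame" is a single number, and the toy predictor stands in for the trained network with an assumed constant error of 0.1.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def rollout_ground_truth(
    predict_next: Callable[[List[Frame]], Frame],
    frames: Sequence[Frame],
    n_context: int = 10,
    n_future: int = 10,
) -> List[Frame]:
    """Predict n_future frames, each from a window of ground-truth frames only."""
    if len(frames) < n_context + n_future - 1:
        raise ValueError("not enough ground-truth frames for the requested rollout")
    predictions = []
    for i in range(n_future):
        # Iteration 1 uses GT1..GT10, iteration 2 uses GT2..GT11, and so on.
        window = list(frames[i : i + n_context])
        predictions.append(predict_next(window))
    return predictions


def toy_predict_next(window: List[float]) -> float:
    """Hypothetical predictor: the true next value is last + 1, but it is off by 0.1."""
    return window[-1] + 1.0 + 0.1


# Toy "video": frame t is just the number t, for t = 1..20.
ground_truth = [float(t) for t in range(1, 21)]
preds = rollout_ground_truth(toy_predict_next, ground_truth)
errors = [p - gt for p, gt in zip(preds, ground_truth[10:])]
print([round(e, 2) for e in errors])
```

Here the per-frame error stays constant instead of accumulating, since no prediction is ever fed back in. The trade-off is that every step needs the true previous frames, so this gives one-step-ahead predictions rather than a forecast from the first 10 frames alone.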

On Fri, 14 Jan 2022 at 1:42 PM, mulvey-g wrote:

Hello,

I was just wondering if you got any further with this? Did you manage to sort any improvement with code, results etc.?


keras-team / keras-io

Error in results of Next-Frame prediction with LSTM #732