keras-team / keras

Deep Learning for humans
http://keras.io/

AutoEncoder for MNIST: issue when reshaping #12062

Closed. Atralb closed this issue 5 years ago.

Atralb commented 5 years ago

Hi all,

First of all I must say I'm very new to TensorFlow, Keras, and deep learning as a whole.

I am trying to create a very simple autoencoder on my own with the Keras Sequential model on the MNIST dataset, as follows:

import tensorflow as tf
import numpy as np
import pandas as pd
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, Dropout, MaxPooling2D, Flatten, Reshape
import matplotlib.pyplot as plt

from keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

model3 = Sequential()

model3.add(Flatten(input_shape=x_train.shape[1:]))
model3.add(Dense(300, activation='relu'))
model3.add(Dense(150, activation='relu'))
model3.add(Dense(300, activation='relu'))
model3.add(Dense(784, activation='relu'))
model3.add(Reshape((28,28)))

model3.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

model3.summary()

model3.fit(x_train,x_train, epochs=10, batch_size=50)

This is the error message I got, and I can't figure out where the error comes from. I imagine it's from the Reshape layer at the end, but I'm not completely sure, and even so I don't know why it's wrong. I already tried model3.add(Reshape((28,28,1))) without success.


ValueError                                Traceback (most recent call last)

<ipython-input-33-f06ec23d8fdd> in <module>()
----> 1 model3.fit(x_train,x_train, epochs=10, batch_size=50)

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, max_queue_size, workers, use_multiprocessing, **kwargs)
   1534         steps_name='steps_per_epoch',
   1535         steps=steps_per_epoch,
-> 1536         validation_split=validation_split)
   1537 
   1538     # Prepare validation data.

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py in _standardize_user_data(self, x, y, sample_weight, class_weight, batch_size, check_steps, steps_name, steps, validation_split)
    990         x, y, sample_weight = next_element
    991     x, y, sample_weights = self._standardize_weights(x, y, sample_weight,
--> 992                                                      class_weight, batch_size)
    993     return x, y, sample_weights
    994 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py in _standardize_weights(self, x, y, sample_weight, class_weight, batch_size)
   1152           feed_output_shapes,
   1153           check_batch_axis=False,  # Don't enforce the batch size.
-> 1154           exception_prefix='target')
   1155 
   1156       # Generate sample-wise weight values given the `sample_weight` and

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training_utils.py in standardize_input_data(data, names, shapes, check_batch_axis, exception_prefix)
    330                 'Error when checking ' + exception_prefix + ': expected ' +
    331                 names[i] + ' to have shape ' + str(shape) +
--> 332                 ' but got array with shape ' + str(data_shape))
    333   return data
    334 

ValueError: Error when checking target: expected reshape_7 to have shape (28, 1) but got array with shape (28, 28)

Thanks for your help :)

msymp commented 5 years ago

@Atralb, it is clearly expecting a target of rank 1 at the training stage, based on the training shape you defined in the initial Flatten layer. Can you try whether it works with a rank-1 output shape (in the Reshape)? Thanks.

Atralb commented 5 years ago

@msymp Thanks for your answer. However, I don't understand what you mean by a rank-1 shape. Do you mean a tensor with only one dimension? If so, I don't see why: x_train has shape (60000, 28, 28), so x_train.shape[1:] is (28, 28).

msymp commented 5 years ago

Hi @ParikhKadam, can you assist @Atralb with the shape error that occurs at the training stage in the code above? Thanks.

ParikhKadam commented 5 years ago

@Atralb @msymp

I'll explain my answer later, but for now, try this code:

import tensorflow as tf
import numpy as np
import pandas as pd
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, Dropout, MaxPooling2D, Flatten, Reshape
import matplotlib.pyplot as plt

from keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = x_train/256.0                      # normalize pixel values into [0, 1)
x_train = np.expand_dims(x_train, axis=-1)   # add the trailing channel axis

model3 = Sequential()

model3.add(Flatten(input_shape=(28,28,1)))
model3.add(Dense(300, activation='relu'))
model3.add(Dense(150, activation='relu'))
model3.add(Dense(300, activation='relu'))
model3.add(Dense(784, activation='relu'))
model3.add(Reshape((28,28,1)))

model3.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

model3.summary()

model3.fit(x_train,x_train, epochs=10, batch_size=50)

It will definitely work.. Time to sleep. Ttyl..

Atralb commented 5 years ago

@msymp @ParikhKadam Thanks a lot for your help, guys. Looking forward to your answer tomorrow, Parikh :). I'm kind of seeing the issue now but still don't clearly understand it.

This, however, raised two other questions for me.

PS: this is one of the results after 10 epochs. The sample seems to have become noisy somewhere in the architecture, maybe in the Flatten layer? (if somebody still wants to see it ^^ https://user-images.githubusercontent.com/35039163/51411496-5ea60c80-1b68-11e9-92f8-2c17bc831162.png)

EDIT: Oh god, I'm dumb. I was still treating it as a categorical problem with the loss function... My bad. It worked when I changed it to MSE.
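
I.e. roughly this change (just a sketch of the relevant lines, keeping the rest of the model above the same):

# reconstruction is a regression problem, so use a regression loss instead of a classification loss
model3.compile(optimizer='adam', loss='mse')
model3.fit(x_train, x_train, epochs=10, batch_size=50)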

ParikhKadam commented 5 years ago

@Atralb @msymp

> (if somebody still wants to see it ^^ https://user-images.githubusercontent.com/35039163/51411496-5ea60c80-1b68-11e9-92f8-2c17bc831162.png)

Had a look at it.. you forgot to denormalize the predicted values. It's not a problem if you change the loss function, but to make the model work with this loss function you should denormalize the predicted values by multiplying the predicted output tensor by 256, and then plot the image.
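
Something like this (just a sketch, I haven't run this exact snippet):

reconstructed = model3.predict(x_train[:1])         # shape (1, 28, 28, 1), values roughly in [0, 1)
reconstructed = (reconstructed * 256.0).squeeze()   # undo the /256 normalization, drop batch and channel axes
plt.imshow(reconstructed, cmap='gray')
plt.show()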

Now here comes the explanation. I made two changes to your training data:

  1. x_train = x_train/256.0 -- I was forced to normalize the values, as the loss function you used in your code accepts values in the range [0,1) only. Take another look at that range: 1 is not included.

> The additional normalization (division by 256) you made, which I thought was only meant for efficiency purposes, but I noticed it is a requirement for it to run, otherwise I get an error about values out of the range [0,1]. Why is that?

I answered this above in the first point, but I'd like to explain a bit more. Normalization isn't usually done this way (dividing by 256). It is typically done by subtracting the mean and dividing the result by the standard deviation. But in the case of images such as MNIST, the pixels hold values from 0 to 255, so the simplest form of normalization is to divide the input tensor by 255. I divided by 256 instead because of the range [0,1), where 1 isn't included: dividing the highest pixel value in the data (255) by 255 would give exactly 1.

  2. x_train = np.expand_dims(x_train, axis=-1) -- Note that you used a Flatten layer, and the Flatten layer is made specifically for flattening images in Keras. It has a data_format argument that describes the layout of the images and can take two values: channels_first and channels_last. By default it assumes channels_last if nothing is specified when instantiating the Flatten layer.

Now, MNIST images are greyscale, so the number of channels is 1: a single channel that describes the intensity of each pixel (0 for black, 255 for white, values in between for shades of grey). Your input was of shape (28, 28) and didn't contain any channel information. Since greyscale images have a single channel and the Flatten layer assumes channels_last by default, we modified the input to have shape (28, 28, 1).

The shape of input images is generally (height, width, no_of_channels) for channels_last, or (no_of_channels, height, width) for channels_first.
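
To make the effect of those two preprocessing lines concrete, a quick sketch of the shapes involved:

print(x_train.shape)                        # (60000, 28, 28) -- raw MNIST, no channel axis
x_train = x_train / 256.0                   # values now in [0, 1)
x_train = np.expand_dims(x_train, axis=-1)
print(x_train.shape)                        # (60000, 28, 28, 1) -- channels_last layout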

That's all for the explanation.

Now, I saw that you changed the loss function to MSE. Cross-check whether MSE needs the input values in the range [0,1); if not, you can remove the normalization step..

Thank you.. Keep assigning me issues and I will solve them in my free time..

Learn from others' mistakes :)

Atralb commented 5 years ago

@ParikhKadam Regarding the discussion about the loss function, I'm sorry but you are wrong on that. Denormalizing the values doesn't change anything, and you could already see it in the noisy image: there's no way it would have looked like that if the reconstruction were a digit, you would have seen the pattern. And as expected, multiplying by 256.0 doesn't change the noisiness of the image at all, and no information can be extracted from it. The results with this loss function are simply wrong. And as I said, training starts with a loss of 0 right from the beginning, which immediately shows there's a problem and the predictions are wrong.

Thanks for the channel explanation though, I understand it better now. And thanks for your overall help :)

ParikhKadam commented 5 years ago

@Atralb Thank you for pointing that out.. I understand that we can't train this model with that loss function. I will still try it myself.

For now, is the model running correctly with the new changes applied? I haven't tried running this code, so I'm just asking for confirmation.

Thank you..

Atralb commented 5 years ago

Yep, with an MSE loss function it works perfectly and makes a very simple autoencoder for my data! (I guess this should be marked as solved?)
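
For reference, the working version on my end looks roughly like this (a sketch, using tensorflow.keras throughout; the final activation is kept as relu like in the code above, though a sigmoid is a common alternative to keep outputs in [0, 1]):

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Reshape
from tensorflow.keras.datasets import mnist

(x_train, _), (x_test, _) = mnist.load_data()

# scale pixels into [0, 1) and add the trailing channel axis expected by Flatten
x_train = np.expand_dims(x_train / 256.0, axis=-1)

model3 = Sequential([
    Flatten(input_shape=(28, 28, 1)),
    Dense(300, activation='relu'),
    Dense(150, activation='relu'),
    Dense(300, activation='relu'),
    Dense(784, activation='relu'),
    Reshape((28, 28, 1)),
])

# reconstruction is a regression problem, so use MSE rather than a classification loss
model3.compile(optimizer='adam', loss='mse')
model3.fit(x_train, x_train, epochs=10, batch_size=50)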

ParikhKadam commented 5 years ago

@Atralb Ohk.. Yep, mark it as solved and close this issue.

msymp commented 5 years ago

This issue is closed. Thanks @ParikhKadam and @Atralb, this discussion was very clarifying.

ParikhKadam commented 5 years ago

Welcome.. @msymp