keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

variational auto-encoder input dimension mis-match #3332

Closed aironashish closed 8 years ago

aironashish commented 8 years ago

Hi,

I am using the variational autoencoder described in http://blog.keras.io/building-autoencoders-in-keras.html.

My input shape is (22664, 678) and I have changed the loss to "categorical_crossentropy". I am not using the custom loss function (I thought that might be causing the problem). The parameter values are:

batch_size = 64
original_dim = 678
latent_dim = 2
intermediate_dim = 45
nb_epoch = 5

My input contains values ranging from 0.5 to 21 (most of them are 0s).

The program gives the following error:

Input dimension mis-match. (input[0].shape[0] = 8, input[1].shape[0] = 64)
Apply node that caused the error: Elemwise{mul,no_inplace}(Elemwise{Composite{exp((i0 * (i1 + i2)))}}[(0, 1)].0, Reshape{2}.0)
Toposort index: 47
Inputs types: [TensorType(float32, matrix), TensorType(float32, matrix)]
Inputs shapes: [(8, 2), (64, 2)]
Inputs strides: [(8, 4), (8, 4)]
Inputs values: ['not shown', 'not shown']
Outputs clients: [[Gemm{inplace}(Elemwise{mul,no_inplace}.0, TensorConstant{1.0}, Elemwise{Composite{(i0 * (Abs(i1) + i2 + i3))}}[(0, 2)].0, dense_47_W, TensorConstant{1.0})]]
HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

Can someone please explain the reasons behind this?

EderSantana commented 8 years ago

You defined your batch size as 64, but at some point only 8 samples are passed as input. Please check where that may be happening by accident. Note that if you are using the Theano backend you can leave the batch size undefined and avoid this problem altogether.
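
For example, epsilon can be drawn with the runtime batch size instead of a constant. A minimal sketch, assuming the Keras 1.x API and the Theano backend (K.shape returns a symbolic shape there; the TensorFlow backend of that era needed a static shape for random_normal):

x = Input(shape=(original_dim,))  # batch dimension left undefined

def sampling(args):
    z_mean, z_log_var = args
    # K.shape(z_mean) is symbolic, so epsilon matches whatever batch arrives
    epsilon = K.random_normal(shape=K.shape(z_mean), mean=0.)
    return z_mean + K.exp(z_log_var / 2) * epsilon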

aironashish commented 8 years ago

I checked; there is no way only 8 samples are being passed as input. You can see the code below:

# Imports required to run this snippet (Keras 1.x API)
from keras.layers import Input, Dense, Lambda
from keras.models import Model
from keras import backend as K
from keras import objectives

batch_size = 8
original_dim = 678
latent_dim = 2
intermediate_dim = 45
nb_epoch = 5

x = Input(batch_shape=(batch_size, original_dim))
h = Dense(intermediate_dim, activation='relu')(x)
z_mean = Dense(latent_dim)(h)
z_log_var = Dense(latent_dim)(h)

def sampling(args):
    z_mean, z_log_var = args
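    # NOTE: epsilon always has exactly batch_size rows, regardless of how
    # many samples are in the current batch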
    epsilon = K.random_normal(shape=(batch_size, latent_dim), mean=0.)
    return z_mean + K.exp(z_log_var / 2) * epsilon

z = Lambda(sampling, output_shape=(latent_dim,))([z_mean, z_log_var])

decoder_h = Dense(intermediate_dim, activation='relu')
decoder_mean = Dense(original_dim, activation='softmax')
h_decoded = decoder_h(z)
x_decoded_mean = decoder_mean(h_decoded)

def vae_loss(x, x_decoded_mean):
    xent_loss = objectives.categorical_crossentropy(x, x_decoded_mean)
    kl_loss = - 0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
    return xent_loss + kl_loss

vae = Model(input=x, output=x_decoded_mean)

vae.compile(optimizer='adam', loss="categorical_crossentropy")
vae.fit(x_train, x_train,
        shuffle=False,
        nb_epoch=nb_epoch,
        batch_size=batch_size,
        validation_data=(x_test, x_test))

The other problem is that if I set the batch size to 8, it gives the following error:

Input dimension mis-match. (input[1].shape[0] = 4, input[3].shape[0] = 8)
Apply node that caused the error: Elemwise{Composite{(exp((i0 * (i1 + i2))) * i3)}}[(0, 1)](TensorConstant{(1, 1) of 0.5}, Dot22.0, InplaceDimShuffle{x,0}.0, Reshape{2}.0)
Toposort index: 20
Inputs types: [TensorType(float32, (True, True)), TensorType(float32, matrix), TensorType(float32, row), TensorType(float32, matrix)]
Inputs shapes: [(1, 1), (4, 2), (1, 2), (8, 2)]
Inputs strides: [(4, 4), (8, 4), (8, 4), (8, 4)]
Inputs values: [array([[ 0.5]], dtype=float32), 'not shown', array([[-0.13515097,  0.00348977]], dtype=float32), 'not shown']
Outputs clients: [[Gemm{inplace}(Elemwise{Composite{(exp((i0 * (i1 + i2))) * i3)}}[(0, 1)].0, TensorConstant{1.0}, Elemwise{Composite{(i0 * (Abs((i1 + i2)) + i1 + i2))}}[(0, 1)].0, dense_12_W, TensorConstant{1.0})]]
HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

EderSantana commented 8 years ago

How many samples do you have in your dataset? Is it a multiple of 8? This line is what worries me: epsilon = K.random_normal(shape=(batch_size, latent_dim), mean=0.). But I think Keras forces the same number of samples per batch during training, so that should not be happening at training time.

What is the size of the test set?

aironashish commented 8 years ago

Train dataset shape: (22664, 678)
Test dataset shape: (1972, 678)

If I run on a GPU, this is the error:

ValueError: GpuElemwise. Input dimension mis-match. Input 3 (indices start at 0) has shape[0] == 8, but the output's size on that axis is 4.
Apply node that caused the error: GpuElemwise{Composite{(exp((i0 * (i1 + i2))) * i3)}}[(0, 1)](CudaNdarrayConstant{[[ 0.5]]}, GpuDot22.0, GpuDimShuffle{x,0}.0, GpuReshape{2}.0)
Toposort index: 25
Inputs types: [CudaNdarrayType(float32, (True, True)), CudaNdarrayType(float32, matrix), CudaNdarrayType(float32, row), CudaNdarrayType(float32, matrix)]
Inputs shapes: [(1, 1), (4, 2), (1, 2), (8, 2)]
Inputs strides: [(0, 0), (2, 1), (0, 1), (2, 1)]
Inputs values: [CudaNdarray([[ 0.5]]), 'not shown', CudaNdarray([[-0.08925677 -0.15396592]]), 'not shown']
Outputs clients: [[GpuGemm{inplace}(GpuElemwise{Composite{(exp((i0 * (i1 + i2))) * i3)}}[(0, 1)].0, TensorConstant{1.0}, GpuElemwise{Composite{(i0 * ((i1 + i2) + Abs((i1 + i2))))}}[(0, 1)].0, dense_30_W, TensorConstant{1.0})]]

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

EderSantana commented 8 years ago

Just to test my hypothesis, could you please run the experiment with batch_size=4? 1972 is not divisible by 8.
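
For concreteness, the arithmetic behind the hypothesis (a quick check, assuming Keras leaves the final, smaller batch of the validation set intact):

>>> divmod(1972, 8)   # 246 full batches, then a final batch of 4
(246, 4)
>>> divmod(1972, 4)   # divides evenly, so every batch has 4 samples
(493, 0)

That leftover batch of 4 matches the (4, 2) tensor in the tracebacks above, while epsilon still has shape (8, 2).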

aironashish commented 8 years ago

It works with batch size = 4. So divisibility is the only problem? But I never had such a problem in Keras before!

EderSantana commented 8 years ago

It's because of your epsilon = K.random_normal(shape=(batch_size, latent_dim), mean=0.) in the sampling layer: epsilon ALWAYS has batch_size rows, so the last, smaller batch cannot be combined with it. A better approach would behave differently at test time, where no sampling is used and only the mean is forwarded. Using a Lambda is a quick hack, but it forces you to keep batch_size well defined.
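
A sketch of that idea, assuming the Keras 1.x backend API (K.in_train_phase selects the first expression during training and the second otherwise; the symbolic-shape epsilon from the earlier sketch keeps both branches safe for partial batches on the Theano backend):

def sampling(args):
    z_mean, z_log_var = args
    # runtime batch size, so the last, smaller batch still works
    epsilon = K.random_normal(shape=K.shape(z_mean), mean=0.)
    z = z_mean + K.exp(z_log_var / 2) * epsilon
    # at test time, skip sampling and forward only the mean
    return K.in_train_phase(z, z_mean)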

aironashish commented 8 years ago

Thanks a lot!