keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

What are some pressing Keras bugs that I am not aware of? #4996

Closed fchollet closed 6 years ago

fchollet commented 7 years ago

We will soon be cutting a final Keras v1 in preparation for upcoming development on Keras v2.

What are some pressing issues that you want to see addressed in Keras v1 before we freeze it? Anything I have been missing? No feature requests, only bugs.

patyork commented 7 years ago

[EDIT:DONE]

Not pressing, I guess, but the (non-)cropping issue (PR #4941) would be nice to either fix, or at the very least to fix the documentation/code that defaults to a non-working crop of ((0, 0), (0, 0)).

The issue applies to Cropping2D and 3D; 1D works fine.
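For reference, a minimal snippet (Keras 1.x API; the input shape here is just illustrative) that exercises the default crop the docs advertise:

from keras.models import Sequential
from keras.layers import Cropping2D

# The documented default of ((0, 0), (0, 0)) should be a no-op crop,
# which is exactly the case PR #4941 is about.
model = Sequential()
model.add(Cropping2D(cropping=((0, 0), (0, 0)), input_shape=(28, 28, 3)))
model.compile(loss='mse', optimizer='adam')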

patyork commented 7 years ago

[EDIT: DONE]

The only other thing I can think of is the DeConvolution layers, specifically in Theano. 'tf' ordering does not work as expected, but I have not drilled into it.

Some example code (I haven't opened an issue or anything yet. Trying to understand DeConv fully first):

from keras.models import Sequential
from keras.layers import ZeroPadding2D, Deconvolution2D
from keras import callbacks
import numpy as np

# This gives an unexpected output shape
model = Sequential()
model.add(Deconvolution2D(3, 3, 3, output_shape=(None, 14, 14, 3), border_mode='valid', dim_ordering='tf', input_shape=(12, 12, 3)))

model.compile(loss='mse', optimizer='adam')

print 'Actual Output Shape: ', model.predict(np.ones((1, 12, 12, 3))).shape
# >> Actual Output Shape:  (1, 14, 3, 3)
# This makes it seem like the image was 12x3 with 12 channels, or something

# This works, and gives reasonable output
model = Sequential()
model.add(Deconvolution2D(3, 3, 3, output_shape=(None, 3, 14, 14), border_mode='valid', dim_ordering='th', input_shape=(3, 12, 12)))

model.compile(loss='mse', optimizer='adam')

print 'Actual Output Shape: ', model.predict(np.ones((1, 3, 12, 12))).shape
# >> Actual Output Shape:  (1, 3, 14, 14)
# This is as expected, and matches the doc

jmhessel commented 7 years ago

[EDIT: DONE]

Not sure of your definition of pressing, but --

I think there might be an issue with saving deconv layers with the tf backend.

from keras.models import Sequential, load_model
from keras.layers import Deconvolution2D
model = Sequential()
model.add(Deconvolution2D(3, 3, 3, output_shape=(None, 14, 14, 3), input_shape=(12, 12, 3)))
model.save('tmp.model')
del model
model = load_model('tmp.model')

gives an error for me when I'm using the tf backend.

I added a tuple cast that seems to fix it for me in #4999

(I originally had a second bug here, but it seems to have been fixed!)

patyork commented 7 years ago

I've made 2 PRs: one to fix a bug in DeConv2D, and one for a general bug I found relating to initializations (generally with convolutional layers); see #5000 and #5002

iammarvelous commented 7 years ago

[EDIT: FIXED IN RECENT VERSIONS OF TENSORFLOW]

As mentioned in #4991, there seems to be a consistency issue in the Keras backend. Keras assumes that basic operators (e.g. __add__) on tensors have the same broadcasting behavior in both Theano and TensorFlow, but I came across a case earlier where the two backends behave differently. Here is an example where a is a matrix and b is a vector: the Theano backend broadcasts, but TensorFlow does not. Right now, I use the following workaround.

def k_div(x, y):
    try:
        return x/y
    except:
        import tensorflow as tf
        return tf.div(x, y)

Since I didn't find any operator overloads in the source code, I think it might be better to check for consistency and overload the operators on Keras's own tensor class. Correct me if I misunderstood anything. Thanks.
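
A purely hypothetical sketch of the kind of overloading I mean (a thin wrapper class; the names are made up, and this is not anything Keras provides):

class WrappedTensor(object):
    # Wraps a backend tensor and routes division through the k_div
    # fallback defined above, so both backends behave the same way.
    def __init__(self, t):
        self.t = t

    def __div__(self, other):          # Python 2 division
        return WrappedTensor(k_div(self.t, getattr(other, 't', other)))

    __truediv__ = __div__              # Python 3 division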

patyork commented 7 years ago

@iammarvelous The code below works on both Theano 0.8.2 and TensorFlow r0.11, Python 2.7. Both Theano and TensorFlow (AFAIK) follow the NumPy broadcasting rules.

import keras.backend as K
import numpy as np
def division_ex(mat, vec):
    return mat/vec

mat = K.placeholder(shape=(28, 28))
vec = K.placeholder(shape=(28,))
division_fn = division_ex(mat, vec)
fn = K.function(inputs=[mat, vec], outputs=[division_fn])

# 8.0 / 4.0 == 2.0 ??
print K.backend(), np.allclose(fn([np.ones((28,28))*8., np.ones((28,))*4]), np.ones((28,28))*2.0)
# >> theano True
# <switch backends>
# >> tensorflow True

siddheshk commented 7 years ago

[EDIT: FIXED A LONG TIME AGO]

As mentioned in #4954, the TimeDistributed wrapper does not work with Recurrent Layers in the 1.2.0 version. The same code worked on a previous version of Keras (1.1.2).

I get the following error on Keras 1.2.0:

File "/home/##/.local/lib/python2.7/site-packages/keras/engine/topology.py", line 569, in __call__
    self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
File "/home/##/.local/lib/python2.7/site-packages/keras/engine/topology.py", line 632, in add_inbound_node
    Node.create_node(self, inbound_layers, node_indices, tensor_indices)
File "/home/##/.local/lib/python2.7/site-packages/keras/engine/topology.py", line 164, in create_node
    output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
File "/home/##/.local/lib/python2.7/site-packages/keras/layers/wrappers.py", line 129, in call
    y = self.layer.call(X)  # (nb_samples * timesteps, ...)
File "/home/##/.local/lib/python2.7/site-packages/keras/layers/recurrent.py", line 201, in call
    input_shape = K.int_shape(x)
File "/home/##/.local/lib/python2.7/site-packages/keras/backend/theano_backend.py", line 128, in int_shape
    raise Exception('Not a Keras tensor:', x)
Exception: ('Not a Keras tensor:', Reshape{3}.0)

The code snippet is below:

input_layer = Input(shape=(document_size, img_h,), dtype='int32', name='input_layer')
embedding_layer = TimeDistributed(Embedding(len(W2V), img_w, input_length=img_h, weights=[W2V], trainable=True, mask_zero=True))(input_layer)
gru_word = TimeDistributed(GRU(GRU_layers[0], return_sequences=True, activation=conv_non_linear))(embedding_layer)

keunwoochoi commented 7 years ago

Can we now have many BN layers with mode=0?

jmhessel commented 7 years ago

@keunwoochoi What do you mean many BN layers? I think you can now have as many shared/unshared/whatever mode=0 BN layers as you want.

patyork commented 7 years ago

BatchNorm layers - unless that's not what you meant, haha.

keunwoochoi commented 7 years ago

Related issues are #3903 #4334 #2954.

fchollet commented 7 years ago

@keunwoochoi not sure of the specifics of what you mean, but the issues you are referring to have been fixed a long time ago.

keunwoochoi commented 7 years ago

@fchollet I see, thanks.

mlzxy commented 7 years ago

@patyork @iammarvelous

For the division example, TensorFlow will fail if you change the mat/vec shapes from [28, 28]/[28] to [7, 28]/[7]: TensorFlow raises an exception while Theano runs fine.

import os 
# os.environ['KERAS_BACKEND'] = 'theano'

import keras.backend as K
import numpy as np
def division_ex(mat, vec):
    return mat/vec

mat = K.placeholder(shape=(7, 28))
vec = K.placeholder(shape=(7,))
division_fn = division_ex(mat, vec)
fn = K.function(inputs=[mat, vec], outputs=[division_fn])

# 8.0 / 4.0 == 2.0 ??
print K.backend(), np.allclose(fn([np.ones((28,28))*8., np.ones((28,))*4]), np.ones((28,28))*2.0)

Theano 0.8.2, tensorflow 0.12.0-rc-1
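
For what it's worth, plain NumPy (no backend involved) will not broadcast a (7, 28) matrix against a (7,) vector either; a quick check:

import numpy as np

mat = np.ones((7, 28)) * 8.0
vec = np.ones((7,)) * 4.0

# Broadcasting aligns trailing dimensions, so (7, 28) / (7,) is not
# broadcastable and NumPy raises a ValueError here.
try:
    mat / vec
except ValueError as e:
    print(e)

# Adding an explicit trailing axis makes the division unambiguous:
print((mat / vec[:, np.newaxis]).shape)   # (7, 28)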

billshoo commented 7 years ago

The following Keras model runs smoothly under TensorFlow, but gives "theano.gof.fg.MissingInputError" on the last line under Theano (X and y can be any dummy data):

import numpy as np
from keras.models import Sequential
from keras.layers import TimeDistributed, Dropout, Dense, LSTM
from keras.optimizers import RMSprop

X = np.random.rand(100000, 320)
y = np.random.rand(100000, 1)
model = Sequential()
model.add(TimeDistributed(Dropout(0.5), batch_input_shape=(1, 100000, 320)))
model.add(TimeDistributed(Dense(1000)))
model.add(TimeDistributed((Dropout(0.5))))
model.add(LSTM(1000, batch_input_shape=(1, 100000, 1000), 
               return_sequences=True, stateful=True))
model.add(TimeDistributed((Dropout(0.5))))
model.add(TimeDistributed(Dense(1)))
opti = RMSprop(lr=0.001)
model.compile(optimizer=opti, loss='mse')
model.train_on_batch(X[np.newaxis, 0:100000, :], y[np.newaxis, 0:100000])

honnibal commented 7 years ago

The average pool layer doesn't support masking. The zero padding then throws off the calculation of the average. I think you just need to set the flag.

(This is out of thread scope and probably discussed elsewhere. But: I think the sequence handling in Keras should probably be redesigned. The current approach loses information: after the padding you don't know the true sequence boundaries, so you have to guess from the pad value. This is problematic as soon as you embed. I've found it quite difficult to implement pooling and attention models correctly. I think a design that maintains an array of the original sequence lengths might work better.)
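
To make the masking point concrete, here is a rough sketch of the kind of mask-aware average I have in mind, written against the backend API (an illustration only, not what any Keras layer currently does):

import keras.backend as K

def masked_temporal_mean(x, mask):
    # x: (samples, timesteps, features); mask: (samples, timesteps).
    # Zero out padded timesteps, then divide by the true sequence length
    # rather than the padded length.
    mask = K.cast(mask, K.floatx())
    summed = K.sum(x * K.expand_dims(mask, -1), axis=1)
    lengths = K.sum(mask, axis=1, keepdims=True)
    return summed / (lengths + K.epsilon())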

ncullen93 commented 7 years ago

Do the Deconvolution2D layers still require output_shape? That's a common source of issues for me and is hard to work with, especially because you have to explicitly pass the batch size into output_shape instead of using "-1" (as in numpy or tensorflow) or just not requiring the batch size at all.

Example:

Required: deconv1 = Deconvolution2D(10,5,5,output_shape=[batch_size]+conv1.get_shape().as_list()[1:])(conv1)

Instead of: deconv1 = Deconvolution2D(10,5,5,output_shape=[-1]+conv1.get_shape().as_list()[1:])(conv1)

or not having that at all. Definitely not a bug, but a common source of user issues.

ncullen93 commented 7 years ago

Referring to my previous comment, I see you can use 'None' instead of -1 or batch_size; awesome. However, there is a common bug that occurs in the Deconvolution2D class when you use None, which originates from tf.nn.deconv2d but can easily be alleviated in Keras: "TypeError: Expected binary or unicode string, got None"

The fix is to add tf.stack() in tensorflow_backend.py on line 2472:

def _preprocess_deconv_output_shape(x, shape, dim_ordering):
    if dim_ordering == 'th':
        shape = (shape[0], shape[2], shape[3], shape[1])

    if shape[0] is None:
        shape = (tf.shape(x)[0], ) + tuple(shape[1:])
        shape = tf.stack(list(shape))
    return shape

Example issue from TF that carries over to keras: https://groups.google.com/a/tensorflow.org/forum/#!topic/discuss/vf8eH9YMwVA
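
To see why the stack helps, compare the two ways of building the output shape (an illustration with made-up shapes, using the TF 0.12-era API):

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=(None, 12, 12, 3))

# A plain Python tuple mixing a scalar tensor with ints is what trips up
# the deconvolution op when the batch dimension is None:
mixed = (tf.shape(x)[0], 14, 14, 3)

# tf.stack turns it into a single 1-D int32 tensor, which is accepted:
output_shape = tf.stack(list(mixed))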

tiberiu44 commented 7 years ago

Hi all,

I don't know if this thread is still open, but I have been experiencing some issues with bidirectional LSTMs and the functional API, and this seems like the right place to get feedback.

I don't know if I got this right, but I think that:

bdlstm=Sequential()
bdlstm.add(Bidirectional(LSTM(32, return_sequences=True), input_shape=(None, 128)))
layer_3_merge_x = bdlstm(layer_1)

and

layer_2_lstm_forward = LSTM(32, return_sequences=True)(layer_1)
layer_2_lstm_backward = LSTM(32, return_sequences=True, go_backwards=True)(layer_1)
layer_3_merge_x = concatenate([layer_2_lstm_forward, layer_2_lstm_backward])

should behave identically. I tried multiple ways of merging the two layers (concatenating/summing etc.), but still no luck.

However, the two only seem to match when the number of timesteps is fixed for the input data. Whenever I used non-fixed inputs, it seemed like the second layer (go_backwards=True) was not working properly.

Hope I got this right, and many thanks to the developers of Keras (really great framework).

Best, Tibi

gabrieldemarmiesse commented 6 years ago

@tiberiu44 please open a new issue for this if necessary.

Closing this issue, as Keras 2.0 was released quite some time ago.