keras-team / keras

Deep Learning for humans
http://keras.io/

Lambda layer and Tensorflow backend gives ValueError #5537

Closed MarkusLund closed 7 years ago

MarkusLund commented 7 years ago

As seen in the code snippet below, I have a network which ends with a Lambda layer. This layer uses softmax_to_onehot() (also in the snippet) to convert from a softmax representation of a vector to a one-hot representation, e.g. [0.1, 0.5, 0.24] -> [0, 1, 0].

from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense, TimeDistributed, Lambda
from keras import backend as K

def model():
    model = Sequential()
    model.add(LSTM(512, input_shape=(MAX_SEQUENCE_LENGTH, NOISE_SIZE), return_sequences=True))
    model.add(Dropout(0.2))
    model.add(TimeDistributed(Dense(NB_WORDS, activation="softmax")))
    model.add(Lambda(softmax_to_onehot, output_shape=output_shape_lambda))
    return model

def softmax_to_onehot(t):
    # Mark the max entry along the word axis and cast the boolean mask to float32
    k_max = K.max(t, keepdims=True, axis=2)
    equal = K.equal(t, k_max)
    return K.cast(equal, 'float32')
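
(output_shape_lambda is not shown above; since the one-hot conversion keeps the same shape as its input, a minimal sketch of such a helper, assuming that behaviour, would simply return the input shape:)

def output_shape_lambda(input_shape):
    # Hypothetical sketch: the one-hot conversion preserves the input shape
    return input_shape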

However, when using TensorFlow as the backend, this gives the following error message:

   File "LSTM.py", line 254, in test
    loss = model.train_on_batch(g_input_noise_batch, one_hot_caption_batch)
  File "/.virtualenvs/keras/lib/python2.7/site-packages/keras/models.py", line 766, in train_on_batch
    class_weight=class_weight)
  File "/.virtualenvs/keras/lib/python2.7/site-packages/keras/engine/training.py", line 1319, in train_on_batch
    self._make_train_function()
  File "/.virtualenvs/keras/lib/python2.7/site-packages/keras/engine/training.py", line 760, in _make_train_function
    self.total_loss)
  File "/.virtualenvs/keras/lib/python2.7/site-packages/keras/optimizers.py", line 433, in get_updates
    m_t = (self.beta_1 * m) + (1. - self.beta_1) * g
  File "/.virtualenvs/keras/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 883, in binary_op_wrapper
    y = ops.convert_to_tensor(y, dtype=x.dtype.base_dtype, name="y")
  File "/.virtualenvs/keras/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 651, in convert_to_tensor
    as_ref=False)
  File "/.virtualenvs/keras/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 716, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/.virtualenvs/keras/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 176, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/.virtualenvs/keras/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 165, in constant
    tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape))
  File "/.virtualenvs/keras/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 360, in make_tensor_proto
    raise ValueError("None values not supported.")
ValueError: None values not supported.

This error does not, however, occur when I use Theano as the backend, but I need to use TensorFlow.

unrealwill commented 7 years ago

Hello,

Do you really intend to stop the gradient from flowing through your layer?

Have you tried using K.stop_gradient to explicitly state this?
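
For reference, a minimal sketch of what that would look like, wrapping the discretization from your snippet in K.stop_gradient (the helper name is just for illustration):

from keras import backend as K

def softmax_to_onehot_no_grad(t):
    # Same discretization as in the snippet above, but explicitly cut the gradient
    k_max = K.max(t, keepdims=True, axis=2)
    onehot = K.cast(K.equal(t, k_max), 'float32')
    return K.stop_gradient(onehot)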

MarkusLund commented 7 years ago

I'm trying to use this as the generator in a GAN (generative adversarial network). In my GAN the discriminator is trained on sequences of one-hot vectors, so I need the generator to also output one-hot vectors. I tried without the one-hot conversion, using only a softmax activation as the last layer, but that made the discriminator learn to recognize one-hot vectors as real data, while the vectors from the generator were too dissimilar from the real examples.

unrealwill commented 7 years ago

GANs usually don't mix well with discrete values, because discrete values prevent the gradient from flowing through them.

The "clean" approach is to use reinforcement learning (i.e. the REINFORCE algorithm), as done in SeqGAN.

The "may work depending on your problem" approach is to skip the one-hot conversion and directly learn the embeddings, which are continuous. But it usually doesn't work when you have too many labels.

You can also try applying softmax multiple times, i.e. K.softmax(constant * K.softmax(out)). And you can add noise to the true one-hot input (add a uniform random tensor times a constant and take a softmax, i.e. K.softmax(c * (true + unif))) so that the discriminator's job is not too easy. It should work in principle, but you will have to fiddle with the constants to get it to work.
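
A minimal sketch of those two tricks with the Keras backend (the constants c and the noise scale below are placeholder values you would have to tune):

from keras import backend as K

def sharpened_softmax(out, c=10.0):
    # Apply softmax twice, scaled by a constant, to push the output towards
    # (but not exactly onto) a one-hot vector while keeping a usable gradient
    return K.softmax(c * K.softmax(out))

def noisy_true_onehot(true, c=5.0):
    # Add uniform noise to the true one-hot vectors and re-softmax them,
    # so the discriminator cannot simply key on exact 0/1 values
    unif = K.random_uniform(K.shape(true))
    return K.softmax(c * (true + unif))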

MarkusLund commented 7 years ago

Thank you for the good advice, much appreciated!

However, I do not understand why my initial idea of converting the softmax output into a one-hot representation stops the gradient.

This code compiles and runs with Theano as the backend but not with TensorFlow, and my GPU server requires TensorFlow.

unrealwill commented 7 years ago

The problem comes when you want to compute a gradient of a boolean tensor.

Theano is less strict and silently passes a zero for the gradient (which you probably don't expect). TensorFlow is stricter.

import numpy as np
import theano
import theano.tensor as T

x = T.fmatrix()
res = softmax_to_onehot(x)  # the function above, taking the max over the last axis for this 2-D example
gr = theano.grad(T.sum(res), [x])
fun = theano.function([x], [res] + gr)

fun(np.array([[0.3, 0.7], [0.5, 0.5]]).astype("float32"))
Out: 
[array([[ 0.,  1.],
        [ 1.,  1.]], dtype=float32), array([[ 0.,  0.],
        [ 0.,  0.]], dtype=float32)]
unrealwill commented 7 years ago

In addition to the previous comment: what is sometimes useful is multiplying softmax_to_onehot(x) by x. That gives you a filter which only lets through the x at the max index, and for which the gradients are somewhat as expected. (But mathematically they are still off from the correct calculation, which is done by "REINFORCE", aka policy gradients.)

x = T.fmatrix()
res = x * softmax_to_onehot(x)  # (you can use K.stop_gradient on softmax_to_onehot(x) if TensorFlow raises an issue)
gr = theano.grad(T.sum(res), [x])
fun = theano.function([x], [res] + gr)

fun(np.array([[0.3, 0.7], [0.5, 0.5]]).astype("float32"))
Out:
[array([[ 0.        ,  0.69999999],
        [ 0.5       ,  0.5       ]], dtype=float32),
 array([[ 0.,  1.],
        [ 1.,  1.]], dtype=float32)]

MarkusLund commented 7 years ago

Thank you so much for taking the time to explain this to me.

So if I understand this correctly, my main issue is that when the model ends with a softmax activation it can gauge the degree of error, while converting the output into a one-hot vector moves the output into a discrete space. That removes the ability to gauge the degree of error, so the model only receives feedback on whether it was right or wrong, and this prevents it from receiving a proper gradient.

unrealwill commented 7 years ago

Yes exactly.

stale[bot] commented 7 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.