Hello,
Do you really intend to stop the gradient from flowing through your layer?
Have you tried using K.stop_gradient to explicitly state this?
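For example, something like this would make the intent explicit (just a sketch; softmax_to_onehot here stands for the conversion function from your snippet, and the layer name is a placeholder):
from keras import backend as K
from keras.layers import Lambda

def onehot_no_grad(x):
    # Explicitly cut the gradient at the discretisation step.
    return K.stop_gradient(softmax_to_onehot(x))

onehot_layer = Lambda(onehot_no_grad)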
I'm trying to use this as the generator in a GAN (generative adversarial network). In my GAN the discriminator is trained on sequences of one-hot vectors, so I need the generator to also output one-hot vectors. I tried without the one-hot conversion, with only a softmax activation as the last layer, but then the discriminator learned to recognize one-hot vectors as real data, while the vectors from the generator were too dissimilar from the real examples.
GANs usually don't mix well with discrete values, because discrete values prevent the gradient from flowing through them.
The "clean" approach is to use reinforcement learning (i.e REINFORCE algorithm) as done in SeqGAN.
The "may work approach depending on your problem" is skipping the one-hot and directly learning the embedding which are continuous. But it usually doesn't work when you have too many labels.
You can also try to apply softmax multiple times, i.e. K.softmax( constant * K.softmax( out ) ). And you can add noise to the true one-hot input (add a random uniform times a constant and take a softmax, i.e. K.softmax( c * (true + unif) )) so as to make the discriminator's job not too easy. It should work in principle, but you will have to fiddle with the constants to get it to work.
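A rough sketch of those two tricks (the placeholders, shapes and the constant c are just assumptions for illustration; out would be your generator's softmax output and true_onehot the real one-hot data):
from keras import backend as K

# Placeholders standing in for the generator output and the real
# one-hot data (shapes chosen arbitrarily for illustration).
out = K.placeholder(shape=(None, 10))
true_onehot = K.placeholder(shape=(None, 10))

c = 10.0  # sharpening constant; needs fiddling for your problem

# Sharpen the generator output by applying softmax twice.
sharpened = K.softmax(c * K.softmax(out))

# Soften the real one-hot input with uniform noise so the
# discriminator's job is not too easy.
noise = K.random_uniform(K.shape(true_onehot))
softened_real = K.softmax(c * (true_onehot + noise))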
Thank you for good advice, much appreciated!
However, I do not understand why my initial thought, converting the softmax output into a one-hot representation, stops the gradient.
This code compiles and runs with Theano as backend but not with TensorFlow, and my GPU server requires TensorFlow.
The problem comes when you want to compute a gradient of a boolean tensor.
Theano is less strict and silently passes a zero for the gradient (which you probably don't expect). TensorFlow is stricter.
import numpy as np
import theano
import theano.tensor as T
from theano.tensor import fmatrix

x = fmatrix()
res = softmax_to_onehot(x)            # the one-hot conversion from the issue's snippet
gr = theano.grad(T.sum(res), [x])     # gradient of the sum w.r.t. the input
fun = theano.function([x], [res] + gr)
fun(np.array([[0.3, 0.7], [0.5, 0.5]]).astype("float32"))
Out:
[array([[ 0.,  1.],
        [ 1.,  1.]], dtype=float32),
 array([[ 0.,  0.],
        [ 0.,  0.]], dtype=float32)]
In addition to the previous comment: what is sometimes useful is multiplying softmax_to_onehot(x) by x. That gives you a filter which only lets through the x at the max index, and for which the gradients are somewhat as expected. (But mathematically they are still off from the correct calculation, which is what "REINFORCE", aka policy gradients, does.)
x = fmatrix()
res = x * softmax_to_onehot(x)   # (you can use K.stop_gradient on softmax_to_onehot(x) if TensorFlow raises an issue)
gr = theano.grad(T.sum(res), [x])
fun = theano.function([x], [res] + gr)
fun(np.array([[0.3, 0.7], [0.5, 0.5]]).astype("float32"))
Out:
[array([[ 0.        ,  0.69999999],
        [ 0.5       ,  0.5       ]], dtype=float32),
 array([[ 0.,  1.],
        [ 1.,  1.]], dtype=float32)]
Thank you so much for taking the time to explain this to me.
So if I understand this correctly, my main issue is that when a model ends with a softmax activation, it can understand the degree of error, whereas converting the output into a one-hot vector moves the output into a discrete space. That removes the ability to understand the degree of error: the model only receives feedback on whether it was right or wrong, and so it cannot receive a proper gradient.
Yes exactly.
As seen in the code snippet below, I have a network which ends with a Lambda layer. This layer uses softmax_to_onehot() (also in the snippet) to convert a softmax representation of a vector to a one-hot representation, e.g.
[0.1, 0.5, 0.24] -> [0, 1, 0]
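In essence, the conversion does something like this (a simplified sketch; one assumed way to implement it is to mark the per-row maximum):
from keras import backend as K
from keras.layers import Lambda

def softmax_to_onehot(x):
    # Mark the per-row maximum with a 1 and everything else with a 0.
    # The comparison itself has no gradient, which is what the two
    # backends handle differently below.
    return K.cast(K.equal(x, K.max(x, axis=-1, keepdims=True)), K.floatx())

onehot_layer = Lambda(softmax_to_onehot)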
However, when using TensorFlow as backend this gives the following error message:
This error does not, however, occur when I use Theano as backend, but I need to use TensorFlow.