keras-team / keras


Replicating Generating Sequences by Alex Graves Handwriting Section #1608

Closed dragon271828 closed 7 years ago

dragon271828 commented 8 years ago

I'm trying to replicate Alex Graves' paper: http://arxiv.org/pdf/1308.0850v5.pdf

For the handwriting generation part, I'm having trouble defining the objective function in terms of y_true and y_pred. In the paper, y_true takes the form of a 3-tuple (the two pen offsets plus an end-of-stroke bit), while y_pred takes the form (e, {w_i, mu_i, sigma_i, rho_i}), where w_i, mu_i, sigma_i, and rho_i parameterize the Gaussian mixture and e is the probability that the pen is lifted (end of stroke).

First off, y_true and y_pred have different dimensions; is that allowed?

Secondly, the different elements of y_pred must be treated individually in the custom loss function. For instance, say there are two Gaussians in the mixture, so that the dimension of y_pred is 9. The loss function then has to pick out each of these components and do something different with each of them, as shown on page 20 of the paper referenced above:

```python
e      = y_pred[0]
w1     = y_pred[1]
mu1    = y_pred[2]
sigma1 = y_pred[3]
rho1   = y_pred[4]
w2     = y_pred[5]
mu2    = y_pred[6]
sigma2 = y_pred[7]
rho2   = y_pred[8]
```

Is splitting up the components of y_pred like this a permitted operation in a custom loss function? I've written an implementation of the loss but seem to be getting NaNs. I'm not sure whether I'm doing something wrong or whether this kind of indexing simply isn't allowed in Keras or Theano.
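
To make the question concrete, here is a stripped-down sketch of the kind of loss I mean: a 1-D target with two mixture components, no end-of-stroke term and no correlation rho, and a y_pred layout that is purely illustrative (not the paper's full bivariate parameterization):

```python
import numpy as np
import keras.backend as K

def mdn_loss(y_true, y_pred):
    # Illustrative layout: y_pred = [w1, w2, mu1, mu2, log_sigma1, log_sigma2]
    # Slicing y_pred like this works because it is an ordinary backend tensor.
    w = K.softmax(y_pred[:, 0:2])             # mixture weights, forced to sum to 1
    mu = y_pred[:, 2:4]                       # component means
    sigma = K.exp(y_pred[:, 4:6]) + 1e-6      # positive std devs; the epsilon keeps them off zero

    # Tile the (batch, 1) target across the 2 components instead of relying on broadcasting
    y = K.repeat_elements(y_true, 2, axis=1)

    # Component densities N(y | mu_i, sigma_i)
    density = K.exp(-K.square(y - mu) / (2.0 * K.square(sigma))) / (np.sqrt(2.0 * np.pi) * sigma)

    # Mixture likelihood, clipped before the log so log(0) cannot produce NaN/inf
    likelihood = K.sum(w * density, axis=-1)
    return -K.log(K.clip(likelihood, 1e-10, 1e10))   # per-sample NLL; Keras averages it
```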

rpinsler commented 8 years ago

I've done something similar in #1061. Does that help?

dragon271828 commented 8 years ago

Thanks! This is incredibly helpful. I implemented a layer similar to yours; however, I'm now receiving NaNs in the training loss after a couple of iterations. I noticed that you had the same issue. Would you mind sharing what kind of numerical optimization problems you ran into and how you solved them?

rpinsler commented 8 years ago

I can't remember exactly what the source of the problem was. I tried different things to avoid those numerical problems, e.g. a different optimizer, gradient clipping, and batch normalization. It's pretty stable now. Let me know if you can get it to work!
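
For reference, here is roughly where those knobs live in the Keras API. This is only a sketch, not the code from #1061, and the architecture, learning rate and clip value below are placeholders:

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense, BatchNormalization
from keras.optimizers import RMSprop

model = Sequential()
model.add(LSTM(128, input_shape=(None, 3)))   # 3 inputs per timestep: x1, x2, end-of-stroke
model.add(BatchNormalization())               # batch normalization, one of the things tried above
model.add(Dense(6))                           # mixture parameters (e.g. 2 components as in the 1-D sketch above)

# clipnorm rescales gradients whose L2 norm exceeds 1.0 (clipvalue would clip elementwise instead)
optimizer = RMSprop(lr=1e-4, clipnorm=1.0)
model.compile(loss='mse', optimizer=optimizer)   # swap 'mse' for the mixture density loss
```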

jrieke commented 8 years ago

I ran into the same issue; a few notes that might be helpful:

- I eventually fixed it by choosing a much smaller learning rate (with RMSprop). It's still quite error-prone though, so don't expect too much.
- Gradient clipping had no effect for me (it only made things worse); I didn't try batch normalization.
- The parameter M (i.e. the number of Gaussian distributions per mixture) also has quite an effect on the NaN issue, so try playing around with that as well.

el3ment commented 8 years ago

I'm also getting NaNs after a few hundred iterations.

This TensorFlow implementation doesn't seem to have the same NaN issue: http://blog.otoro.net/2015/11/24/mixture-density-networks-with-tensorflow/

But in trying to debug the objective function in @rpinsler's implementation, I can't see what is causing the NaN, unless one of the sigmas strays too close to zero and causes a divide by zero, or perhaps the sum in `return (alpha/sigma) * T.exp(-T.sum(T.sqr(mu-y_true),-1)/(2*sigma**2))` is misplaced. Did either of you solve the problem?
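
For anyone hitting the same thing, one common workaround (I haven't confirmed it is the fix here) is to put a floor under sigma and evaluate the mixture likelihood in log-space with a log-sum-exp, so neither a near-zero sigma nor an underflowing exp() can push the log to NaN. A sketch in plain Theano, with alpha, mu, sigma of shape (batch, M) and y_true of shape (batch, 1):

```python
import numpy as np
import theano.tensor as T

def mixture_nll(y_true, alpha, mu, sigma, eps=1e-6):
    sigma = T.maximum(sigma, eps)                      # keep sigma away from zero
    y = T.addbroadcast(y_true, 1)                      # let (batch, 1) broadcast against (batch, M)
    log_component = (T.log(alpha)
                     - T.log(sigma)
                     - 0.5 * np.log(2.0 * np.pi)
                     - T.sqr(y - mu) / (2.0 * T.sqr(sigma)))
    # log-sum-exp over the M components, subtracting the max for numerical stability
    max_log = T.max(log_component, axis=-1, keepdims=True)
    log_mixture = max_log[:, 0] + T.log(T.sum(T.exp(log_component - max_log), axis=-1))
    return -T.mean(log_mixture)
```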

stale[bot] commented 7 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.