Yes, categorical cross-entropy seems better.
Another question I'd like to discuss with you: have you noticed that your LSTM architecture is slightly different from the one described in Colah's blog? Should the values of the forget gate and input gate be connected to the cell state of the previous step? Or does this difference not affect the performance of the LSTM? I also referred to the LSTM implementation in the Theano tutorial, and it gives the same math formulas as Colah's.
Thanks!
@zhujiangang
Actually, the structure in my code is a common architecture. You can find it in Colah's blog: "One popular LSTM variant, introduced by Gers & Schmidhuber (2000), is adding 'peephole connections.' This means that we let the gate layers look at the cell state."
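For anyone comparing the two formulations, here is a minimal NumPy sketch of a single step in both variants. All parameter names (`W_f`, `U_f`, `V_f`, `b_f`, and so on) are illustrative, not taken from the repository code. In the standard formulation the gates see only `x_t` and `h_{t-1}`; in the peephole variant of Gers & Schmidhuber, the forget and input gates also see the previous cell state `c_{t-1}`, and the output gate sees the new cell state `c_t`.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, p, peephole=False):
    """One LSTM step. `p` is a dict of weights (names are illustrative).

    Standard gate (Colah's blog / Theano tutorial):
        f_t = sigmoid(W_f x_t + U_f h_{t-1} + b_f)
    Peephole variant (Gers & Schmidhuber, 2000):
        f_t = sigmoid(W_f x_t + U_f h_{t-1} + V_f * c_{t-1} + b_f)
    """
    def gate_pre(name, c_peek):
        pre = p['W_' + name] @ x_t + p['U_' + name] @ h_prev + p['b_' + name]
        if peephole:
            # Peephole: the gate layer also "looks at" the cell state.
            # The V_* weights are diagonal, stored here as vectors.
            pre += p['V_' + name] * c_peek
        return pre

    f_t = sigmoid(gate_pre('f', c_prev))      # forget gate peeks at c_{t-1}
    i_t = sigmoid(gate_pre('i', c_prev))      # input gate peeks at c_{t-1}
    c_tilde = np.tanh(p['W_c'] @ x_t + p['U_c'] @ h_prev + p['b_c'])
    c_t = f_t * c_prev + i_t * c_tilde        # cell-state update (same in both variants)
    o_t = sigmoid(gate_pre('o', c_t))         # output gate peeks at the *new* cell state
    h_t = o_t * np.tanh(c_t)                  # new hidden state
    return h_t, c_t
```

Note that the cell-state recurrence `c_t = f_t * c_{t-1} + i_t * c_tilde` is identical in both variants; the peepholes only change what the gates condition on, which is why the two versions often perform similarly in practice.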
I see. One of my teammates shared this variant with me just now (only 10 minutes ago). What a coincidence! Thank you! P.S. I'm a fan of yours on Weibo.
Hi, I have a question after reading your RNN code. Since you use softmax as the activation function, why do you use binary cross-entropy as the loss function (- t * log(y) - (1 - t) * log(1 - y))? I also found a post on Reddit about this issue: https://www.reddit.com/r/MachineLearning/comments/39bo7k/can_softmax_be_used_with_cross_entropy/ Could you please explain this to me? Thanks!
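For reference, a small NumPy sketch of the difference being discussed (the function names are mine, not from the repository). Binary cross-entropy treats each output unit as an independent Bernoulli probability, so it also penalizes the non-target units through the `(1 - t) * log(1 - y)` terms; categorical cross-entropy keeps only `- t * log(y)`, matching softmax's assumption of a single distribution over mutually exclusive classes:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())       # shift logits for numerical stability
    return e / e.sum()

def binary_cross_entropy(y, t):
    # Sums - t*log(y) - (1-t)*log(1-y) over every output unit:
    # treats each unit as an independent Bernoulli probability.
    return -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))

def categorical_cross_entropy(y, t):
    # Sums only - t*log(y): assumes y is one distribution over
    # mutually exclusive classes, which is what softmax produces.
    return -np.sum(t * np.log(y))

z = np.array([2.0, 1.0, 0.1])     # example logits
t = np.array([1.0, 0.0, 0.0])     # one-hot target
y = softmax(z)
print(binary_cross_entropy(y, t))       # also penalizes the non-target units
print(categorical_cross_entropy(y, t))  # the usual softmax loss
```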