lipiji / JRNN

LSTM and GRU in Java
MIT License

Binary cross-entropy or categorical cross-entropy? #1

Closed · zhujiangang closed this issue 8 years ago

zhujiangang commented 8 years ago

Hi, I have a question after reading your RNN code. Since you use softmax as the output activation, why do you use binary cross-entropy as the loss function (−t · log(y) − (1 − t) · log(1 − y))? I also found a post on Reddit about this issue: https://www.reddit.com/r/MachineLearning/comments/39bo7k/can_softmax_be_used_with_cross_entropy/ Could you please explain this to me? Thanks!

lipiji commented 8 years ago

Yes, categorical cross-entropy seems better.
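For readers comparing the two: with a softmax output, categorical cross-entropy reduces to −Σₖ tₖ · log(yₖ), while the binary form treats each output unit as an independent Bernoulli variable and also penalizes the units whose target is 0. A minimal Java sketch of both losses (a standalone illustration, not code from JRNN):

```java
import java.util.Arrays;

public class CrossEntropy {
    // Categorical cross-entropy: t is one-hot, y is a softmax output.
    // L = -sum_k t[k] * log(y[k])
    static double categorical(double[] t, double[] y) {
        double loss = 0.0;
        for (int k = 0; k < y.length; k++) {
            loss -= t[k] * Math.log(y[k] + 1e-12); // epsilon guards log(0)
        }
        return loss;
    }

    // Binary cross-entropy summed over units:
    // L = -sum_k [ t[k]*log(y[k]) + (1 - t[k])*log(1 - y[k]) ]
    // This matches independent sigmoid units, not a softmax distribution.
    static double binary(double[] t, double[] y) {
        double loss = 0.0;
        for (int k = 0; k < y.length; k++) {
            loss -= t[k] * Math.log(y[k] + 1e-12)
                  + (1.0 - t[k]) * Math.log(1.0 - y[k] + 1e-12);
        }
        return loss;
    }

    public static void main(String[] args) {
        double[] t = {0.0, 1.0, 0.0};           // one-hot target
        double[] y = {0.2, 0.7, 0.1};           // softmax output
        System.out.println(categorical(t, y));  // ~0.357, i.e. -log(0.7)
        System.out.println(binary(t, y));       // ~0.685, extra terms from the 0-targets
        System.out.println(Arrays.toString(y));
    }
}
```

One reason the softmax/categorical pairing is standard: the gradient of the categorical loss with respect to the pre-softmax logits simplifies to y − t, which keeps backpropagation simple and numerically well-behaved.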

zhujiangang commented 8 years ago

Another question I'd like to discuss with you: do you find that your LSTM architecture is slightly different from the one described in Colah's blog? Should the values of the forget gate and input gate be connected to the cell state of the previous step? Or does this difference not affect the performance of the LSTM? I also referred to the LSTM implementation in the Theano tutorial; it gives the same formulas as Colah's.

Thanks!

lipiji commented 8 years ago

@zhujiangang

Actually, the structure in my code is a common architecture. You can find it in Colah's blog: "One popular LSTM variant, introduced by Gers & Schmidhuber (2000), is adding 'peephole connections.' This means that we let the gate layers look at the cell state."
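For concreteness, in the peephole variant the forget and input gates also read the previous cell state c_{t−1}, and the output gate reads the new cell state c_t. A minimal runnable sketch of one peephole step for a single unit, using scalar inputs and hypothetical pre-activation names (netF, netI, etc.) for illustration — not the actual JRNN code:

```java
import java.util.Arrays;

// One peephole-LSTM step (Gers & Schmidhuber 2000) for a single unit.
// netF, netI, netG, netO stand for the usual W*x + U*hPrev + b pre-activations;
// vf, vi, vo are the (diagonal) peephole weights on the cell state.
public class PeepholeLstmStep {
    static double sigmoid(double z) { return 1.0 / (1.0 + Math.exp(-z)); }

    static double[] step(double netF, double netI, double netG, double netO,
                         double cPrev, double vf, double vi, double vo) {
        double f = sigmoid(netF + vf * cPrev); // forget gate peeks at c_{t-1}
        double i = sigmoid(netI + vi * cPrev); // input gate peeks at c_{t-1}
        double g = Math.tanh(netG);            // candidate cell value
        double c = f * cPrev + i * g;          // new cell state
        double o = sigmoid(netO + vo * c);     // output gate peeks at c_t
        double h = o * Math.tanh(c);           // hidden state
        return new double[] { c, h };
    }

    public static void main(String[] args) {
        double[] ch = step(0.5, 0.5, 0.8, 0.5, /*cPrev=*/0.3, 0.1, 0.1, 0.1);
        System.out.println(Arrays.toString(ch)); // [c_t, h_t]
    }
}
```

The non-peephole formulation in Colah's main diagrams and the Theano tutorial simply drops the vf, vi, vo terms; in practice both variants train well, and the difference is usually minor.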

zhujiangang commented 8 years ago

I see. One of my teammates shared this variant just now (only 10 minutes ago). What a coincidence! Thank you! P.S. I'm a fan of yours on Weibo.