yoonkim / lstm-char-cnn

LSTM language model with CNN over characters
MIT License

About the hierarchical softmax #10

Closed · quanpn90 closed this issue 8 years ago

quanpn90 commented 8 years ago

Hi,

Thanks for the great model, and happy new year.

I would like to ask about your hierarchical softmax. Was it your intention to share the words equally across the clusters, or was it just to make the implementation easier? I find it hard to understand how you distribute the words to clusters; did you use a normal distribution? I tried grouping words by their unigram frequencies (as in Mikolov's model), but the results were very bad.
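For context, the Mikolov-style grouping mentioned above partitions the vocabulary (sorted by frequency) so that each cluster carries roughly equal unigram probability mass. A minimal sketch, with illustrative names and toy counts (not code from this repo, which is Torch/Lua):

```python
def frequency_clusters(word_counts, n_clusters):
    """Partition words into n_clusters so each cluster holds roughly
    equal total unigram probability mass (frequency binning as used
    in Mikolov's class-based speedup)."""
    total = float(sum(word_counts.values()))
    # Most frequent words first, so the head of the distribution
    # fills the first clusters quickly.
    words = sorted(word_counts, key=word_counts.get, reverse=True)
    clusters = {}
    mass, cid = 0.0, 0
    for w in words:
        clusters[w] = cid
        mass += word_counts[w] / total
        # Advance to the next cluster once this one has its share of mass.
        if mass > (cid + 1) / n_clusters and cid < n_clusters - 1:
            cid += 1
    return clusters
```

With a skewed distribution this yields a few very frequent words in the early clusters and many rare words in the later ones, which is exactly the uneven-cluster-size situation the question describes.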

Also, I guess you have tried the fbnn HSM as well. I tried to apply it on top of the network (after the final dropout), but it gives a very large loss. Would it be possible to improve your HSM to work better with unevenly sized clusters (some may have only a few words, while others have many)?

Thank you,

yoonkim commented 8 years ago

It's mostly to make the implementation easier, and I found it to work surprisingly well.
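The equal-sharing scheme described here can be sketched as follows: with |V| words split into about √|V| clusters of about √|V| words each, both softmax steps cost O(√|V|) instead of O(|V|). This is a hedged illustration with hypothetical names, not the repository's actual Torch/Lua code:

```python
import math

def equal_size_clusters(vocab):
    """Assign each word an (cluster, within-cluster) index pair,
    using ~sqrt(|V|) clusters of roughly equal size."""
    n = len(vocab)
    cluster_size = int(math.ceil(math.sqrt(n)))
    mapping = {}
    for i, word in enumerate(vocab):
        # Consecutive words simply fill clusters in order; only the
        # last cluster may be smaller than the rest.
        mapping[word] = (i // cluster_size, i % cluster_size)
    return mapping
```

Because every cluster (except possibly the last) has the same size, the within-cluster softmax matrices can all share one shape, which keeps the implementation simple compared to variable-size frequency-based clusters.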

I did also try fbnn but couldn't get it to work. I am not 100% sure why, but I think there is a precision issue: https://groups.google.com/forum/#!searchin/torch7/HSM/torch7/Hq_KL4k69dM/D3lf0r1OAQAJ