quanpn90 closed this issue 8 years ago
It's mostly to make the implementation easier, and I found it to work surprisingly well.
I also tried fbnn but couldn't get it to work. I am not 100% sure why, but I think there is an issue with precision: https://groups.google.com/forum/#!searchin/torch7/HSM/torch7/Hq_KL4k69dM/D3lf0r1OAQAJ
Hi,
Thanks for the great model, and happy new year.
I would like to ask about your hierarchical softmax. Is it your intention to distribute the words equally across the clusters, or is it just to make the implementation easier? I find it hard to understand how you assign words to clusters — did you use a normal distribution? I tried to group words based on their unigram frequencies (as in Mikolov's model), but the result was very bad.
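For reference, here is a minimal sketch (not code from this repository) contrasting the two assignment strategies being discussed: equal-size clusters, which simply split the vocabulary into chunks of the same word count, and a Mikolov-style split that sorts words by unigram frequency and cuts so that each cluster carries roughly the same total probability mass. Function names and the toy vocabulary are illustrative assumptions.

```python
# Hypothetical sketch of two word-to-cluster assignment schemes for
# hierarchical softmax; not taken from this repository.

def equal_size_clusters(vocab, n_clusters):
    """Split the vocabulary into clusters of (nearly) equal word count."""
    size = -(-len(vocab) // n_clusters)  # ceiling division
    return [vocab[i:i + size] for i in range(0, len(vocab), size)]

def equal_frequency_clusters(vocab, counts, n_clusters):
    """Mikolov-style split: sort words by frequency, then cut so each
    cluster holds roughly 1/n_clusters of the total unigram mass."""
    order = sorted(vocab, key=lambda w: -counts[w])
    total = sum(counts[w] for w in vocab)
    target = total / n_clusters
    clusters, current, mass = [], [], 0.0
    for w in order:
        current.append(w)
        mass += counts[w]
        # Close the current cluster once it has absorbed its share of mass.
        if mass >= target and len(clusters) < n_clusters - 1:
            clusters.append(current)
            current, mass = [], 0.0
    clusters.append(current)
    return clusters

# Toy example: one very frequent word dominates the frequency-based split.
vocab = ["the", "a", "cat", "dog", "runs", "fast"]
counts = {"the": 50, "a": 30, "cat": 10, "dog": 5, "runs": 3, "fast": 2}
print(equal_size_clusters(vocab, 2))
print(equal_frequency_clusters(vocab, counts, 2))
```

Note how the frequency-based split can produce very unbalanced cluster sizes (one cluster may contain a single high-frequency word), which may interact badly with an implementation that assumes clusters of similar size.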
Also, I guess you have tried the fbnn HSM. I tried to apply it on top of the network (after the final dropout), but it gives a very large loss. Would it be possible to improve your HSM to work better with unbalanced clusters (where some clusters contain only a few words, while others contain many)?
Thank you,