z-zawhtet-a opened this issue 8 years ago
This is a very cool paper! I am wondering if any of the DNN packages implement something like this.
An alternative is to approximate the softmax with negative sampling (http://stackoverflow.com/questions/27860652/word2vec-negative-sampling-in-layman-term) or with another softmax approximation like the ones used in TensorFlow. TensorFlow has a sampled-softmax loss function, tf.nn.sampled_softmax_loss: https://www.tensorflow.org/versions/master/api_docs/python/nn.html#candidate-sampling
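To make the idea concrete, here is a minimal numpy sketch of the negative-sampling loss from the Stack Overflow link above: instead of normalizing over all 200k classes, each example pays a binary logistic cost for its true class plus a handful of uniformly sampled negatives. All sizes and names (`num_sampled`, `weights`, etc.) are illustrative assumptions, not anyone's actual API; real implementations like tf.nn.sampled_softmax_loss use smarter samplers and vectorize over the batch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a large vocabulary, small hidden dimension.
num_classes, dim, batch, num_sampled = 200_000, 32, 4, 10

weights = rng.normal(scale=0.1, size=(num_classes, dim))  # output embeddings
biases = np.zeros(num_classes)
inputs = rng.normal(size=(batch, dim))             # hidden-layer activations
labels = rng.integers(0, num_classes, size=batch)  # true class per example

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def negative_sampling_loss(inputs, labels, num_sampled):
    """Binary logistic loss on the true class plus a few sampled
    negatives; never touches the other ~200k output rows."""
    losses = []
    for h, y in zip(inputs, labels):
        # Positive term: push the true class's logit up.
        pos_logit = h @ weights[y] + biases[y]
        loss = -np.log(sigmoid(pos_logit))
        # Negative terms: push logits of a few random classes down.
        # (Uniform sampling here; word2vec uses a unigram^0.75 sampler.)
        negatives = rng.integers(0, num_classes, size=num_sampled)
        neg_logits = weights[negatives] @ h + biases[negatives]
        loss += -np.log(sigmoid(-neg_logits)).sum()
        losses.append(loss)
    return float(np.mean(losses))

print(negative_sampling_loss(inputs, labels, num_sampled))
```

Per step this touches only `1 + num_sampled` rows of `weights` instead of all `num_classes`, which is the whole point when the output layer is this wide.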
Keras probably has something like that too...
See also http://arxiv.org/pdf/1412.7091.pdf. 200,000 softmax outputs is just too many!