Closed: YontiLevin closed this issue 6 years ago
Yes, this step is skipped in data.py. But you can correct that by assigning an int id to each word after sorting the words by their frequency, instead of in the order the add_word function first sees them.
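A minimal sketch of that fix, in case it helps. The class name FrequencySortedDictionary, the sort_by_frequency method and the build_dictionary helper are hypothetical and are not taken from the repo's data.py; the word2idx / idx2word fields are assumed to resemble what a typical dictionary class there holds.

```python
from collections import Counter


class FrequencySortedDictionary:
    """Hypothetical stand-in for a data.py-style dictionary (names are assumptions)."""

    def __init__(self):
        self.word2idx = {}
        self.idx2word = []
        self.counter = Counter()  # word -> number of occurrences

    def add_word(self, word):
        # Provisional id in order of first occurrence (the behavior the question describes).
        if word not in self.word2idx:
            self.word2idx[word] = len(self.idx2word)
            self.idx2word.append(word)
        self.counter[word] += 1
        return self.word2idx[word]

    def sort_by_frequency(self):
        # Reassign ids so that id 0 is the most frequent word.
        # Ties keep first-seen order (Counter preserves insertion order on Python 3.7+).
        self.idx2word = [word for word, _ in self.counter.most_common()]
        self.word2idx = {word: idx for idx, word in enumerate(self.idx2word)}


def build_dictionary(tokens):
    """Count every token first, then remap ids by descending frequency."""
    dictionary = FrequencySortedDictionary()
    for token in tokens:
        dictionary.add_word(token)
    dictionary.sort_by_frequency()
    return dictionary
```

The remapping has to happen before the corpus is converted to id tensors, so in this sketch the text would be read once to count words and tokenized only after sort_by_frequency has run. Otherwise the ids stored in the tensors would no longer match the dictionary.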
Thanks for the response. That's what I understood as well. After implementing the adaptive softmax properly, the model converged much faster for me.
The 'An Analysis of Neural Language Modeling at Multiple Scales' paper states that the hierarchy of the words is determined by their frequency. For some reason I can't find that in the code, neither in the dictionary nor in the corpus build. It seems like the word ids are determined by the order of their occurrence. Please point me to where that takes place. Many thanks.