hunkim / word-rnn-tensorflow

Multi-layer Recurrent Neural Networks (LSTM, RNN) for word-level language models in Python using TensorFlow.
MIT License
1.3k stars 494 forks source link

Vocab index ordering? #75

Open Tahlor opened 6 years ago

Tahlor commented 6 years ago

Why do we calculate word counts if we ultimately include all words and sort the vocab alphabetically? Mostly I'm wondering if you ordered it with the most common words first, and later reverted to alphabetical ordering for some reason. Would it be bad to have the most common words first? Or do we expect the model would learn better with a random ordering?

In utils.py:

        vocabulary_inv = [x[0] for x in word_counts.most_common()]
        vocabulary_inv = list(sorted(vocabulary_inv))